splitSelect {splitSelect}    R Documentation

Best Split Selection Modeling for Low-Dimensional Data

Description

splitSelect performs the best split selection algorithm.

Usage

splitSelect(
  x,
  y,
  intercept = TRUE,
  G,
  use.all = TRUE,
  family = c("gaussian", "binomial")[1],
  group.model = c("glmnet", "LS", "Logistic")[1],
  lambdas = NULL,
  alphas = 0,
  nsample = NULL,
  fix.partition = NULL,
  fix.split = NULL,
  parallel = FALSE,
  cores = getOption("mc.cores", 2L),
  verbose = TRUE
)

Arguments

x

Design matrix.

y

Response vector.

intercept

Boolean variable indicating whether the model includes an intercept (default is TRUE).

G

Number of groups into which the variables are split. Can have more than one value.

use.all

Boolean variable to determine if all variables must be used (default is TRUE).

family

Description of the error distribution and link function to be used for the model. Must be one of "gaussian" or "binomial".

group.model

Model used for the groups. Must be one of "glmnet", "LS", or "Logistic".

lambdas

The shrinkage parameters for the "glmnet" regularization. If NULL (default), optimal values are chosen.

alphas

Elastic net mixing parameter. Should be between 0 (default) and 1.
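
As a rough illustration of what the mixing parameter controls, the sketch below evaluates the elastic net penalty in the form used by glmnet, lambda * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1). It assumes that splitSelect passes alphas and lambdas to glmnet with their usual meaning; the helper enet_penalty is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of splitSelect): elastic net penalty
# in the glmnet parameterization.
enet_penalty <- function(beta, lambda, alpha) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta_example <- c(5, -2, 0)
ridge_value <- enet_penalty(beta_example, lambda = 1, alpha = 0)  # 14.5 (pure ridge)
lasso_value <- enet_penalty(beta_example, lambda = 1, alpha = 1)  # 7 (pure lasso)
```

Under this reading, the default alphas = 0 corresponds to a ridge penalty within each group.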

nsample

Number of sample splits for each value of G. If NULL, then all splits will be considered (unless there is overflow).
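
To see why sampling splits can be necessary, it helps to count the candidates. The sketch below assumes that, with use.all = TRUE, a split is a partition of the p variables into G non-empty, unlabeled groups, so the count is a Stirling number of the second kind. The helper count_splits is hypothetical and not part of the package, and the package's own count may differ.

```r
# Hypothetical helper (not part of splitSelect): number of ways to partition
# p variables into G non-empty, unlabeled groups, computed with the recurrence
# S(i, k) = k * S(i - 1, k) + S(i - 1, k - 1).
count_splits <- function(p, G) {
  # S[i + 1, k + 1] stores S(i, k); S(0, 0) = 1
  S <- matrix(0, nrow = p + 1, ncol = G + 1)
  S[1, 1] <- 1
  for (i in seq_len(p)) {
    for (k in seq_len(min(i, G))) {
      S[i + 1, k + 1] <- k * S[i, k + 1] + S[i, k]
    }
  }
  S[p + 1, G + 1]
}

count_splits(4, 2)   # 7
count_splits(10, 3)  # 9330
```

The count grows quickly with p, which is consistent with the note about overflow above.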

fix.partition

Optional list with G elements indicating the partitions (in each row) to be considered for the splits.

fix.split

Optional matrix with p columns indicating the groups (in each row) to be considered for the splits.
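
A minimal sketch of what a fix.split matrix could look like for p = 4 variables and G = 2 groups. The encoding is an assumption based on the description above: each row is one candidate split, and entry j gives the group label of variable j. The object names are illustrative only.

```r
p_vars <- 4

# Assumed encoding: one candidate split per row, entry j = group of variable j
candidate_splits <- matrix(c(1, 1, 2, 2,
                             1, 2, 1, 2,
                             1, 2, 2, 1),
                           ncol = p_vars, byrow = TRUE)

# Group sizes implied by each candidate split (one row per split)
group_sizes <- t(apply(candidate_splits, 1, tabulate, nbins = 2))
group_sizes
```

Each of these three splits places two variables in each group, which matches the partition c(2, 2) used with fix.partition in the Examples.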

parallel

Boolean variable indicating whether the function is run in parallel (default is FALSE).

cores

Number of cores used for parallelization. Default is getOption("mc.cores", 2L).
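
One way to choose a value for cores is to start from the same default as the signature and cap it at the number of cores the machine reports. This is only a sketch: the variable names are illustrative, and capping at the detected count is a suggestion rather than something the function is documented to do.

```r
# Same default as in the Usage section: the "mc.cores" option, else 2
n_cores <- getOption("mc.cores", 2L)

# parallel::detectCores() can return NA, so guard before capping
available_cores <- parallel::detectCores()
if (!is.na(available_cores)) {
  n_cores <- min(n_cores, available_cores)
}
n_cores
```

The result could then be passed as cores = n_cores together with parallel = TRUE.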

verbose

Boolean variable to determine if console output for cross-validation progress is printed (default is TRUE).

Value

An object of class splitSelect.

Author(s)

Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca

See Also

coef.splitSelect, predict.splitSelect

Examples

# Setting the parameters
p <- 4
n <- 30
n.test <- 5000
beta <- rep(5, p)
rho <- 0.1
r <- 0.9
SNR <- 3
# Function creating a p x p correlation matrix with entries r^|i-j|
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)

# Generating the coefficients for a fixed partition of the variables

split.out <- splitSelect(x.train, y.train, G=2, use.all=TRUE,
                         fix.partition=list(matrix(c(2,2), 
                                             ncol=2, byrow=TRUE)), 
                         fix.split=NULL,
                         intercept=TRUE, group.model="glmnet", alphas=0)



[Package splitSelect version 1.0.3 Index]