cv.splitSelect {splitSelect}

R Documentation

Split Selection Modeling for Low-Dimensional Data - Cross-Validation

Description

cv.splitSelect performs the best split selection algorithm with cross-validation.

Usage

cv.splitSelect(
  x,
  y,
  intercept = TRUE,
  G,
  use.all = TRUE,
  family = c("gaussian", "binomial")[1],
  group.model = c("glmnet", "LS", "Logistic")[1],
  alphas = 0,
  nsample = NULL,
  fix.partition = NULL,
  fix.split = NULL,
  nfolds = 10,
  parallel = FALSE,
  cores = getOption("mc.cores", 2L)
)

Arguments

`x`	Design matrix.
`y`	Response vector.
`intercept`	Boolean variable indicating whether the model includes an intercept (default is TRUE).
`G`	Number of groups into which the variables are split. Can have more than one value.
`use.all`	Boolean variable to determine if all variables must be used (default is TRUE).
`family`	Description of the error distribution and link function to be used for the model. Must be one of "gaussian" or "binomial".
`group.model`	Model used for the groups. Must be one of "glmnet", "LS" or "Logistic".
`alphas`	Elastic net mixing parameter. Should be between 0 (default) and 1.
`nsample`	Number of sample splits for each value of G. If NULL, then all splits will be considered (unless there is overflow).
`fix.partition`	Optional list with G elements indicating the partitions (in each row) to be considered for the splits.
`fix.split`	Optional matrix with p columns indicating the groups (in each row) to be considered for the splits.
`nfolds`	Number of folds for the cross-validation procedure.
`parallel`	Boolean variable indicating whether the function is run in parallel. Default is FALSE.
`cores`	Number of cores used when the function is run in parallel.
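
The note on `nsample` says that all splits are considered when it is NULL, unless there is overflow. As a rough illustration of why the number of splits can become very large, the sketch below counts the ways to partition p variables into G non-empty groups (the Stirling number of the second kind). It assumes that "all splits" with use.all = TRUE corresponds to this count; `count_splits` is a hypothetical helper written in base R and is not part of splitSelect.

```r
# Hypothetical helper (not part of splitSelect): number of ways to split
# p variables into G non-empty, unlabeled groups, i.e. S(p, G).
# Assumption: this matches "all splits" when use.all = TRUE.
count_splits <- function(p, G) {
  # S[k + 1] stores S(i, k) for k = 0, ..., G at the current i
  S <- c(1, rep(0, G))                 # S(0, 0) = 1 and S(0, k) = 0 for k > 0
  for (i in seq_len(p)) {
    k <- seq_len(G)
    # Recurrence S(i, k) = k * S(i - 1, k) + S(i - 1, k - 1);
    # the right-hand side is evaluated with the old values before assignment
    S[k + 1] <- k * S[k + 1] + S[k]
    S[1] <- 0                          # S(i, 0) = 0 for i >= 1
  }
  S[G + 1]
}

count_splits(4, 2)    # 7
count_splits(10, 3)   # 9330

# The count grows quickly with p and already exceeds the largest
# R integer for moderate sizes
count_splits(30, 5) > .Machine$integer.max
```

With counts of this size, sampling a limited number of splits through `nsample` may be the only practical choice for larger p.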

Value

An object of class cv.splitSelect.

Author(s)

Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca

Examples

# Setting the parameters
p <- 4
n <- 30
n.test <- 5000
beta <- rep(5,4)
rho <- 0.1
r <- 0.9
SNR <- 3
# Function creating the target correlation matrix with entries r^|i-j|
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)

# Generating the coefficients for a fixed partition of the variables

split.out <- cv.splitSelect(x.train, y.train, G=2, use.all=TRUE,
                            fix.partition=list(matrix(c(2,2), 
                                               ncol=2, byrow=TRUE)), 
                            fix.split=NULL,
                            intercept=TRUE, group.model="glmnet", alphas=0, nfolds=10)
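
The double loop in target_cor from the example fills each entry of the correlation matrix with r^|i-j|. A minimal sketch of an equivalent vectorized construction in base R follows; `target_cor_vec` is a hypothetical name introduced only for this illustration, and the loop version is copied from the example so that the block runs on its own.

```r
# Loop-based construction, copied from the example
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}

# Hypothetical vectorized alternative: outer() builds the matrix of
# index differences i - j, and r^|i - j| is applied elementwise
target_cor_vec <- function(r, p){
  r^abs(outer(seq_len(p), seq_len(p), "-"))
}

# Both constructions agree for the values used in the example
all.equal(target_cor(0.9, 4), target_cor_vec(0.9, 4))
all.equal(target_cor(0.1, 4), target_cor_vec(0.1, 4))
```

The vectorized form avoids the explicit loops, which mainly matters for readability at this size and for speed when p is large.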

[Package splitSelect version 1.0.3 Index]