R: Best Split Selection Modeling for Low-Dimensional Data

splitSelect {splitSelect}

R Documentation

Best Split Selection Modeling for Low-Dimensional Data

Description

splitSelect performs the best split selection algorithm.

Usage

splitSelect(
  x,
  y,
  intercept = TRUE,
  G,
  use.all = TRUE,
  family = c("gaussian", "binomial")[1],
  group.model = c("glmnet", "LS", "Logistic")[1],
  lambdas = NULL,
  alphas = 0,
  nsample = NULL,
  fix.partition = NULL,
  fix.split = NULL,
  parallel = FALSE,
  cores = getOption("mc.cores", 2L),
  verbose = TRUE
)

Arguments

`x`	Design matrix.
`y`	Response vector.
`intercept`	Boolean variable to determine whether an intercept is included in the model (default is TRUE).
`G`	Number of groups into which the variables are split. Can have more than one value.
`use.all`	Boolean variable to determine if all variables must be used (default is TRUE).
`family`	Description of the error distribution and link function to be used for the model. Must be one of "gaussian" or "binomial".
`group.model`	Model used for the groups. Must be one of "glmnet", "LS" or "Logistic".
`lambdas`	The shrinkage parameters for the "glmnet" regularization. If NULL (default), optimal values are chosen.
`alphas`	Elastic net mixing parameter. Should be between 0 (default) and 1.
`nsample`	Number of splits to sample for each value of G. If NULL, all splits are considered (unless there is overflow).
`fix.partition`	Optional list with G elements indicating the partitions (in each row) to be considered for the splits.
`fix.split`	Optional matrix with p columns indicating the groups (in each row) to be considered for the splits.
`parallel`	Boolean variable to determine whether the function is run in parallel. Default is FALSE.
`cores`	Number of cores used for the parallelization.
`verbose`	Boolean variable to determine if console output for cross-validation progress is printed (default is TRUE).
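
To give a sense of how quickly "all splits" grows when `nsample` is NULL, the following base R sketch counts the ways to divide p variables into G non-empty, unlabeled groups (the Stirling number of the second kind). It assumes that this is the quantity of interest when `use.all = TRUE`; the helper name `stirling2` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not from splitSelect): number of ways to split
# p variables into G non-empty, unlabeled groups, using the closed form
# S(p, G) = (1 / G!) * sum_{j=0}^{G} (-1)^j * choose(G, j) * (G - j)^p.
# Uses floating-point arithmetic, so it is only exact for moderate p.
stirling2 <- function(p, G) {
  j <- 0:G
  sum((-1)^j * choose(G, j) * (G - j)^p) / factorial(G)
}

stirling2(4, 2)    # 7 candidate splits for p = 4, G = 2
stirling2(10, 3)   # 9330 candidate splits for p = 10, G = 3
```

Under this assumption the count becomes large well before p is large, which is one reason to set `nsample` rather than enumerate every split.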

Value

An object of class splitSelect.

Author(s)

Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca

Examples

# Setting the parameters
p <- 4
n <- 30
n.test <- 5000
beta <- rep(5, p)
rho <- 0.1
r <- 0.9
SNR <- 3
# Function creating a correlation matrix with entries r^|i-j|
target_cor <- function(r, p){
  Gamma <- diag(p)
  for(i in 1:(p-1)){
    for(j in (i+1):p){
      Gamma[i,j] <- Gamma[j,i] <- r^(abs(i-j))
    }
  }
  return(Gamma)
}
# AR Correlation Structure
Sigma.r <- target_cor(r, p)
Sigma.rho <- target_cor(rho, p)
sigma.epsilon <- as.numeric(sqrt((t(beta) %*% Sigma.rho %*% beta)/SNR))
# Simulate some data
x.train <- mvnfast::rmvn(n, mu=rep(0,p), sigma=Sigma.r)
y.train <- 1 + x.train %*% beta + rnorm(n=n, mean=0, sd=sigma.epsilon)

# Generating the coefficients for a fixed partition of the variables

split.out <- splitSelect(x.train, y.train, G=2, use.all=TRUE,
                         fix.partition=list(matrix(c(2,2), 
                                             ncol=2, byrow=TRUE)), 
                         fix.split=NULL,
                         intercept=TRUE, group.model="glmnet", alphas=0)
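
As a rough illustration of what the fixed partition above implies, the base R sketch below lists the splits of p = 4 variables into two groups of two. It assumes that a `fix.partition` row such as c(2, 2) gives group sizes; the names `first.group` and `splits` are introduced here for illustration only. The result has one split per row and p columns of group labels, which is the shape the `fix.split` argument describes.

```r
# Assumption: the partition c(2, 2) means two groups of two variables each.
p <- 4
first.group <- combn(p, 2)   # all 2-variable subsets, one per column
# Keep subsets containing variable 1 so each unordered split appears once
first.group <- first.group[, first.group[1, ] == 1, drop = FALSE]
# Turn each subset into a vector of group labels (1 or 2) for the p variables
splits <- t(apply(first.group, 2, function(g) {
  labels <- rep(2L, p)
  labels[g] <- 1L
  labels
}))
print(splits)
# Rows: (1,1,2,2), (1,2,1,2), (1,2,2,1)
```

Each row assigns variables 1 to 4 to group 1 or 2, so the three rows correspond to the groupings {1,2}{3,4}, {1,3}{2,4} and {1,4}{2,3}.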

[Package splitSelect version 1.0.3 Index]
