| xval.oem {oem} | R Documentation | 
Fast cross validation for Orthogonalizing EM
Description
Fast cross validation for Orthogonalizing EM
Usage
xval.oem(
  x,
  y,
  nfolds = 10L,
  foldid = NULL,
  type.measure = c("mse", "deviance", "class", "auc", "mae"),
  ncores = -1,
  family = c("gaussian", "binomial"),
  penalty = c("elastic.net", "lasso", "ols", "mcp", "scad", "mcp.net", "scad.net",
    "grp.lasso", "grp.lasso.net", "grp.mcp", "grp.scad", "grp.mcp.net", "grp.scad.net",
    "sparse.grp.lasso"),
  weights = numeric(0),
  lambda = numeric(0),
  nlambda = 100L,
  lambda.min.ratio = NULL,
  alpha = 1,
  gamma = 3,
  tau = 0.5,
  groups = numeric(0),
  penalty.factor = NULL,
  group.weights = NULL,
  standardize = TRUE,
  intercept = TRUE,
  maxit = 500L,
  tol = 1e-07,
  irls.maxit = 100L,
  irls.tol = 0.001,
  compute.loss = FALSE
)
Arguments
| x | input matrix of dimension n x p (sparse matrices not yet implemented). 
Each row is an observation, each column corresponds to a covariate. The xval.oem() function
is optimized for n >> p settings and may be very slow when p > n, so please use other packages
such as  | 
| y | numeric response vector of length  | 
| nfolds | Integer number of cross-validation folds. The minimum allowed is 3; the default is 10. | 
| foldid | an optional vector of values between 1 and  | 
| type.measure | measure to evaluate for cross-validation. The default is  | 
| ncores | Integer scalar that specifies the number of threads to be used | 
| family | 
 | 
| penalty | Specification of penalty type. Choices include: 
 Careful consideration is required for the group lasso, group MCP, and group SCAD penalties. Groups as specified by the  | 
| weights | Observation weights. Defaults to 1 for each observation (a weight vector of length 0 sets all weights to 1). | 
| lambda | A user supplied lambda sequence. By default, the program computes
its own lambda sequence based on  | 
| nlambda | The number of lambda values - default is 100. | 
| lambda.min.ratio | Smallest value for lambda, as a fraction of  | 
| alpha | mixing value for  | 
| gamma | Tuning parameter for the SCAD and MCP penalties. Must be >= 1. | 
| tau | mixing value for  | 
| groups | A vector describing the grouping of the coefficients. See the example below. All unpenalized variables should be put in group 0. | 
| penalty.factor | Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables. | 
| group.weights | penalty factors applied to each group for the group lasso. Similar to  | 
| standardize | Logical flag for  | 
| intercept | Should intercept(s) be fitted ( | 
| maxit | Integer. Maximum number of OEM iterations. | 
| tol | Convergence tolerance for OEM iterations. | 
| irls.maxit | Integer. Maximum number of IRLS iterations. | 
| irls.tol | convergence tolerance for IRLS iterations. Only used if  | 
| compute.loss | should the loss be computed for each estimated tuning parameter? Defaults to  | 
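The foldid argument can be built by hand when the same fold assignment needs to be reused across runs. The following is a minimal sketch using base R only. The names n, nfolds, and foldid are illustrative, and it assumes that foldid assigns each observation an integer fold label between 1 and the number of folds; the commented call at the end is a hypothetical usage, not a tested one.

```r
set.seed(42)
n <- 1000L       # number of observations (illustrative)
nfolds <- 5L     # number of folds (illustrative)

# Repeat the labels 1..nfolds out to length n, then shuffle them so that
# fold membership is random but fold sizes stay balanced
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Each fold receives n / nfolds observations (up to rounding)
table(foldid)

# Hypothetical usage, assuming x has n rows and y has length n:
# xfit <- xval.oem(x = x, y = y, foldid = foldid, penalty = "lasso")
```

Fixing the seed before calling sample() makes the fold assignment reproducible, which helps when comparing penalties on identical folds.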
Value
An object with S3 class "xval.oem"
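Since the components of the returned object are not listed here, one way to explore them is to inspect a small fit directly. The sketch below assumes the oem package is installed and uses small simulated data; the data dimensions and coefficient values are illustrative. It checks only the class stated above and makes no assumption about which components are present.

```r
if (requireNamespace("oem", quietly = TRUE)) {
  library(oem)
  set.seed(1)

  # Small simulated problem with n >> p (illustrative sizes)
  x <- matrix(rnorm(2000 * 10), 2000, 10)
  y <- drop(x %*% c(1, -1, rep(0, 8))) + rnorm(2000)

  xfit <- xval.oem(x = x, y = y, penalty = "lasso", nfolds = 5L)

  print(inherits(xfit, "xval.oem"))  # checks the S3 class stated above
  print(names(xfit))                 # lists the components actually returned
  str(xfit, max.level = 1)           # one-level overview of each component
}
```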
References
Huling, J.D. and Chien, P. (2022), Fast Penalized Regression and Cross Validation for Tall Data with the oem Package. Journal of Statistical Software 104(6), 1-24. doi:10.18637/jss.v104.i06
Examples
set.seed(123)
n.obs <- 1e4
n.vars <- 100
true.beta <- c(runif(15, -0.25, 0.25), rep(0, n.vars - 15))
x <- matrix(rnorm(n.obs * n.vars), n.obs, n.vars)
y <- rnorm(n.obs, sd = 3) + x %*% true.beta
system.time(fit <- oem(x = x, y = y, 
                       penalty = c("lasso", "grp.lasso"), 
                       groups = rep(1:20, each = 5)))
                       
system.time(xfit <- xval.oem(x = x, y = y, 
                             penalty = c("lasso", "grp.lasso"), 
                             groups = rep(1:20, each = 5)))
                             
system.time(xfit2 <- xval.oem(x = x, y = y, 
                              penalty = c("lasso", "grp.lasso",
                                          "mcp",       "scad", 
                                          "mcp.net",   "scad.net",
                                          "grp.lasso", "grp.lasso.net",
                                          "grp.mcp",   "grp.scad",
                                          "sparse.grp.lasso"), 
                              groups = rep(1:20, each = 5)))
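
The descriptions of groups and penalty.factor above can be made concrete with a short base R sketch. It builds a grouping vector in which the first five variables are left unpenalized by placing them in group 0, and a separate penalty.factor vector that exempts a single variable from shrinkage. The object names and the choice of which variables to exempt are illustrative assumptions; the commented calls show hypothetical usage with the x and y simulated above.

```r
n.vars <- 100

# First 5 variables unpenalized (group 0); the remaining 95 variables
# form 19 groups of 5 consecutive variables each
groups <- c(rep(0, 5), rep(1:19, each = 5))
stopifnot(length(groups) == n.vars)
table(groups)

# Per-coefficient penalty factors: 0 means no shrinkage for variable 1,
# 1 leaves the penalty on the other variables unchanged
penalty.factor <- c(0, rep(1, n.vars - 1))

# Hypothetical usage:
# xfit3 <- xval.oem(x = x, y = y, penalty = "grp.lasso", groups = groups)
# xfit4 <- xval.oem(x = x, y = y, penalty = "lasso",
#                   penalty.factor = penalty.factor)
```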