cv.glmnetr {glmnetr}    R Documentation

Get a cross validation informed relaxed lasso model fit.

Description

Derives a relaxed lasso model and identifies the hyperparameters, i.e. lambda and gamma, which give the best fit using cross validation. It is analogous to the cv.glmnet() function of the 'glmnet' package, but handles cases where glmnet() may run slowly when using the relaxed=TRUE option.
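
As a rough illustration of what "best fit using cross validation" means for two hyperparameters, the sketch below picks the (lambda, gamma) pair with the smallest cross validated error from a grid. It is not the package's internal code: the error surface is a made-up formula standing in for the averaged out-of-fold errors, and the names cv_err, best_lambda and best_gamma are hypothetical.

```r
# Candidate hyperparameter values (gamma grid matches the function default)
lambda <- exp(seq(0, -4, by = -1))
gamma  <- c(0, 0.25, 0.5, 0.75, 1)

# ASSUMPTION: an artificial error surface, used only so the example runs
# without fitting any models; rows index lambda, columns index gamma
cv_err <- outer(lambda, gamma, function(l, g) (log(l) + 2)^2 + (g - 0.25)^2)

# Locate the grid cell with the smallest cross validated error
idx <- arrayInd(which.min(cv_err), dim(cv_err))
best_lambda <- lambda[idx[1]]
best_gamma  <- gamma[idx[2]]
```

In practice each cell of such a grid would hold an error averaged over the held-out folds rather than a value from a formula.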

Usage

cv.glmnetr(
  xs,
  start = NULL,
  y_,
  event = NULL,
  family = "gaussian",
  lambda = NULL,
  gamma = c(0, 0.25, 0.5, 0.75, 1),
  folds_n = 10,
  limit = 2,
  fine = 0,
  track = 0,
  seed = NULL,
  foldid = NULL,
  ties = "efron",
  stratified = 1,
  time = NULL,
  ...
)

Arguments

xs

predictor matrix

start

vector of start times for the Cox model. Should be NULL for other models.

y_

outcome vector

event

event vector in case of the Cox model. May be NULL for other models.

family

model family, "cox", "binomial" or "gaussian" (default)

lambda

the lambda vector. May be NULL.

gamma

the gamma vector. Default is c(0,0.25,0.50,0.75,1).

folds_n

number of folds for cross validation. Default and generally recommended is 10.

limit

limit the small values for lambda after the initial fit. This will eliminate calculations that have small or minimal impact on the cross validation. Default is 2 for moderate limitation, 1 for less limitation, 0 for none.

fine

use a finer step in determining lambda. Of little value unless one repeats the cross validation many times to more finely tune the hyperparameters. See the 'glmnet' package documentation.

track

indicate whether or not to update progress in the console. The default of 0 suppresses these updates; a value of 1 provides them. In fitting clinical data with a design matrix that is not of full rank, we have found some R packages to take a very long time or seemingly be caught in infinite loops. We therefore allow the user to track the program's progress and judge whether things are moving forward or the process should be stopped.

seed

a seed for set.seed() so one can reproduce the model fit. If NULL the program will generate a random seed. Whether specified or NULL, the seed is stored in the output object for future reference. Note that with the default (NULL) the randomly generated seed depends on the seed in memory at that time, and so on any calls to set.seed() made before this function is called.

foldid

a vector of integers to associate each record to a fold. The integers should be between 1 and folds_n.

ties

method for handling ties in the Cox model for the relaxed model component. Default is "efron", optionally "breslow". For penalized fits "breslow" is always used, as in the 'glmnet' package.

stratified

whether folds are to be constructed stratified on an indicator outcome: 1 (default) for yes, 0 for no. Pertains to the event variable for the "cox" family and to y_ for the "binomial" family.

time

track progress by printing elapsed and split times to the console. Use the track option instead, as the time option will be eliminated.

...

Additional arguments that can be passed to glmnet()
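
To make the foldid, folds_n, stratified and seed arguments concrete, here is a minimal sketch of how a fold assignment vector could be built by hand in base R. The helper make_foldid() is hypothetical and not part of 'glmnetr'; how the package constructs folds internally is not described here, so treat this only as one plausible way to obtain a vector that meets the foldid requirements (integers between 1 and folds_n).

```r
# Hypothetical helper (NOT part of glmnetr): assign each record to one of
# folds_n folds, optionally balancing a 0/1 indicator across the folds.
make_foldid <- function(indicator, folds_n = 10, stratified = TRUE) {
  n <- length(indicator)
  foldid <- integer(n)
  if (stratified) {
    # deal the fold labels out separately within each level of the indicator
    for (level in unique(indicator)) {
      idx <- which(indicator == level)
      foldid[idx] <- sample(rep_len(seq_len(folds_n), length(idx)))
    }
  } else {
    foldid <- sample(rep_len(seq_len(folds_n), n))
  }
  foldid
}

set.seed(82545037)                          # fixed seed, so the folds are reproducible
y_bin  <- rbinom(100, size = 1, prob = 0.3) # toy binary outcome (simulated)
foldid <- make_foldid(y_bin, folds_n = 5)
table(fold = foldid, outcome = y_bin)       # outcome counts per fold
```

Going by the usage above, such a vector would be supplied through the foldid argument together with a matching folds_n, for example with family="binomial" and folds_n=5.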

Details

This is the main program for model derivation. As currently implemented the package requires the data to be input as vectors and matrices with no missing values (NA). All data vectors and matrices must be numerical. For factors (categorical variables) one should first construct corresponding numerical variables to represent the factor levels. To take advantage of the lasso model, one can use one-hot coding, assigning an indicator for each level of each categorical variable, and can also create other contrast variables suggested by the subject matter.
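
The one-hot coding described above can be sketched in base R with model.matrix(). The data frame dat and its columns are made up for illustration; only the resulting numeric matrix, here called xs to match the argument name, is what the function expects.

```r
# Toy data frame with one numeric and one categorical predictor (made up)
dat <- data.frame(
  age   = c(34, 51, 47, 29),
  stage = factor(c("I", "II", "III", "II"))
)

# One indicator column per factor level: dropping the intercept with "- 1"
# makes model.matrix() expand the factor into a full set of indicators
# instead of omitting a reference level.
stage_ind <- model.matrix(~ stage - 1, data = dat)

# Combine with the numeric predictor into a single numeric matrix
xs <- cbind(age = dat$age, stage_ind)

# The package requires numeric input with no missing values
stopifnot(is.numeric(xs), !anyNA(xs))
```

Keeping an indicator for every level, rather than dropping a reference level as in an unpenalized regression, lets the lasso select or drop each level on its own.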

Value

A cross validation informed relaxed lasso model fit.

Author(s)

Walter Kremers (kremers.walter@mayo.edu)

See Also

summary.cv.glmnetr, predict.cv.glmnetr, glmnetr, nested.glmnetr

Examples

# set seed for random numbers, optionally, to get reproducible results
set.seed(82545037)
sim.data=glmnetr.simdata(nrows=100, ncols=100, beta=NULL)
xs=sim.data$xs 
y_=sim.data$y_ 
event=sim.data$event
# for this example we use a small number for folds_n to shorten run time 
cv.glmnetr.fit = cv.glmnetr(xs, NULL, y_, NULL, family="gaussian", folds_n=3, limit=2) 
plot(cv.glmnetr.fit)
plot(cv.glmnetr.fit, coefs=1)
summary(cv.glmnetr.fit)


[Package glmnetr version 0.5-1 Index]