Cross-validation in penalized generalized linear models {penalized} (R Documentation)
Cross-validated penalized regression
Description
Cross-validating generalized linear models with L1 (lasso or fused lasso) and/or L2 (ridge) penalties, using likelihood cross-validation.
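In its simplest form, for models with a per-subject likelihood such as the linear model, likelihood cross-validation scores a tuning-parameter value by refitting the model with each fold left out in turn and summing the log-likelihood of the left-out subjects under that fit. The base-R sketch below illustrates the idea for a ridge-penalized linear model with a Gaussian likelihood. It is not the package's implementation. The helpers ridge_fit() and cv_loglik(), the plug-in residual standard deviation and the simulated data are all assumptions made for the example, and for the Cox model the package's criterion may be defined differently.

```r
# Illustrative sketch only (not the penalized package's code): k-fold likelihood
# cross-validation for a ridge-penalized linear model, in base R.
# ridge_fit() and cv_loglik() are hypothetical helpers.
ridge_fit <- function(X, y, lambda2) {
  xm <- colMeans(X)
  ym <- mean(y)
  Xc <- sweep(X, 2, xm)  # centre covariates; the intercept stays unpenalized
  beta <- drop(solve(crossprod(Xc) + diag(lambda2, ncol(X)), crossprod(Xc, y - ym)))
  list(intercept = ym - sum(xm * beta), beta = beta)
}

cv_loglik <- function(X, y, lambda2, fold) {
  total <- 0
  for (k in unique(fold)) {
    test <- fold == k
    fit <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambda2)
    # Residual standard deviation estimated on the training part only
    mu_train <- drop(fit$intercept + X[!test, , drop = FALSE] %*% fit$beta)
    sigma <- sqrt(mean((y[!test] - mu_train)^2))
    # Log-likelihood of the left-out subjects under the training fit
    mu_test <- drop(fit$intercept + X[test, , drop = FALSE] %*% fit$beta)
    total <- total + sum(dnorm(y[test], mean = mu_test, sd = sigma, log = TRUE))
  }
  total
}

set.seed(1)
n <- 60
X <- matrix(rnorm(n * 3), ncol = 3)
y <- drop(X %*% c(1, -1, 0) + rnorm(n))
fold <- rep(1:5, length.out = n)
cv_loglik(X, y, lambda2 = 1, fold)    # higher (less negative) is better
cv_loglik(X, y, lambda2 = 1e6, fold)  # near-null model: lower cross-validated likelihood
```

Tuning then amounts to choosing the lambda2 with the highest cross-validated log-likelihood, evaluated on one fixed fold allocation.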
Usage
cvl(response, penalized, unpenalized, lambda1 = 0, lambda2 = 0, positive = FALSE,
    fusedl = FALSE, data, model = c("cox", "logistic", "linear", "poisson"),
    startbeta, startgamma, fold, epsilon = 1e-10, maxiter, standardize = FALSE,
    trace = TRUE, approximate = FALSE)
optL1(response, penalized, unpenalized, minlambda1, maxlambda1, base1, lambda2 = 0,
    fusedl = FALSE, positive = FALSE, data,
    model = c("cox", "logistic", "linear", "poisson"), startbeta, startgamma, fold,
    epsilon = 1e-10, maxiter = Inf, standardize = FALSE, tol = .Machine$double.eps^0.25,
    trace = TRUE)
optL2(response, penalized, unpenalized, lambda1 = 0, minlambda2, maxlambda2, base2,
    fusedl = FALSE, positive = FALSE, data,
    model = c("cox", "logistic", "linear", "poisson"), startbeta, startgamma,
    fold, epsilon = 1e-10, maxiter, standardize = FALSE, tol = .Machine$double.eps^0.25,
    trace = TRUE, approximate = FALSE)
profL1(response, penalized, unpenalized, minlambda1, maxlambda1, base1, lambda2 = 0,
    fusedl = FALSE, positive = FALSE, data,
    model = c("cox", "logistic", "linear", "poisson"), startbeta, startgamma, fold,
    epsilon = 1e-10, maxiter = Inf, standardize = FALSE, steps = 100, minsteps = steps/3,
    log = FALSE, save.predictions = FALSE, trace = TRUE, plot = FALSE)
profL2(response, penalized, unpenalized, lambda1 = 0, minlambda2, maxlambda2, base2,
    fusedl = FALSE, positive = FALSE, data,
    model = c("cox", "logistic", "linear", "poisson"), startbeta, startgamma, fold,
    epsilon = 1e-10, maxiter, standardize = FALSE, steps = 100, minsteps = steps/2,
    log = TRUE, save.predictions = FALSE, trace = TRUE, plot = FALSE, approximate = FALSE)
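The Usage lines show profL1 defaulting to log = FALSE and profL2 to log = TRUE. Assuming, as the argument name suggests, that log chooses between tuning-parameter values spaced evenly on a linear scale and on a logarithmic scale, the two kinds of grid can be sketched in base R as follows. lambda_grid() and its n_points argument are hypothetical and make no claim about how many values profL1 or profL2 actually evaluate for a given steps.

```r
# Hypothetical helper (not part of penalized): n_points candidate tuning-parameter
# values between minlambda and maxlambda, evenly spaced on a linear or a log scale.
lambda_grid <- function(minlambda, maxlambda, n_points, log_scale = FALSE) {
  stopifnot(n_points >= 2, minlambda < maxlambda)
  if (log_scale) {
    stopifnot(minlambda > 0)  # a log-spaced grid cannot start at zero
    exp(seq(log(minlambda), log(maxlambda), length.out = n_points))
  } else {
    seq(minlambda, maxlambda, length.out = n_points)
  }
}

lambda_grid(0, 10, 6)                         # values 0, 2, 4, 6, 8, 10
lambda_grid(0.01, 100, 5, log_scale = TRUE)   # values 0.01, 0.1, 1, 10, 100 (up to rounding)
```

A logarithmic grid needs a strictly positive lower end, since log(0) is -Inf; a linear grid may start at zero. A log scale spends more points on small values, which suits ridge penalties whose useful range spans several orders of magnitude.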
Arguments
response: The response variable (vector). This should be a numeric vector for linear regression, a …
penalized: The penalized covariates. These may be specified either as a matrix or as a (one-sided) formula object …
unpenalized: Additional unpenalized covariates. Specified as under penalized …
lambda1, lambda2: The fixed values of the tuning parameters for L1 and L2 penalization. Each must be either a single positive number or a vector with length equal to the number of covariates in penalized …
minlambda1, minlambda2, maxlambda1, maxlambda2: The values of the tuning parameters for L1 or L2 penalization between which the cross-validated likelihood is to be profiled or optimized. For fused lasso penalty …
base1, base2: An optional vector of length equal to the number of covariates in penalized. If supplied, profiling or optimization is performed between …
fusedl: If …
positive: If …
data: A …
model: The model to be used. If missing, the model will be guessed from the …
startbeta: Starting values for the regression coefficients of the penalized covariates. These starting values will be used only for the first values of …
startgamma: Starting values for the regression coefficients of the unpenalized covariates. These starting values will be used only for the first values of …
fold: The fold for cross-validation. May be supplied as a single number (between 2 and n) giving the number of folds, or, alternatively, as a length …
epsilon: The convergence criterion. As in …
maxiter: The maximum number of iterations allowed in each fitting of the model. Defaults to 25 when only an L2 penalty is present, and to infinity otherwise.
standardize: If …
steps: The maximum number of steps between …
minsteps: The minimum number of steps between …
log: If …
tol: The tolerance of the Brent algorithm used for minimization. See also optimize.
save.predictions: Controls whether or not to save cross-validated predictions for all values of lambda.
trace: If …
approximate: If …
plot: If …
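The lambda1 and lambda2 entries above allow a vector with one value per covariate in penalized, so that each covariate can be penalized by its own amount. For the L2 part, the effect of such a vector can be illustrated with a closed-form weighted ridge estimate, (X'X + diag(lambda2))^{-1} X'y. This is a hypothetical base-R sketch: ridge_weighted() is not a package function, it omits the intercept, and it need not match the package's scaling of the penalty.

```r
# Hypothetical sketch (not part of penalized): ridge estimate with one L2 penalty
# weight per covariate, beta = (X'X + diag(lambda2))^{-1} X'y, no intercept.
ridge_weighted <- function(X, y, lambda2) {
  p <- ncol(X)
  if (length(lambda2) == 1) lambda2 <- rep(lambda2, p)  # scalar: same weight for all
  stopifnot(length(lambda2) == p, all(lambda2 >= 0))
  drop(solve(crossprod(X) + diag(lambda2, nrow = p), crossprod(X, y)))
}

set.seed(7)
X <- matrix(rnorm(200), ncol = 2)
y <- drop(X %*% c(2, 2) + rnorm(100))
b_light <- ridge_weighted(X, y, c(0.01, 0.01))  # light penalty on both covariates
b_mixed <- ridge_weighted(X, y, c(0.01, 1e6))   # heavy penalty on the second only
```

With the mixed weights, the second coefficient is shrunk almost to zero while the first is left nearly unpenalized.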
Details
All five functions return a list with the following named elements:
lambda: For optL1 and optL2, lambda gives the optimal value of the tuning parameters found. For profL1 and profL2, lambda is the vector of values of the tuning parameter for which the cross-validated likelihood has been calculated. Absent in the output of cvl.
cvl: The value(s) of the cross-validated likelihood. For optL1 and optL2, this is the cross-validated likelihood at the optimal value of the tuning parameter.
fold: Returns the precise allocation of the subjects into the cross-validation folds. Note that the same allocation is used for all cross-validated likelihood calculations in each call to optL1, optL2, profL1 or profL2.
predictions: The cross-validated predictions for the left-out samples. The precise format of the cross-validated predictions depends on the type of generalized linear model (see breslow for survival models). The functions profL1 and profL2 return a list here (only if save.predictions = TRUE), whereas optL1 and optL2 return the predictions for the optimal value of the tuning parameter only.
fullfit: The fitted model on the full data. The functions profL1 and profL2 return a list of penfit objects here, whereas optL1 and optL2 return the full-data fit (a single penfit object) for the optimal value of the tuning parameter only.
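The fold element presumably records one fold label per subject, and the Examples below pass opt$fold back to profL1 so that both calls use the same allocation. A balanced random allocation of that kind can be sketched in base R. make_folds() is a hypothetical helper, and whether the package allocates folds in exactly this way is an assumption.

```r
# Hypothetical helper (not part of penalized): a balanced random allocation of
# n subjects to k folds, returned as one integer fold label per subject.
make_folds <- function(n, k) {
  stopifnot(k >= 2, k <= n)
  sample(rep(seq_len(k), length.out = n))
}

set.seed(42)
fold <- make_folds(23, 5)
table(fold)  # fold sizes 5, 5, 5, 4, 4: labels are shuffled, sizes differ by at most one
```

Fixing the allocation once and reusing it means that differences in cross-validated likelihood between tuning-parameter values are not blurred by differences in how the subjects were split.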
Value
A named list. See details.
Note
The optL1 and optL2 functions use Brent's algorithm for minimization without derivatives (see also optimize). There is a risk that these functions converge to a local rather than a global optimum. This is especially the case for optL1, as the cross-validated likelihood as a function of lambda1 quite often has local optima. It is recommended to use optL1 in combination with profL1 to check whether optL1 has converged to the right optimum.
See also the notes under penalized.
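The risk described above can be reproduced with stats::optimize on a toy function. The function f below is an arbitrary stand-in for a negative cross-validated likelihood (it is not produced by cvl). Brent's method run on the full interval settles in a shallow local minimum, while a coarse grid, playing the role of profL1, exposes the deeper minimum, which a second optimize call on a narrow interval then refines.

```r
# Toy stand-in for a negative cross-validated likelihood (NOT a real cvl curve):
# a deep, narrow well at x = 1 and a shallow, wide well at x = 7.
f <- function(x) -2 * dnorm(x, mean = 1, sd = 0.3) - dnorm(x, mean = 7, sd = 2)

# Brent's method on the full interval: its first steps discard the region
# containing x = 1, so it converges to the local minimum near 7.
brent <- optimize(f, interval = c(0, 10))
brent$minimum

# A coarse profile over a grid reveals the deeper minimum near x = 1 ...
grid <- seq(0, 10, by = 0.1)
grid_best <- grid[which.min(f(grid))]

# ... which a local Brent search around the grid optimum then refines.
refined <- optimize(f, interval = c(grid_best - 0.2, grid_best + 0.2))
refined$minimum
```

The grid costs more function evaluations, but each evaluation is independent of the others, so it cannot be trapped in the way a bracketing search can.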
Author(s)
Jelle Goeman: j.j.goeman@lumc.nl
References
Goeman J.J. (2010). L-1 Penalized Estimation in the Cox Proportional Hazards Model. Biometrical Journal 52 (1) 70-84.
See Also
Examples
# More examples in the package vignette:
# type vignette("penalized")
library(penalized)
library(survival)  # provides Surv()
data(nki70)
attach(nki70)
# Finding an optimal cross-validated likelihood
set.seed(1)  # for a reproducible fold allocation
opt <- optL1(Surv(time, event), penalized = nki70[, 8:77], fold = 5)
coefficients(opt$fullfit)
plot(opt$predictions)
# Plotting the profile of the cross-validated likelihood,
# reusing the fold allocation from optL1
prof <- profL1(Surv(time, event), penalized = nki70[, 8:77],
               fold = opt$fold, steps = 10)
plot(prof$lambda, prof$cvl, type = "l")
plotpath(prof$fullfit)
detach(nki70)