glmnet {glmnet}  R Documentation 
fit a GLM with lasso or elasticnet regularization
Description
Fit a generalized linear model via penalized maximum likelihood. The regularization path is computed for the lasso or elasticnet penalty at a grid of values for the regularization parameter lambda. Can deal with all shapes of data, including very large sparse data matrices. Fits linear, logistic and multinomial, poisson, and Cox regression models.
Usage
glmnet(
x,
y,
family = c("gaussian", "binomial", "poisson", "multinomial", "cox", "mgaussian"),
weights = NULL,
offset = NULL,
alpha = 1,
nlambda = 100,
lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e04),
lambda = NULL,
standardize = TRUE,
intercept = TRUE,
thresh = 1e07,
dfmax = nvars + 1,
pmax = min(dfmax * 2 + 20, nvars),
exclude = NULL,
penalty.factor = rep(1, nvars),
lower.limits = Inf,
upper.limits = Inf,
maxit = 1e+05,
type.gaussian = ifelse(nvars < 500, "covariance", "naive"),
type.logistic = c("Newton", "modified.Newton"),
standardize.response = FALSE,
type.multinomial = c("ungrouped", "grouped"),
relax = FALSE,
trace.it = 0,
...
)
relax.glmnet(fit, x, ..., maxp = n  3, path = FALSE, check.args = TRUE)
Arguments
x 
input matrix, of dimension nobs x nvars; each row is an observation
vector. Can be in sparse matrix format (inherit from class

y 
response variable. Quantitative for 
family 
Either a character string representing
one of the builtin families, or else a 
weights 
observation weights. Can be total counts if responses are proportion matrices. Default is 1 for each observation 
offset 
A vector of length 
alpha 
The elasticnet mixing parameter, with

nlambda 
The number of 
lambda.min.ratio 
Smallest value for 
lambda 
A user supplied 
standardize 
Logical flag for x variable standardization, prior to
fitting the model sequence. The coefficients are always returned on the
original scale. Default is 
intercept 
Should intercept(s) be fitted (default=TRUE) or set to zero (FALSE) 
thresh 
Convergence threshold for coordinate descent. Each inner
coordinatedescent loop continues until the maximum change in the objective
after any coefficient update is less than 
dfmax 
Limit the maximum number of variables in the model. Useful for
very large 
pmax 
Limit the maximum number of variables ever to be nonzero 
exclude 
Indices of variables to be excluded from the model. Default
is none. Equivalent to an infinite penalty factor for the variables excluded (next item).
Users can supply instead an 
penalty.factor 
Separate penalty factors can be applied to each
coefficient. This is a number that multiplies 
lower.limits 
Vector of lower limits for each coefficient; default

upper.limits 
Vector of upper limits for each coefficient; default

maxit 
Maximum number of passes over the data for all lambda values; default is 10^5. 
type.gaussian 
Two algorithm types are supported for (only)

type.logistic 
If 
standardize.response 
This is for the 
type.multinomial 
If 
relax 
If 
trace.it 
If 
... 
Additional argument used in 
fit 
For 
maxp 
a limit on how many relaxed coefficients are allowed. Default is 'n3', where 'n' is the sample size. This may not be sufficient for nongaussian familes, in which case users should supply a smaller value. This argument can be supplied directly to 'glmnet'. 
path 
Since 
check.args 
Should 
Details
The sequence of models implied by lambda
is fit by coordinate
descent. For family="gaussian"
this is the lasso sequence if
alpha=1
, else it is the elasticnet sequence.
The objective function for "gaussian"
is
1/2 RSS/nobs +
\lambda*penalty,
and for the other models it is
loglik/nobs +
\lambda*penalty.
Note also that for "gaussian"
, glmnet
standardizes y to have unit variance (using 1/n rather than 1/(n1) formula)
before computing its lambda sequence (and then unstandardizes the resulting
coefficients); if you wish to reproduce/compare results with other software,
best to supply a standardized y. The coefficients for any predictor
variables with zero variance are set to zero for all values of lambda.
Details on family
option
From version 4.0 onwards, glmnet supports both the original builtin families,
as well as any family object as used by stats:glm()
.
This opens the door to a wide variety of additional models. For example
family=binomial(link=cloglog)
or family=negative.binomial(theta=1.5)
(from the MASS library).
Note that the code runs faster for the builtin families.
The built in families are specifed via a character string. For all families,
the object produced is a lasso or elasticnet regularization path for fitting the
generalized linear regression paths, by maximizing the appropriate penalized
loglikelihood (partial likelihood for the "cox" model). Sometimes the
sequence is truncated before nlambda
values of lambda
have
been used, because of instabilities in the inverse link functions near a
saturated fit. glmnet(...,family="binomial")
fits a traditional
logistic regression model for the logodds.
glmnet(...,family="multinomial")
fits a symmetric multinomial model,
where each class is represented by a linear model (on the logscale). The
penalties take care of redundancies. A twoclass "multinomial"
model
will produce the same fit as the corresponding "binomial"
model,
except the pair of coefficient matrices will be equal in magnitude and
opposite in sign, and half the "binomial"
values.
Two useful additional families are the family="mgaussian"
family and
the type.multinomial="grouped"
option for multinomial fitting. The
former allows a multiresponse gaussian model to be fit, using a "group
lasso" penalty on the coefficients for each variable. Tying the responses
together like this is called "multitask" learning in some domains. The
grouped multinomial allows the same penalty for the
family="multinomial"
model, which is also multiresponsed. For both
of these the penalty on the coefficient vector for variable j is
(1\alpha)/2\beta_j_2^2+\alpha\beta_j_2.
When alpha=1
this is a grouplasso penalty, and otherwise it mixes with quadratic just
like elasticnet. A small detail in the Cox model: if death times are tied
with censored times, we assume the censored times occurred just
before the death times in computing the Breslow approximation; if
users prefer the usual convention of after, they can add a small
number to all censoring times to achieve this effect.
Details on response for family="cox"
For Cox models, the response should preferably be a Surv
object,
created by the Surv()
function in survival package. For
rightcensored data, this object should have type "right", and for
(start, stop] data, it should have type "counting". To fit stratified Cox
models, strata should be added to the response via the stratifySurv()
function before passing the response to glmnet()
. (For backward
compatibility, rightcensored data can also be passed as a
twocolumn matrix with columns named 'time' and 'status'. The
latter is a binary variable, with '1' indicating death, and '0' indicating
right censored.)
Details on relax
option
If relax=TRUE
a duplicate sequence of models is produced, where each active set in the
elasticnet path is refit without regularization. The result of this is a
matching "glmnet"
object which is stored on the original object in a
component named "relaxed"
, and is part of the glmnet output.
Generally users will not call relax.glmnet
directly, unless the
original 'glmnet' object took a long time to fit. But if they do, they must
supply the fit, and all the original arguments used to create that fit. They
can limit the length of the relaxed path via 'maxp'.
Value
An object with S3 class "glmnet","*"
, where "*"
is
"elnet"
, "lognet"
, "multnet"
, "fishnet"
(poisson), "coxnet"
or "mrelnet"
for the various types of
models. If the model was created with relax=TRUE
then this class has
a prefix class of "relaxed"
.
call 
the call that produced this object 
a0 
Intercept sequence of length 
beta 
For 
lambda 
The actual sequence of 
dev.ratio 
The
fraction of (null) deviance explained (for 
nulldev 
Null deviance (per observation). This is defined to be 2*(loglike_sat loglike(Null)); The NULL model refers to the intercept model, except for the Cox, where it is the 0 model. 
df 
The number of
nonzero coefficients for each value of 
dfmat 
For 
dim 
dimension of coefficient matrix (ices) 
nobs 
number of observations 
npasses 
total passes over the data summed over all lambda values 
offset 
a logical variable indicating whether an offset was included in the model 
jerr 
error flag, for warnings and errors (largely for internal debugging). 
relaxed 
If 
Author(s)
Jerome Friedman, Trevor Hastie, Balasubramanian Narasimhan, Noah
Simon, Kenneth Tay and Rob Tibshirani
Maintainer: Trevor Hastie
hastie@stanford.edu
References
Friedman, J., Hastie, T. and Tibshirani, R. (2008)
Regularization Paths for Generalized Linear Models via Coordinate
Descent (2010), Journal of Statistical Software, Vol. 33(1), 122,
doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011)
Regularization Paths for Cox's Proportional
Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol.
39(5), 113,
doi:10.18637/jss.v039.i05.
Tibshirani,Robert, Bien, J., Friedman, J., Hastie, T.,Simon, N.,Taylor, J. and
Tibshirani, Ryan. (2012) Strong Rules for Discarding Predictors in
Lassotype Problems, JRSSB, Vol. 74(2), 245266,
https://arxiv.org/abs/1011.2234.
Hastie, T., Tibshirani, Robert and Tibshirani, Ryan (2020) Best Subset,
Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons,
Statist. Sc. Vol. 35(4), 579592,
https://arxiv.org/abs/1707.08692.
Glmnet webpage with four vignettes: https://glmnet.stanford.edu.
See Also
print
, predict
, coef
and plot
methods,
and the cv.glmnet
function.
Examples
# Gaussian
x = matrix(rnorm(100 * 20), 100, 20)
y = rnorm(100)
fit1 = glmnet(x, y)
print(fit1)
coef(fit1, s = 0.01) # extract coefficients at a single value of lambda
predict(fit1, newx = x[1:10, ], s = c(0.01, 0.005)) # make predictions
# Relaxed
fit1r = glmnet(x, y, relax = TRUE) # can be used with any model
# multivariate gaussian
y = matrix(rnorm(100 * 3), 100, 3)
fit1m = glmnet(x, y, family = "mgaussian")
plot(fit1m, type.coef = "2norm")
# binomial
g2 = sample(c(0,1), 100, replace = TRUE)
fit2 = glmnet(x, g2, family = "binomial")
fit2n = glmnet(x, g2, family = binomial(link=cloglog))
fit2r = glmnet(x,g2, family = "binomial", relax=TRUE)
fit2rp = glmnet(x,g2, family = "binomial", relax=TRUE, path=TRUE)
# multinomial
g4 = sample(1:4, 100, replace = TRUE)
fit3 = glmnet(x, g4, family = "multinomial")
fit3a = glmnet(x, g4, family = "multinomial", type.multinomial = "grouped")
# poisson
N = 500
p = 20
nzc = 5
x = matrix(rnorm(N * p), N, p)
beta = rnorm(nzc)
f = x[, seq(nzc)] %*% beta
mu = exp(f)
y = rpois(N, mu)
fit = glmnet(x, y, family = "poisson")
plot(fit)
pfit = predict(fit, x, s = 0.001, type = "response")
plot(pfit, y)
# Cox
set.seed(10101)
N = 1000
p = 30
nzc = p/3
x = matrix(rnorm(N * p), N, p)
beta = rnorm(nzc)
fx = x[, seq(nzc)] %*% beta/3
hx = exp(fx)
ty = rexp(N, hx)
tcens = rbinom(n = N, prob = 0.3, size = 1) # censoring indicator
y = cbind(time = ty, status = 1  tcens) # y=Surv(ty,1tcens) with library(survival)
fit = glmnet(x, y, family = "cox")
plot(fit)
# Cox example with (start, stop] data
set.seed(2)
nobs < 100; nvars < 15
xvec < rnorm(nobs * nvars)
xvec[sample.int(nobs * nvars, size = 0.4 * nobs * nvars)] < 0
x < matrix(xvec, nrow = nobs)
start_time < runif(100, min = 0, max = 5)
stop_time < start_time + runif(100, min = 0.1, max = 3)
status < rbinom(n = nobs, prob = 0.3, size = 1)
jsurv_ss < survival::Surv(start_time, stop_time, status)
fit < glmnet(x, jsurv_ss, family = "cox")
# Cox example with strata
jsurv_ss2 < stratifySurv(jsurv_ss, rep(1:2, each = 50))
fit < glmnet(x, jsurv_ss2, family = "cox")
# Sparse
n = 10000
p = 200
nzc = trunc(p/10)
x = matrix(rnorm(n * p), n, p)
iz = sample(1:(n * p), size = n * p * 0.85, replace = FALSE)
x[iz] = 0
sx = Matrix(x, sparse = TRUE)
inherits(sx, "sparseMatrix") #confirm that it is sparse
beta = rnorm(nzc)
fx = x[, seq(nzc)] %*% beta
eps = rnorm(n)
y = fx + eps
px = exp(fx)
px = px/(1 + px)
ly = rbinom(n = length(px), prob = px, size = 1)
system.time(fit1 < glmnet(sx, y))
system.time(fit2n < glmnet(x, y))