cv_glmnet {goffda} | R Documentation
Fitting of regularized linear models
Description
Convenience function for fitting multivariate linear models with
multivariate response by relying on cv.glmnet from the glmnet package.
The function fits the multivariate linear model

\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},

where \mathbf{X} is a p-dimensional vector, \mathbf{Y} and \mathbf{E}
are two q-dimensional vectors, and \mathbf{B} is a p \times q matrix.

If \mathbf{X} and \mathbf{Y} are centered (i.e., have zero-mean
columns), the function estimates \mathbf{B} by solving, for the sample
(\mathbf{X}_1, \mathbf{Y}_1), \ldots, (\mathbf{X}_n, \mathbf{Y}_n), the
elastic-net optimization problem

\min_{\mathbf{B} \in R^{p \times q}} \frac{1}{2n} \sum_{i=1}^n
\|\mathbf{Y}_i - \mathbf{X}_i\mathbf{B}\|^2 +
\lambda\left[(1 - \alpha)\|\mathbf{B}\|_\mathrm{F}^2 / 2 +
\alpha \sum_{j=1}^p \|\mathbf{B}_j\|_2\right],
where \|\mathbf{B}\|_\mathrm{F} stands for the Frobenius norm of the
matrix \mathbf{B} and \|\mathbf{B}_j\|_2 for the Euclidean norm of the
j-th row of \mathbf{B}. The choice \alpha = 0 in the elastic-net
penalization corresponds to ridge regression, whereas \alpha = 1 yields
a lasso-type estimator. The unpenalized least-squares estimator is
obtained with \lambda = 0.
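A minimal sketch, not part of the original examples, checking that
lambda = 0 indeed recovers the analytical least-squares solution (the
simulated data and dimensions below are illustrative assumptions):

library(goffda)
set.seed(1)
n <- 200; p <- 3; q <- 2
# Centered data, so no intercept is needed
x <- scale(matrix(rnorm(n * p), nrow = n, ncol = p), scale = FALSE)
y <- scale(x %*% matrix(1, nrow = p, ncol = q) +
             matrix(rnorm(n * q), nrow = n, ncol = q), scale = FALSE)
# lambda = 0 switches off the penalty entirely
b_net <- cv_glmnet(x = x, y = y, lambda = 0, intercept = FALSE,
                   thresh = 1e-20)$beta_hat
b_ols <- solve(crossprod(x)) %*% t(x) %*% y
max(abs(b_net - b_ols))  # close to zero, up to the solver tolerance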
Usage
cv_glmnet(x, y, alpha = c("lasso", "ridge")[1], lambda = NULL,
intercept = TRUE, thresh = 1e-10, cv_1se = TRUE, cv_nlambda = 50,
cv_folds = NULL, cv_grouped = FALSE, cv_lambda = 10^seq(2, -3,
length.out = cv_nlambda), cv_second = TRUE, cv_tol_second = 0.025,
cv_log10_exp = c(-0.5, 3), cv_thresh = 1e-05, cv_parallel = FALSE,
cv_verbose = FALSE, ...)
Arguments
x: input matrix of size c(n, p).

y: response matrix of size c(n, q).

alpha: elastic net mixing argument in glmnet. Alternatively, a
character string: "lasso" (\alpha = 1, the default) or "ridge"
(\alpha = 0).

lambda: scalar giving the regularization parameter \lambda. If NULL
(the default), the optimal \lambda is searched by cross-validation.

intercept: flag passed to the intercept argument in glmnet to indicate
whether an intercept is fitted. Defaults to TRUE.

thresh: convergence threshold of the coordinate descent algorithm,
passed to the thresh argument in glmnet. Defaults to 1e-10.

cv_1se: shall the optimal lambda be the "lambda.1se" (the conservative
choice returned by cv.glmnet) or the "lambda.min" (the minimizer of
the cross-validation error)? Defaults to TRUE.

cv_nlambda: the length of the sequence of candidate \lambda values used
in the cross-validation search. Defaults to 50.

cv_folds: number of folds to perform cross-validation. If NULL (the
default), leave-one-out cross-validation (cv_folds = n) is performed.

cv_grouped: passed to the grouped argument in cv.glmnet. Defaults to
FALSE.

cv_lambda: passed to the lambda argument in cv.glmnet. Defaults to
10^seq(2, -3, length.out = cv_nlambda).

cv_second: flag to perform a second cross-validation search if the
optimal \lambda is found at the extremes of the first \lambda
sequence. Defaults to TRUE.

cv_tol_second: tolerance for performing a second search, if
cv_second = TRUE. Defaults to 0.025.

cv_log10_exp: expansion, in log10-scale, of the \lambda sequence for
the second cross-validation search. Defaults to c(-0.5, 3).

cv_thresh: convergence threshold used during cross-validation in
cv.glmnet. Defaults to 1e-05.

cv_parallel: passed to the parallel argument in cv.glmnet. Defaults to
FALSE.

cv_verbose: flag to display information about the cross-validation
search with plots and messages. More useful if cv_second = TRUE.
Defaults to FALSE.

...: further parameters to be passed to glmnet for the final fit.
Details
If \alpha = 1, then the lasso-type fit shrinks to zero, simultaneously,
all the elements of certain rows of \mathbf{B}, thus helping the
selection of the most influential variables, among the p in \mathbf{X},
for explaining/predicting \mathbf{Y}.
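As a hypothetical illustration (simulated data, not part of the
original examples), the rows of beta_hat that are exactly zero flag
the discarded variables:

library(goffda)
set.seed(1)
n <- 100; p <- 8; q <- 3
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
# Only the first two rows of beta are active
beta <- rbind(matrix(1, nrow = 2, ncol = q),
              matrix(0, nrow = p - 2, ncol = q))
y <- x %*% beta + matrix(rnorm(n * q), nrow = n, ncol = q)
fit <- cv_glmnet(x = x, y = y, alpha = "lasso")
which(rowSums(abs(fit$beta_hat)) > 0)  # indices of the selected variables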
The function first performs a cross-validation search for the optimal
\lambda if lambda = NULL (using cv_thresh to control the convergence
threshold). After the optimal penalization parameter is determined, a
second fit (now with convergence threshold thresh) using the default
\lambda sequence in glmnet is performed. The final estimate is obtained
via predict.glmnet from the optimal \lambda determined in the first
step.

Due to its cross-validatory nature, cv_glmnet can be computationally
demanding. Approaches for reducing the computation time include:
considering a smaller number of folds than n, such as cv_folds = 10
(but this will lead to random partitions of the data); loosening the
convergence tolerance of the coordinate descent algorithm by increasing
cv_thresh; reducing the number of candidate \lambda values with
cv_nlambda; setting cv_second = FALSE to avoid a second
cross-validation search; or considering cv_parallel = TRUE to use a
parallel backend (which must be registered beforehand; see examples).
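A sketch combining these speed-ups, reusing the x and y from the sketch
above (the argument values are illustrative assumptions, not
recommendations):

fit_fast <- cv_glmnet(x = x, y = y, alpha = "lasso",
                      cv_folds = 10,      # 10-fold CV instead of leave-one-out
                      cv_thresh = 1e-3,   # looser tolerance during the search
                      cv_nlambda = 20,    # shorter candidate lambda sequence
                      cv_second = FALSE)  # skip the second search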
By default, the \lambda sequence is used with standardized \mathbf{X}
and \mathbf{Y} (both divided by their columnwise standard deviations);
see glmnet and its standardize argument. Therefore, the optimal
selected \lambda value assumes standardization and must be used with
care if the variables are not standardized. For example, when using the
ridge analytical solution, a prior change of scale that depends on q
needs to be applied. See the examples for the details.
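For instance, for q > 1 (where \mathbf{Y} is not standardized), the
ridge estimator matching a \lambda selected on standardized data is, as
a sketch of the rescaling used in the q > 1 ridge example below,

\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X} +
n \lambda \, \mathrm{diag}(\hat{\sigma}_1^2, \ldots,
\hat{\sigma}_p^2))^{-1} \mathbf{X}'\mathbf{Y},

where \hat{\sigma}_j denotes the uncorrected (i.e., divided by n)
sample standard deviation of the j-th column of \mathbf{X}.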
Value
A list with the following entries:
beta_hat: the estimated \mathbf{B}, a matrix of size c(p, q).

lambda: the optimal \lambda obtained by the cross-validation search
(according to cv_1se), or the value supplied in lambda.

cv: if lambda = NULL, the result of the cross-validation search for the
optimal \lambda; otherwise, NULL.

fit: the glmnet fit object.
Author(s)
Eduardo García-Portugués. Initial contributions by Gonzalo Álvarez-Pérez.
References
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22. doi:10.18637/jss.v033.i01
Examples
## Quick example for multivariate linear model with multivariate response
# Simulate data
n <- 100
p <- 10; q <- 5
set.seed(123456)
x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p)
e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q)
beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q)
y <- x %*% beta + e
# Fit lasso (model with intercept, the default)
cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)$beta_hat
## Multivariate linear model with multivariate response
# Simulate data
n <- 100
p <- 10; q <- 5
set.seed(123456)
x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p)
e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q)
beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q)
y <- x %*% beta + e
# Fit ridge
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE,
cv_verbose = TRUE)
cv0$beta_hat
# Same fit for the chosen lambda
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE)$beta_hat
# Fit lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)
cv1$beta_hat
# Use cv_1se = FALSE
cv1_min <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE,
cv_1se = FALSE)
# Compare with the ridge analytical solution. Observe the factor in lambda,
# necessary since lambda is searched for on the standardized data. Note also
# that, unlike the case q = 1, no standardization with respect to
# y happens
sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
thresh = 1e-20, intercept = FALSE)$beta_hat
solve(crossprod(x) + diag(cv0$lambda * sd_x^2 * n)) %*% t(x) %*% y
# If x is standardized, the match between glmnet and usual ridge
# analytical expression does not require scaling of lambda
x_std <- scale(x, scale = sd_x, center = TRUE)
cv_glmnet(x = x_std, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y
## Simple linear model
# Simulate data
n <- 100
p <- 1; q <- 1
set.seed(123456)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
e <- matrix(rnorm(n * q), nrow = n, ncol = q)
beta <- 2
y <- x * beta + e
# Fit by ridge (model with intercept, the default)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE)
cv0$beta_hat
cv0$intercept
# Comparison with linear model with intercept
lm(y ~ 1 + x)$coefficients
# Fit by ridge (model without intercept)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE,
intercept = FALSE)
cv0$beta_hat
# Comparison with linear model without intercept
lm(y ~ 0 + x)$coefficients
# Same fit for the chosen lambda (and without intercept)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE)$beta_hat
# Same for lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso")
cv1$beta_hat
## Multivariate linear model (p = 10, q = 1)
# Simulate data
n <- 50
p <- 10; q <- 1
set.seed(123456)
x <- matrix(rnorm(n * p, mean = 1, sd = rep(1:p, each = n)),
nrow = n, ncol = p)
e <- matrix(rnorm(n * q), nrow = n, ncol = q)
beta <- ((1:p - 1) / p)^2
y <- x %*% beta + e
# Fit ridge (model without intercept)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE,
cv_verbose = TRUE)
cv0$beta_hat
# Same fit for the chosen lambda
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE)$beta_hat
# Compare with the ridge analytical solution. Observe the factor in lambda,
# necessary since lambda is searched for on the standardized data
sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n)
sd_y <- sd(y) * sqrt((n - 1) / n)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x) + diag(cv0$lambda * sd_x^2 / sd_y * n)) %*% t(x) %*% y
# If x and y are standardized, the match between glmnet and usual ridge
# analytical expression does not require scaling of lambda
x_std <- scale(x, scale = sd_x, center = TRUE)
y_std <- scale(y, scale = sd_y, center = TRUE)
cv_glmnet(x = x_std, y = y_std, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y_std
# Fit lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)
cv1$beta_hat
## Parallelization (not run; requires a registered parallel backend)
# doMC::registerDoMC(cores = 2)
# microbenchmark::microbenchmark(
#   cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = TRUE),
#   cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = FALSE),
#   times = 10)