cv_glmnet {goffda} | R Documentation
Fitting of regularized linear models
Description
Convenience function for fitting multivariate linear models with
multivariate response by relying on cv.glmnet from the glmnet package.
The function fits the multivariate linear model

\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},

where \mathbf{X} is a p-dimensional vector, \mathbf{Y} and \mathbf{E}
are two q-dimensional vectors, and \mathbf{B} is a p \times q matrix.

If \mathbf{X} and \mathbf{Y} are centered (i.e., have zero-mean
columns), the function estimates \mathbf{B} by solving, for the sample
(\mathbf{X}_1, \mathbf{Y}_1), \ldots, (\mathbf{X}_n, \mathbf{Y}_n), the
elastic-net optimization problem

\min_{\mathbf{B} \in R^{p \times q}} \frac{1}{2n} \sum_{i=1}^n
\|\mathbf{Y}_i - \mathbf{X}_i\mathbf{B}\|^2 +
\lambda\left[(1 - \alpha)\|\mathbf{B}\|_\mathrm{F}^2 / 2 +
\alpha \sum_{j=1}^p \|\mathbf{B}_j\|_2\right],
where \|\mathbf{B}\|_\mathrm{F} stands for the Frobenius norm of the
matrix \mathbf{B} and \|\mathbf{B}_j\|_2 for the Euclidean norm of the
j-th row of \mathbf{B}. The choice \alpha = 0 in the elastic-net
penalization corresponds to ridge regression, whereas \alpha = 1 yields
a lasso-type estimator. The unpenalized least-squares estimator is
obtained with \lambda = 0.
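A minimal sketch, not part of the original examples, checking that
lambda = 0 indeed recovers the analytical least-squares solution (the
simulated data and dimensions below are illustrative assumptions):

library(goffda)
set.seed(1)
n <- 200; p <- 3; q <- 2
# Centered data, so no intercept is needed
x <- scale(matrix(rnorm(n * p), nrow = n, ncol = p), scale = FALSE)
y <- scale(x %*% matrix(1, nrow = p, ncol = q) +
             matrix(rnorm(n * q), nrow = n, ncol = q), scale = FALSE)
# lambda = 0 switches off the penalty entirely
b_net <- cv_glmnet(x = x, y = y, lambda = 0, intercept = FALSE,
                   thresh = 1e-20)$beta_hat
b_ols <- solve(crossprod(x)) %*% t(x) %*% y
max(abs(b_net - b_ols))  # close to zero, up to the solver tolerance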
Usage
cv_glmnet(x, y, alpha = c("lasso", "ridge")[1], lambda = NULL,
intercept = TRUE, thresh = 1e-10, cv_1se = TRUE, cv_nlambda = 50,
cv_folds = NULL, cv_grouped = FALSE, cv_lambda = 10^seq(2, -3,
length.out = cv_nlambda), cv_second = TRUE, cv_tol_second = 0.025,
cv_log10_exp = c(-0.5, 3), cv_thresh = 1e-05, cv_parallel = FALSE,
cv_verbose = FALSE, ...)
Arguments
x: input matrix of size c(n, p).

y: response matrix of size c(n, q).

alpha: elastic net mixing argument in glmnet. Alternatively, a
character string: "lasso" (\alpha = 1, the default) or "ridge"
(\alpha = 0).

lambda: scalar giving the regularization parameter \lambda. If NULL
(the default), the optimal \lambda is searched by cross-validation.

intercept: flag passed to the intercept argument in glmnet to indicate
whether an intercept is fitted. Defaults to TRUE.

thresh: convergence threshold of the coordinate descent algorithm,
passed to the thresh argument in glmnet. Defaults to 1e-10.

cv_1se: shall the optimal lambda be the "lambda.1se" (the conservative
choice returned by cv.glmnet) or the "lambda.min" (the minimizer of
the cross-validation error)? Defaults to TRUE.

cv_nlambda: the length of the sequence of candidate \lambda values used
in the cross-validation search. Defaults to 50.

cv_folds: number of folds to perform cross-validation. If NULL (the
default), leave-one-out cross-validation (cv_folds = n) is performed.

cv_grouped: passed to the grouped argument in cv.glmnet. Defaults to
FALSE.

cv_lambda: passed to the lambda argument in cv.glmnet. Defaults to
10^seq(2, -3, length.out = cv_nlambda).

cv_second: flag to perform a second cross-validation search if the
optimal \lambda is found at the extremes of the first \lambda
sequence. Defaults to TRUE.

cv_tol_second: tolerance for performing a second search, if
cv_second = TRUE. Defaults to 0.025.

cv_log10_exp: expansion, in log10-scale, of the \lambda sequence for
the second cross-validation search. Defaults to c(-0.5, 3).

cv_thresh: convergence threshold used during cross-validation in
cv.glmnet. Defaults to 1e-05.

cv_parallel: passed to the parallel argument in cv.glmnet. Defaults to
FALSE.

cv_verbose: flag to display information about the cross-validation
search with plots and messages. More useful if cv_second = TRUE.
Defaults to FALSE.

...: further parameters to be passed to glmnet for the final fit.
Details
If \alpha = 1, then the lasso-type fit shrinks to zero, simultaneously,
all the elements of certain rows of \mathbf{B}, thus helping the
selection of the most influential variables, among the p in \mathbf{X},
for explaining/predicting \mathbf{Y}.
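As a hypothetical illustration (simulated data, not part of the
original examples), the rows of beta_hat that are exactly zero flag
the discarded variables:

library(goffda)
set.seed(1)
n <- 100; p <- 8; q <- 3
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
# Only the first two rows of beta are active
beta <- rbind(matrix(1, nrow = 2, ncol = q),
              matrix(0, nrow = p - 2, ncol = q))
y <- x %*% beta + matrix(rnorm(n * q), nrow = n, ncol = q)
fit <- cv_glmnet(x = x, y = y, alpha = "lasso")
which(rowSums(abs(fit$beta_hat)) > 0)  # indices of the selected variables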
The function first performs a cross-validation search for the optimal
\lambda if lambda = NULL (using cv_thresh to control the convergence
threshold). After the optimal penalization parameter is determined, a
second fit (now with convergence threshold thresh) using the default
\lambda sequence in glmnet is performed. The final estimate is obtained
via predict.glmnet from the optimal \lambda determined in the first
step.

Due to its cross-validatory nature, cv_glmnet can be computationally
demanding. Approaches for reducing the computation time include:
considering a smaller number of folds than n, such as cv_folds = 10
(but this will lead to random partitions of the data); loosening the
convergence tolerance of the coordinate descent algorithm by increasing
cv_thresh; reducing the number of candidate \lambda values with
cv_nlambda; setting cv_second = FALSE to avoid a second
cross-validation search; or considering cv_parallel = TRUE to use a
parallel backend (which must be registered beforehand; see examples).
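A sketch combining these speed-ups, reusing the x and y from the sketch
above (the argument values are illustrative assumptions, not
recommendations):

fit_fast <- cv_glmnet(x = x, y = y, alpha = "lasso",
                      cv_folds = 10,      # 10-fold CV instead of leave-one-out
                      cv_thresh = 1e-3,   # looser tolerance during the search
                      cv_nlambda = 20,    # shorter candidate lambda sequence
                      cv_second = FALSE)  # skip the second search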
By default, the \lambda sequence is used with standardized \mathbf{X}
and \mathbf{Y} (both divided by their columnwise standard deviations);
see glmnet and its standardize argument. Therefore, the optimal
selected \lambda value assumes standardization and must be used with
care if the variables are not standardized. For example, when using the
ridge analytical solution, a prior change of scale that depends on q
needs to be applied. See the examples for the details.
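For instance, for q > 1 (where \mathbf{Y} is not standardized), the
ridge estimator matching a \lambda selected on standardized data is, as
a sketch of the rescaling used in the q > 1 ridge example below,

\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X} +
n \lambda \, \mathrm{diag}(\hat{\sigma}_1^2, \ldots,
\hat{\sigma}_p^2))^{-1} \mathbf{X}'\mathbf{Y},

where \hat{\sigma}_j denotes the uncorrected (i.e., divided by n)
sample standard deviation of the j-th column of \mathbf{X}.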
Value
A list with the following entries:
beta_hat: the estimated \mathbf{B}, a matrix of size c(p, q).

lambda: the optimal \lambda obtained by the cross-validation search
(according to cv_1se), or the value supplied in lambda.

cv: if lambda = NULL, the result of the cross-validation search for the
optimal \lambda; otherwise, NULL.

fit: the glmnet fit object.
Author(s)
Eduardo García-Portugués. Initial contributions by Gonzalo Álvarez-Pérez.
References
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22. doi:10.18637/jss.v033.i01
Examples
## Quick example for multivariate linear model with multivariate response
# Simulate data
n <- 100
p <- 10; q <- 5
set.seed(123456)
x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p)
e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q)
beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q)
y <- x %*% beta + e
# Fit lasso (model with intercept, the default)
cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)$beta_hat
## Multivariate linear model with multivariate response
# Simulate data
n <- 100
p <- 10; q <- 5
set.seed(123456)
x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p)
e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q)
beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q)
y <- x %*% beta + e
# Fit ridge
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE,
cv_verbose = TRUE)
cv0$beta_hat
# Same fit for the chosen lambda
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE)$beta_hat
# Fit lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)
cv1$beta_hat
# Use cv_1se = FALSE
cv1_min <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE,
cv_1se = FALSE)
# Compare with the ridge analytical solution. Observe the factor in lambda,
# necessary since lambda is searched for on the standardized data. Note also
# that, unlike the case q = 1, no standardization with respect to
# y happens
sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
thresh = 1e-20, intercept = FALSE)$beta_hat
solve(crossprod(x) + diag(cv0$lambda * sd_x^2 * n)) %*% t(x) %*% y
# If x is standardized, the match between glmnet and usual ridge
# analytical expression does not require scaling of lambda
x_std <- scale(x, scale = sd_x, center = TRUE)
cv_glmnet(x = x_std, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y
## Simple linear model
# Simulate data
n <- 100
p <- 1; q <- 1
set.seed(123456)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
e <- matrix(rnorm(n * q), nrow = n, ncol = q)
beta <- 2
y <- x * beta + e
# Fit by ridge (model with intercept, the default)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE)
cv0$beta_hat
cv0$intercept
# Comparison with linear model with intercept
lm(y ~ 1 + x)$coefficients
# Fit by ridge (model without intercept)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE,
intercept = FALSE)
cv0$beta_hat
# Comparison with linear model without intercept
lm(y ~ 0 + x)$coefficients
# Same fit for the chosen lambda (and without intercept)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE)$beta_hat
# Same for lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso")
cv1$beta_hat
## Multivariate linear model (p = 10, q = 1)
# Simulate data
n <- 50
p <- 10; q <- 1
set.seed(123456)
x <- matrix(rnorm(n * p, mean = 1, sd = rep(1:p, each = n)),
nrow = n, ncol = p)
e <- matrix(rnorm(n * q), nrow = n, ncol = q)
beta <- ((1:p - 1) / p)^2
y <- x %*% beta + e
# Fit ridge (model without intercept)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE,
cv_verbose = TRUE)
cv0$beta_hat
# Same fit for the chosen lambda
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE)$beta_hat
# Compare with the ridge analytical solution. Observe the factor in lambda,
# necessary since lambda is searched for on the standardized data
sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n)
sd_y <- sd(y) * sqrt((n - 1) / n)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x) + diag(cv0$lambda * sd_x^2 / sd_y * n)) %*% t(x) %*% y
# If x and y are standardized, the match between glmnet and usual ridge
# analytical expression does not require scaling of lambda
x_std <- scale(x, scale = sd_x, center = TRUE)
y_std <- scale(y, scale = sd_y, center = TRUE)
cv_glmnet(x = x_std, y = y_std, alpha = "ridge", lambda = cv0$lambda,
intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y_std
# Fit lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)
cv1$beta_hat
## Parallelization (not run; requires a registered parallel backend)
# doMC::registerDoMC(cores = 2)
# microbenchmark::microbenchmark(
#   cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = TRUE),
#   cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = FALSE),
#   times = 10)