R: Fit a linear model with elastic-net regularization

elastic.net {quadrupen}

R Documentation

Fit a linear model with elastic-net regularization

Description

Fit a linear model with elastic-net regularization, mixing a (possibly weighted) \ell_1-norm (LASSO) and a (possibly structured) \ell_2-norm (ridge-like). The solution path is computed on a grid of values for the \ell_1-penalty, with the amount of \ell_2 regularization held fixed. See Details for the criterion optimized.

Usage

elastic.net(
  x,
  y,
  lambda1 = NULL,
  lambda2 = 0.01,
  penscale = rep(1, p),
  struct = NULL,
  intercept = TRUE,
  normalize = TRUE,
  naive = FALSE,
  nlambda1 = ifelse(is.null(lambda1), 100, length(lambda1)),
  min.ratio = ifelse(n <= p, 0.01, 1e-04),
  max.feat = ifelse(lambda2 < 0.01, min(n, p), min(4 * n, p)),
  beta0 = NULL,
  control = list(),
  checkargs = TRUE
)

Arguments

`x`	matrix of features, possibly sparsely encoded (experimental). Do NOT include the intercept. When `normalize` is `TRUE`, coefficients are rescaled back to the original scale.
`y`	response vector.
`lambda1`	sequence of decreasing `\ell_1`-penalty levels. If `NULL` (the default), a vector is generated with `nlambda1` entries, starting from a guessed level `lambda1.max` where only the intercept is included, and decreasing to `min.ratio*lambda1.max`.
`lambda2`	real scalar; tunes the `\ell_2` penalty in the Elastic-net. Default is 0.01. Set to 0 to recover the Lasso.
`penscale`	vector of real positive values that weight the `\ell_1`-penalty of each feature. The default sets all weights to 1.
`struct`	matrix structuring the coefficients (preferably sparse). Must be at least positive semidefinite (this is checked internally if the `checkargs` argument is `TRUE`). The default uses the identity matrix. See Details below.
`intercept`	logical; indicates if an intercept should be included in the model. Default is `TRUE`.
`normalize`	logical; indicates if variables should be normalized to have unit L2 norm before fitting. Default is `TRUE`.
`naive`	logical; compute either the 'naive' or the classic elastic-net as defined in Zou and Hastie (2006): the vector of parameters is rescaled by a coefficient `(1+lambda2)` when `naive` equals `FALSE`. No rescaling otherwise. Default is `FALSE`.
`nlambda1`	integer that indicates the number of values to put in the `lambda1` vector. Ignored if `lambda1` is provided.
`min.ratio`	minimal value of the `\ell_1`-part of the penalty that will be tried, as a fraction of the maximal `lambda1` value. Too small a value might lead to instability at the end of the solution path, corresponding to small `lambda1` combined with `lambda2=0`. The default value tries to avoid this, adapting to the '`n<p`' context. Ignored if `lambda1` is provided.
`max.feat`	integer; limits the number of features ever to enter the model; i.e., non-zero coefficients for the Elastic-net: the algorithm stops if this number is exceeded and `lambda1` is cut at the corresponding level. Default is `min(nrow(x),ncol(x))` for small `lambda2` (<0.01) and `min(4*nrow(x),ncol(x))` otherwise. Use with care, as it considerably changes the computation time.
`beta0`	a starting point for the vector of parameters. When `NULL` (the default), it is initialized at zero. May save time in some situations.
`control`	list of arguments controlling low-level options of the algorithm (use with care and at your own risk):
	`verbose`: integer; verbosity level (this one is not too risky). Set to `0` for no output, `1` for warnings only, and `2` for tracing the whole progression. Default is `1`. Automatically set to `0` when the method is embedded within cross-validation or stability selection.
	`timer`: logical; whether to record the timing of the algorithm. Default is `FALSE`.
	`max.iter`: maximal number of iterations used to solve the problem for a given value of `lambda1`. Default is 500.
	`method`: string naming the underlying solver. Either `"quadra"`, `"pathwise"` or `"fista"`. Default is `"quadra"`.
	`threshold`: convergence threshold. The algorithm stops when the optimality conditions are fulfilled up to this threshold. Default is `1e-7` for `"quadra"` and `1e-2` for the first-order methods.
	`monitor`: indicates whether the convergence should be monitored, by computing a lower bound between the current solution and the optimum: when `0` (the default), no monitoring is provided; when `1`, the bound derived in Grandvalet et al. is computed; when `>1`, the Fenchel duality gap is computed along the algorithm.
`checkargs`	logical; should arguments be checked to (hopefully) avoid internal crashes? Default is `TRUE`. Automatically set to `FALSE` when calls are made from cross-validation or stability selection procedures.
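
As a rough illustration of how `lambda1`, `nlambda1`, `min.ratio` and `penscale` interact when `lambda1` is `NULL`, the following base-R sketch builds a decreasing grid. It is not the package's internal code: the helper name `lambda1_grid`, the log-spacing of the grid, and the centering and unit-norm scaling of the columns are all assumptions made for the example. The starting value is the smallest penalty at which a weighted lasso solution has no active feature.

```r
# Hypothetical helper (not part of quadrupen): sketch of a default lambda1 grid.
# Assumptions: centered response, centered unit-norm columns, log-spaced grid.
lambda1_grid <- function(x, y, penscale = rep(1, ncol(x)),
                         nlambda1 = 100,
                         min.ratio = ifelse(nrow(x) <= ncol(x), 0.01, 1e-04)) {
  # Center the response so that the intercept absorbs its mean
  yc <- y - mean(y)
  # Center the columns and scale them to unit L2 norm
  xc <- scale(x, center = TRUE, scale = FALSE)
  xc <- sweep(xc, 2, sqrt(colSums(xc^2)), "/")
  # Smallest penalty level for which all coefficients are zero
  lambda1.max <- max(abs(crossprod(xc, yc)) / penscale)
  # Decreasing, log-spaced sequence down to min.ratio * lambda1.max
  exp(seq(log(lambda1.max), log(min.ratio * lambda1.max),
          length.out = nlambda1))
}

set.seed(1)
x <- matrix(rnorm(20 * 5), 20, 5)
y <- rnorm(20)
grid <- lambda1_grid(x, y, nlambda1 = 10)
```

Here `n > p`, so the default `min.ratio` is `1e-04` and the last grid value is that fraction of the first one. A user-supplied `lambda1` vector would bypass such a construction entirely.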

Details

The optimized criterion is the following:

\hat{\beta}_{\lambda_1,\lambda_2} = \arg\min_{\beta} \frac{1}{2} \mathrm{RSS}(\beta) + \lambda_1 \| D \beta \|_1 + \frac{\lambda_2}{2} \beta^T S \beta,

where D is a diagonal matrix whose diagonal terms are provided as a vector by the `penscale` argument. The \ell_2 structuring matrix S is provided via the `struct` argument, a positive semidefinite matrix (possibly of class `Matrix`).
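
To make the three terms of the criterion concrete, here is a minimal base-R sketch that evaluates it for a given coefficient vector. The function name `enet_criterion` and its `mu` (intercept) argument are hypothetical, introduced only for illustration; the sketch also assumes that RSS is the plain residual sum of squares on unnormalized features, which may differ from what the package does internally when `normalize = TRUE`.

```r
# Hypothetical helper (not part of quadrupen): evaluate the criterion
#   1/2 RSS(beta) + lambda1 * ||D beta||_1 + lambda2/2 * beta' S beta
# Assumptions: plain RSS, no normalization, intercept passed as `mu`.
enet_criterion <- function(beta, x, y, lambda1, lambda2,
                           penscale = rep(1, length(beta)),
                           struct = diag(length(beta)),
                           mu = 0) {
  res <- y - mu - drop(x %*% beta)               # residuals
  rss <- sum(res^2)                              # RSS(beta)
  l1  <- sum(abs(penscale * beta))               # ||D beta||_1, D = diag(penscale)
  l2  <- drop(crossprod(beta, struct %*% beta))  # beta' S beta
  0.5 * rss + lambda1 * l1 + 0.5 * lambda2 * l2
}

set.seed(42)
x <- matrix(rnorm(30 * 4), 30, 4)
beta <- c(1, 0, -2, 0)
y <- drop(x %*% beta) + rnorm(30)

# Without penalty the criterion is half the residual sum of squares
val_ols  <- enet_criterion(beta, x, y, lambda1 = 0, lambda2 = 0)
# With unit weights and identity structure, the penalty adds
# lambda1 * sum(|beta|) + lambda2 / 2 * sum(beta^2)
val_enet <- enet_criterion(beta, x, y, lambda1 = 0.5, lambda2 = 2)
```

With the default identity `struct`, the quadratic term is the usual ridge penalty; passing another positive semidefinite matrix changes only the `l2` term.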

Value

An object of class `quadrupen`; see the documentation page `quadrupen` for details.

Examples

library(quadrupen)
library(Matrix) ## provides bdiag()

## Simulating multivariate Gaussian with blockwise correlation
## and piecewise constant vector of parameters
beta <- rep(c(0,1,0,-1,0), c(25,10,25,10,25))
cor <- 0.75
Soo <- toeplitz(cor^(0:(25-1))) ## Toeplitz correlation for irrelevant variables
Sww <- matrix(cor,10,10) ## block correlation between active variables
Sigma <- bdiag(Soo,Sww,Soo,Sww,Soo)
diag(Sigma) <- 1
n <- 50
x <- as.matrix(matrix(rnorm(95*n),n,95) %*% chol(Sigma))
y <- 10 + x %*% beta + rnorm(n,0,10)

labels <- rep("irrelevant", length(beta))
labels[beta != 0] <- "relevant"
## Comparing the solution path of the LASSO and the Elastic-net
plot(elastic.net(x,y,lambda2=0), label=labels) ## a mess
plot(elastic.net(x,y,lambda2=10), label=labels) ## a lot better

[Package quadrupen version 0.2-12 Index]