lasso {perryExamples}

R Documentation

Lasso with penalty parameter selection

Description

Fit lasso models and select the penalty parameter by estimating the respective prediction error via (repeated) K-fold cross-validation, (repeated) random splitting (also known as random subsampling or Monte Carlo cross-validation), or the bootstrap.
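The three resampling schemes differ mainly in how the test observations are drawn. The following base R sketch illustrates the idea; the helper functions `kfold_test`, `split_test` and `boot_test` are hypothetical and are not part of perry or perryExamples, and the out-of-bag bootstrap shown is only one common variant. The repeated versions simply redo the random draw several times.

```r
# Hypothetical helpers (not part of perry or perryExamples) that return
# the test-set indices produced by each resampling scheme.
set.seed(2014)
n <- 20  # number of observations

# K-fold cross-validation: every observation is tested exactly once
kfold_test <- function(n, K) {
  fold <- sample(rep_len(seq_len(K), n))  # random fold label per observation
  split(seq_len(n), fold)
}

# Random splitting: draw m test observations without replacement
split_test <- function(n, m) sample.int(n, m)

# Bootstrap: train on a sample drawn with replacement and test on the
# observations that were never drawn (out-of-bag)
boot_test <- function(n) {
  train <- sample.int(n, n, replace = TRUE)
  setdiff(seq_len(n), train)
}

folds <- kfold_test(n, K = 5)
test_split <- split_test(n, m = 5)
test_boot <- boot_test(n)
```

In K-fold cross-validation the test sets partition the data, whereas random splits and bootstrap test sets from different replications may overlap.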

Usage

lasso(
  x,
  y,
  lambda = seq(1, 0, length.out = 50),
  mode = c("fraction", "lambda"),
  standardize = TRUE,
  intercept = TRUE,
  splits = foldControl(),
  cost = rmspe,
  selectBest = c("hastie", "min"),
  seFactor = 1,
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)

lasso.fit(
  x,
  y,
  lambda = seq(1, 0, length.out = 50),
  mode = c("fraction", "lambda"),
  standardize = TRUE,
  intercept = TRUE,
  ...
)

Arguments

`x`	a numeric matrix containing the predictor variables.
`y`	a numeric vector containing the response variable.
`lambda`	for `lasso`, a numeric vector of non-negative values to be used as penalty parameter. For `lasso.fit`, a single non-negative value to be used as penalty parameter.
`mode`	a character string specifying the type of penalty parameter. If `"fraction"`, `lambda` gives the fractions of the smallest value of the penalty parameter that sets all coefficients to 0 (hence all values of `lambda` should be in the interval [0,1] in that case). If `"lambda"`, `lambda` gives the grid of values for the penalty parameter directly.
`standardize`	a logical indicating whether the predictor variables should be standardized to have unit variance (the default is `TRUE`).
`intercept`	a logical indicating whether a constant term should be included in the model (the default is `TRUE`).
`splits`	an object giving data splits to be used for prediction error estimation (see `perryTuning`).
`cost`	a cost function measuring prediction loss (see `perryTuning` for some requirements). The default is to use the root mean squared prediction error (see `cost`).
`selectBest`, `seFactor`	arguments specifying a criterion for selecting the best model (see `perryTuning`). The default is to use a one-standard-error rule.
`ncores`, `cl`	arguments for parallel computing (see `perryTuning`).
`seed`	optional initial seed for the random number generator (see `.Random.seed` and `perryTuning`).
`...`	for `lasso`, additional arguments to be passed to the prediction loss function `cost`. For `lasso.fit`, additional arguments to be passed to `lars`.
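To illustrate the two settings of `mode`, the following base R sketch converts fractions into penalty values. It assumes the objective 0.5 * RSS + lambda * sum(abs(beta)) with a centered response and standardized predictors; under that convention the smallest penalty that sets all coefficients to 0 is max |x'y|. The scaling used internally by `lasso.fit` and `lars` may differ, and the data are simulated for illustration only.

```r
# Illustrative data (simulated for this sketch)
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 3), n, 3)
y <- drop(x %*% c(2, 0, -1)) + rnorm(n)

# Center the response and standardize the predictors
xs <- scale(x)
yc <- y - mean(y)

# Assumed convention: objective 0.5 * RSS + lambda * sum(abs(beta)).
# The smallest penalty that sets every coefficient to 0 is then max |x'y|.
lambda_max <- max(abs(crossprod(xs, yc)))

# mode = "fraction": values in [0, 1] are multiples of lambda_max
fraction <- seq(1, 0, length.out = 50)
lambda_grid <- fraction * lambda_max
```

With `mode = "lambda"`, a grid such as `lambda_grid` would be supplied directly instead of `fraction`.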

Value

For lasso, an object of class "perryTuning" (see perryTuning). It contains information on the prediction error criterion and includes the final model with the optimal tuning parameter as component finalModel.
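How the optimal tuning parameter is chosen depends on `selectBest` and `seFactor`. The sketch below shows the one-standard-error rule as it is usually stated, next to the plain minimum; the prediction errors and standard errors are made up, and the exact computation in `perryTuning` may differ in detail.

```r
# Made-up prediction errors and standard errors for five candidate models,
# ordered from the most parsimonious (largest penalty) to the most complex
pe <- c(1.40, 1.12, 1.05, 1.02, 1.04)
se <- c(0.06, 0.05, 0.05, 0.04, 0.05)
seFactor <- 1

# selectBest = "min": the model with the smallest estimated error
best_min <- which.min(pe)

# One-standard-error rule: the most parsimonious model whose error is
# within seFactor standard errors of the minimum
threshold <- pe[best_min] + seFactor * se[best_min]
best_1se <- min(which(pe <= threshold))
```

Here the minimum is attained by the fourth model, while the one-standard-error rule picks the third, which is more heavily penalized.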

For lasso.fit, an object of class "lasso" with the following components:

lambda: numeric; the value of the penalty parameter.
coefficients: a numeric vector containing the coefficient estimates.
fitted.values: a numeric vector containing the fitted values.
residuals: a numeric vector containing the residuals.
standardize: a logical indicating whether the predictor variables were standardized to have unit variance.
intercept: a logical indicating whether the model includes a constant term.
muX: a numeric vector containing the means of the predictors.
sigmaX: a numeric vector containing the standard deviations of the predictors.
mu: numeric; the mean of the response.
call: the matched function call.
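The components muX, sigmaX and mu are what is needed to move between the standardized and the original scale. The following base R sketch shows the arithmetic on simulated data, with least squares standing in for the lasso; it is an assumption that the package uses this convention, and this page does not state on which scale the returned coefficients are expressed.

```r
set.seed(42)
n <- 40
x <- cbind(a = rnorm(n, 5, 2), b = rnorm(n, -1, 0.5))
y <- 1 + 0.5 * x[, "a"] - 2 * x[, "b"] + rnorm(n)

# Quantities analogous to the muX, sigmaX and mu components
muX <- colMeans(x)
sigmaX <- apply(x, 2, sd)
mu <- mean(y)

# Fit on the standardized scale; least squares stands in for the lasso here
xs <- sweep(sweep(x, 2, muX), 2, sigmaX, "/")
beta_std <- coef(lm(y - mu ~ xs - 1))

# Back-transform the coefficients to the original scale
beta <- beta_std / sigmaX
intercept <- mu - sum(muX * beta)

# Predictions for new data on the original scale
x_new <- cbind(a = c(4, 6), b = c(-1.2, -0.8))
y_hat <- intercept + drop(x_new %*% beta)
```

The back-transformed coefficients agree with a least squares fit on the raw data, which is a quick check that the centering and scaling have been undone correctly.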

Author(s)

Andreas Alfons

References

Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267–288.

Examples

## load package and data
library("perryExamples")
data("Bundesliga")
Bundesliga <- Bundesliga[, -(1:2)]
f <- log(MarketValue) ~ Age + I(Age^2) + .
mf <- model.frame(f, data=Bundesliga)
x <- model.matrix(terms(mf), mf)[, -1]
y <- model.response(mf)

## set up repeated random splits
splits <- splitControl(m = 40, R = 10)

## select optimal penalty parameter
fit <- lasso(x, y, splits = splits, seed = 2014)
fit

## plot prediction error results
plot(fit, method = "line")
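As a variation on the example above, the sketch below uses repeated K-fold cross-validation and selects the penalty parameter with the smallest estimated prediction error. It assumes that perryExamples is installed and that `foldControl` accepts the arguments `K` and `R`; the object name `fitMin` is chosen for illustration.

```r
library("perryExamples")

## same data preparation as in the example above
data("Bundesliga")
Bundesliga <- Bundesliga[, -(1:2)]
f <- log(MarketValue) ~ Age + I(Age^2) + .
mf <- model.frame(f, data = Bundesliga)
x <- model.matrix(terms(mf), mf)[, -1]
y <- model.response(mf)

## 5-fold cross-validation with 10 replications, picking the minimum error
folds <- foldControl(K = 5, R = 10)
fitMin <- lasso(x, y, splits = folds, selectBest = "min", seed = 2014)

## inspect the final model fitted with the selected penalty parameter
fitMin$finalModel$lambda
fitMin$finalModel$coefficients
```

Compared with the default one-standard-error rule, `selectBest = "min"` typically selects a smaller penalty and therefore a less sparse model.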

[Package perryExamples version 0.1.1 Index]