bolasso {bolasso}  R Documentation

Bootstrap-enhanced Lasso

Description

This function implements model-consistent Lasso estimation through the bootstrap. It supports parallel processing by way of the future package, allowing the user to flexibly specify many parallelization methods. This method was developed as a variable-selection algorithm, but this package also supports making ensemble predictions on new data using the bagged Lasso models.
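The general idea described above can be sketched by hand with glmnet. This is only an illustrative outline, not the package's internal code: the variable names (n_boot, selected, sel_freq), the 0.9 cutoff, and the use of the unmodified mtcars data are assumptions made for the example.

```r
library(glmnet)

set.seed(123)
x <- model.matrix(mpg ~ . - 1, mtcars)  # predictor matrix without an intercept column
y <- mtcars$mpg
n_boot <- 20  # hypothetical small number of replicates, for illustration only

# For each replicate: resample rows with replacement, fit a cross-validated
# Lasso, and record which predictors have nonzero coefficients.
selected <- sapply(seq_len(n_boot), function(b) {
  idx <- sample(nrow(x), replace = TRUE)
  fit <- cv.glmnet(x[idx, ], y[idx], nfolds = 5)
  beta <- as.matrix(coef(fit, s = "lambda.min"))[-1, 1]  # drop the intercept
  beta != 0
})

# Proportion of replicates in which each predictor was selected
sel_freq <- rowMeans(selected)

# Predictors selected in at least 90% of replicates (assumed cutoff)
names(sel_freq)[sel_freq >= 0.9]
```

Raising the cutoff to 1 keeps only predictors selected in every replicate, which is the strictest form of the procedure.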

Usage

bolasso(
  formula,
  data,
  n.boot = 100,
  progress = TRUE,
  implement = "glmnet",
  x = NULL,
  y = NULL,
  ...
)

Arguments

formula

An optional object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. Can be omitted when x and y are non-missing.

data

An optional object of class data.frame that contains the modeling variables referenced in formula. Can be omitted when x and y are non-missing.

n.boot

An integer specifying the number of bootstrap replicates.

progress

A boolean indicating whether to display progress across bootstrap replicates.

implement

A character string, either 'glmnet' or 'gamlr', specifying which Lasso implementation to use. For specific modeling details, see glmnet::cv.glmnet or gamlr::cv.gamlr.

x

An optional predictor matrix in lieu of formula and data.

y

An optional response vector in lieu of formula and data.

...

Additional parameters to pass to either glmnet::cv.glmnet or gamlr::cv.gamlr.

Value

An object of class bolasso. This object is a list of length n.boot of cv.glmnet or cv.gamlr objects.
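As a rough illustration of the structure described above, the returned object can be inspected like an ordinary list. This sketch assumes the bolasso and glmnet packages are installed; the object name fit and the small n.boot value are chosen only for the example.

```r
library(bolasso)

set.seed(123)
fit <- bolasso(
  mpg ~ .,
  data = mtcars,
  n.boot = 5,       # small value chosen to keep the example fast
  nfolds = 5,       # passed on to glmnet::cv.glmnet via ...
  progress = FALSE
)

# Per the description above, this should equal n.boot
length(fit)

# Each element is a single cross-validated Lasso fit
class(fit[[1]])
```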

References

Bach FR (2008). “Bolasso: model consistent Lasso estimation through the bootstrap.” CoRR, abs/0804.1302. https://arxiv.org/abs/0804.1302.

See Also

glmnet::cv.glmnet and gamlr::cv.gamlr for full details on the respective implementations and the arguments that can be passed through the ... argument.

Examples

# Convert categorical columns to factors, then make a reproducible train/test split
set.seed(123)
mtcars[, c(2, 10:11)] <- lapply(mtcars[, c(2, 10:11)], as.factor)
idx <- sample(nrow(mtcars), 22)
mtcars_train <- mtcars[idx, ]
mtcars_test <- mtcars[-idx, ]

## Formula Interface

# Train model
set.seed(123)
bolasso_form <- bolasso(
  formula = mpg ~ .,
  data = mtcars_train,
  n.boot = 20,
  nfolds = 5,
  implement = "glmnet"
)

# Extract selected variables
selected_vars(bolasso_form, threshold = 0.9, select = "lambda.min")

# Bagged ensemble prediction on test data
predict(bolasso_form,
        new.data = mtcars_test,
        select = "lambda.min")

## Alternative Matrix Interface

# Train model
set.seed(123)
bolasso_mat <- bolasso(
  x = model.matrix(mpg ~ . - 1, mtcars_train),
  y = mtcars_train[, 1],
  data = mtcars_train,
  n.boot = 20,
  nfolds = 5,
  implement = "glmnet"
)

# Extract selected variables
selected_vars(bolasso_mat, threshold = 0.9, select = "lambda.min")

# Bagged ensemble prediction on test data
predict(bolasso_mat,
        new.data = model.matrix(mpg ~ . - 1, mtcars_test),
        select = "lambda.min")
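The Description notes that parallel processing is handled through the future package. A minimal sketch of how that might look is given here, assuming the future package is installed and that two local workers are available; the object name fit_par and the worker count are illustrative choices.

```r
library(bolasso)

# Run bootstrap replicates in parallel background R sessions
future::plan(future::multisession, workers = 2)

set.seed(123)
fit_par <- bolasso(
  mpg ~ .,
  data = mtcars,
  n.boot = 20,
  nfolds = 5,
  progress = FALSE
)

# Return to sequential processing when finished
future::plan(future::sequential)
```

Other future backends can be selected the same way by changing the plan before calling bolasso.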


[Package bolasso version 0.2.0 Index]