bootLasso {HDCI} | R Documentation |
Bootstrap Lasso
Description
Runs the residual (or paired) bootstrap Lasso and produces confidence intervals for the regression coefficients.
Usage
bootLasso(x, y, B = 500, type.boot = "residual", alpha = 0.05,
cv.method = "cv", nfolds = 10, foldid, cv.OLS = FALSE, tau = 0,
parallel = FALSE, standardize = TRUE, intercept = TRUE,
parallel.boot = FALSE, ncores.boot = 1, ...)
Arguments
x |
Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector. |
y |
Response variable. |
B |
Number of replications in the bootstrap – default is 500. |
type.boot |
Bootstrap method, which takes one of two values: "residual" or "paired". The default is "residual". |
alpha |
Significance level – default is 0.05. |
cv.method |
The method used to select lambda in the Lasso – one of "cv", "cv1se", or "escv"; the default is "cv". |
nfolds , foldid , cv.OLS , tau , parallel |
Arguments that can be passed to escv.glmnet. |
standardize |
Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE. |
intercept |
Should intercept be fitted (default is TRUE) or set to zero (FALSE). |
parallel.boot |
If TRUE, use parallel foreach to run the bootstrap replications. A parallel backend, such as doParallel, must be registered beforehand. See the example below. |
ncores.boot |
Number of cores used in the bootstrap replication. |
... |
Other arguments that can be passed to glmnet. |
Details
The function runs the residual (type.boot="residual") or paired (type.boot="paired") bootstrap Lasso procedure and produces a confidence interval for each individual regression coefficient. Note that two arguments relate to parallel execution, "parallel" and "parallel.boot": "parallel" controls the parallel foreach inside escv.glmnet, while "parallel.boot" controls the parallel foreach in the bootstrap replication procedure.
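The difference between the two resampling schemes can be sketched in base R. This is a minimal illustration, not the package's implementation: an ordinary least-squares fit (lm.fit) stands in for the Lasso, the helper names fit_coef and boot_once are hypothetical, and percentile intervals are assumed for simplicity; bootLasso may construct its intervals differently.

```r
# Illustrative only: OLS stands in for the Lasso; helper names are hypothetical.
set.seed(1)
n <- 100
p <- 3
x <- matrix(rnorm(n * p), n, p)
beta_true <- c(1, 0.5, 0)
y <- as.numeric(x %*% beta_true + rnorm(n))

# Fit with an intercept and return the slope coefficients only.
fit_coef <- function(x, y) {
  unname(lm.fit(cbind(1, x), y)$coefficients[-1])
}

# One bootstrap replicate under either resampling scheme.
boot_once <- function(x, y, type.boot = c("residual", "paired")) {
  type.boot <- match.arg(type.boot)
  n <- nrow(x)
  if (type.boot == "paired") {
    # Paired: resample whole (x_i, y_i) rows with replacement.
    idx <- sample.int(n, n, replace = TRUE)
    return(fit_coef(x[idx, , drop = FALSE], y[idx]))
  }
  # Residual: keep x fixed, resample centered residuals onto the fitted values.
  fit <- lm.fit(cbind(1, x), y)
  res <- fit$residuals - mean(fit$residuals)
  y_star <- fit$fitted.values + sample(res, n, replace = TRUE)
  fit_coef(x, y_star)
}

B <- 200
alpha <- 0.05
beta_boot <- replicate(B, boot_once(x, y, "residual"))  # p x B matrix
# Assumed percentile intervals: row 1 = lower bounds, row 2 = upper bounds.
interval <- apply(beta_boot, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2))
dim(interval)
```

The residual scheme treats the design matrix as fixed and relies on the errors being exchangeable, whereas the paired scheme resamples observations and so also reflects variability in x.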
Value
A list consisting of the following elements is returned.
lambda.opt |
The optimal value of lambda selected by cv/cv1se/escv. |
Beta |
An estimate of the regression coefficients. |
interval |
A 2 by p matrix containing the confidence intervals – the first row is the lower bounds of the confidence intervals for each of the coefficients and the second row is the upper bounds of the confidence intervals. |
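As a small sketch of how the interval element might be read, the code below works on a toy 2 by p matrix with made-up values laid out as described above (lower bounds in the first row, upper bounds in the second). The object names are hypothetical and the numbers are not output from bootLasso.

```r
# Toy 2 x p interval matrix with made-up values (not bootLasso output).
interval <- rbind(lower = c(0.20, -0.10, 0.00, -0.45),
                  upper = c(0.90,  0.30, 0.00, -0.05))

# Indices of coefficients whose interval excludes zero.
excludes_zero <- interval[1, ] > 0 | interval[2, ] < 0
which(excludes_zero)

# Width of each interval.
width <- interval[2, ] - interval[1, ]
width
```

A degenerate interval such as [0, 0] in the third column may occur when a coefficient is estimated as exactly zero in every bootstrap replicate; the check above treats it as not excluding zero.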
Examples
library("glmnet")
library("mvtnorm")
## generate the data
set.seed(2015)
n <- 200 # number of observations
p <- 500 # number of predictors
s <- 10  # number of nonzero coefficients
beta <- rep(0, p)
beta[1:s] <- runif(s, 1/3, 1)
x <- rmvnorm(n = n, mean = rep(0, p), method = "svd")
signal <- sqrt(mean((x %*% beta)^2))
sigma <- as.numeric(signal / sqrt(10)) # SNR=10
y <- x %*% beta + rnorm(n, sd = sigma) # noise scaled so that SNR = 10
## residual bootstrap Lasso
set.seed(0)
obj <- bootLasso(x = x, y = y, B = 10)
# confidence interval
obj$interval
# number of coefficients whose interval covers the true value
sum((obj$interval[1, ] <= beta) & (obj$interval[2, ] >= beta))
## using parallel in the bootstrap replication
#library("doParallel")
#registerDoParallel(2)
#set.seed(0)
#system.time(obj <- bootLasso(x = x, y = y))
#system.time(obj <- bootLasso(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))