bootLasso {HDCI}    R Documentation

Bootstrap Lasso

Description

Performs a residual (or paired) bootstrap Lasso and produces confidence intervals for the regression coefficients.
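The two resampling schemes differ only in how each bootstrap data set is generated. The following is a minimal sketch and is not taken from HDCI: draw_boot_sample is a hypothetical helper, and it assumes that slope estimates beta_hat and an intercept b0 are already available (for example, from a Lasso fit). The residual scheme keeps x fixed and adds resampled, centred residuals to the fitted values. The paired scheme resamples whole rows (x_i, y_i) together.

```r
# Hypothetical helper (not part of HDCI): generate one bootstrap data set.
# beta_hat: estimated slopes (length ncol(x)); b0: estimated intercept.
draw_boot_sample <- function(x, y, beta_hat, b0 = 0,
                             type = c("residual", "paired")) {
  type <- match.arg(type)
  y <- drop(y)
  n <- nrow(x)
  idx <- sample.int(n, n, replace = TRUE)
  if (type == "paired") {
    # Paired bootstrap: resample whole observations (rows of x with their y)
    return(list(x = x[idx, , drop = FALSE], y = y[idx]))
  }
  # Residual bootstrap: keep x fixed and resample the centred residuals
  fitted <- b0 + drop(x %*% beta_hat)
  res <- y - fitted
  res <- res - mean(res)
  list(x = x, y = fitted + res[idx])
}
```

For example, draw_boot_sample(x, y, beta_hat, type = "paired") returns a list with components x and y that have the same dimensions as the original data.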

Usage

bootLasso(x, y, B = 500, type.boot = "residual", alpha = 0.05, 
          cv.method = "cv", nfolds = 10, foldid, cv.OLS = FALSE, tau = 0, 
          parallel = FALSE, standardize = TRUE, intercept = TRUE, 
          parallel.boot = FALSE, ncores.boot = 1, ...)

Arguments

x

Input matrix as in glmnet, of dimension nobs x nvars; each row is an observation vector.

y

Response variable.

B

Number of bootstrap replications; the default is 500.

type.boot

Bootstrap method: either "residual" or "paired". The default is "residual".

alpha

Significance level; the confidence intervals have nominal coverage 1 - alpha. The default is 0.05.

cv.method

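The method used to select lambda in the Lasso: "cv" (cross-validation), "cv1se" (cross-validation with the one-standard-error rule), or "escv" (estimation stability with cross-validation). The default is "cv".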

nfolds, foldid, cv.OLS, tau, parallel

Arguments that can be passed to escv.glmnet.

standardize

Logical flag for x variable standardization, prior to fitting the model. Default is standardize=TRUE.

intercept

Should an intercept be fitted (TRUE, the default) or set to zero (FALSE)?

parallel.boot

If TRUE, run the bootstrap replications in a parallel foreach loop. A parallel backend, such as doParallel, must be registered beforehand; see the example below.

ncores.boot

Number of cores used for the bootstrap replications.

...

Other arguments that can be passed to glmnet.
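As a rough analogue of the "cv" and "cv1se" choices of cv.method, the sketch below uses glmnet's own cv.glmnet rather than HDCI's escv.glmnet, which bootLasso calls and which also provides the "escv" choice not shown here. lambda.min and lambda.1se are cv.glmnet's names for the two rules; the simulated data are made up for illustration only.

```r
library(glmnet)

set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(1, -1, 0.5)) + rnorm(n)

# 10-fold cross-validation over glmnet's default lambda path
cvfit <- cv.glmnet(x, y, nfolds = 10)

# analogue of cv.method = "cv": the lambda with the smallest CV error
lambda_cv <- cvfit$lambda.min
# analogue of cv.method = "cv1se": the largest lambda whose CV error is
# within one standard error of that minimum (a sparser fit)
lambda_cv1se <- cvfit$lambda.1se
c(cv = lambda_cv, cv1se = lambda_cv1se)
```

Because the one-standard-error rule never picks a smaller lambda than the minimum-error rule, "cv1se" tends to give sparser Lasso fits than "cv".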

Details

The function runs the residual (type.boot = "residual") or paired (type.boot = "paired") bootstrap Lasso procedure and produces a confidence interval for each individual regression coefficient. Note that there are two arguments related to parallel computation, "parallel" and "parallel.boot": "parallel" enables the parallel foreach loop inside escv.glmnet, while "parallel.boot" enables the parallel foreach loop over the bootstrap replications.
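To make the residual procedure concrete, here is a minimal sketch under simplifying assumptions; it is not HDCI's implementation. resid_boot_lasso is a hypothetical name, lambda is chosen once by cv.glmnet (lambda.min) instead of by escv.glmnet and then held fixed across replications, and the intervals are plain percentile intervals of the bootstrap coefficients, which may differ from the construction bootLasso actually uses. A paired variant would instead resample rows of (x, y) in each replication. The returned list only mimics the lambda.opt, Beta, and interval elements described under Value.

```r
library(glmnet)

# Hypothetical sketch (not HDCI's code): residual bootstrap Lasso with
# lambda selected once by cv.glmnet and percentile confidence intervals.
resid_boot_lasso <- function(x, y, B = 100, alpha = 0.05) {
  y <- drop(y)
  n <- nrow(x)
  slopes <- function(fit) as.matrix(coef(fit))[-1, 1]  # drop the intercept
  lambda <- cv.glmnet(x, y)$lambda.min
  fit <- glmnet(x, y, lambda = lambda)
  beta_hat <- slopes(fit)
  fitted <- drop(predict(fit, newx = x))
  res <- y - fitted
  res <- res - mean(res)                               # centre the residuals
  # B x p matrix of bootstrap Lasso coefficients at the fixed lambda
  draws <- t(vapply(seq_len(B), function(b) {
    y_star <- fitted + res[sample.int(n, n, replace = TRUE)]
    slopes(glmnet(x, y_star, lambda = lambda))
  }, numeric(ncol(x))))
  # 2 x p matrix: row 1 = lower bounds, row 2 = upper bounds
  interval <- apply(draws, 2, quantile,
                    probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(lambda.opt = lambda, Beta = unname(beta_hat), interval = interval)
}
```

Fixing lambda across replications keeps each replication cheap (one Lasso fit per bootstrap sample), which is also what makes distributing the replications over several cores straightforward.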

Value

A list consisting of the following elements is returned.

lambda.opt

The optimal value of lambda selected by cv/cv1se/escv.

Beta

An estimate of the regression coefficients.

interval

A 2 x p matrix of confidence intervals, one column per coefficient: the first row holds the lower bounds and the second row holds the upper bounds.
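Because interval is a plain 2 x p matrix, it is easy to post-process. The helper below, summarise_intervals, is hypothetical and not part of HDCI; it reports which intervals exclude zero, the interval widths, and, when the true coefficients are known as in a simulation, how many intervals cover them (the same count as the sum(...) line in the Examples).

```r
# Hypothetical helper (not part of HDCI).
# interval: 2 x p matrix, row 1 = lower bounds, row 2 = upper bounds.
summarise_intervals <- function(interval, beta_true = NULL) {
  lower <- interval[1, ]
  upper <- interval[2, ]
  out <- list(
    excludes_zero = which(lower > 0 | upper < 0),  # intervals not containing 0
    width = upper - lower
  )
  if (!is.null(beta_true)) {
    # number of true coefficients lying inside their intervals
    out$covered <- sum(lower <= beta_true & upper >= beta_true)
  }
  out
}
```

For instance, with interval <- rbind(c(-1, 0.5, -3), c(1, 2, -2)), the second and third intervals exclude zero.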

Examples

library("HDCI")
library("glmnet")
library("mvtnorm")

## generate the data
set.seed(2015)
n <- 200      # number of observations
p <- 500      # number of predictors
s <- 10       # number of nonzero coefficients
beta <- rep(0, p)
beta[1:s] <- runif(s, 1/3, 1)
x <- rmvnorm(n = n, mean = rep(0, p), method = "svd")
signal <- sqrt(mean((x %*% beta)^2))
sigma <- as.numeric(signal / sqrt(10))  # SNR = 10 (ratio of signal to noise variance)
y <- x %*% beta + rnorm(n, sd = sigma)  # noise sd chosen to give SNR = 10

## residual bootstrap Lasso
set.seed(0)
obj <- bootLasso(x = x, y = y, B = 10)
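# confidence intervals: 2 x p matrix (row 1 = lower bounds, row 2 = upper bounds)
obj$interval
# number of true coefficients covered by their confidence intervals
sum((obj$interval[1, ] <= beta) & (obj$interval[2, ] >= beta))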

## using parallel computation in the bootstrap replications (uncomment to run)
#library("doParallel")
#registerDoParallel(2)
#set.seed(0)
#system.time(obj <- bootLasso(x = x, y = y))
#system.time(obj <- bootLasso(x = x, y = y, parallel.boot = TRUE, ncores.boot = 2))
#stopImplicitCluster()


[Package HDCI version 1.0-2 Index]