glmnetSE {glmnetSE}		R Documentation

Add Nonparametric Bootstrap SE to 'glmnet' for Selected Coefficients (No Shrinkage)

Description

Builds a LASSO, Ridge, or Elastic Net model with glmnet or cv.glmnet and computes bootstrap inference statistics (SE, CI, and p-value) for selected coefficients, to which no shrinkage is applied. Model performance can be evaluated on test data, and automated alpha selection is implemented for the Elastic Net. Computation is parallelized to speed up the process.

Usage

glmnetSE(
  data,
  cf.no.shrnkg,
  alpha = 1,
  method = "10CVoneSE",
  test = "none",
  r = 250,
  nlambda = 100,
  seed = 0,
  family = "gaussian",
  type = "basic",
  conf = 0.95,
  perf.metric = "mse",
  ncore = "mx.core"
)

Arguments

data

A data frame, tibble, or matrix object with the outcome variable in the first column and the feature variables in the following columns. Note: all columns besides the first one are used as feature variables. Feature selection has to be done beforehand.

cf.no.shrnkg

A character vector naming the coefficients whose effect sizes will be interpreted. Because their inference statistics are of interest, no shrinkage is applied to them.

alpha

The alpha value in [0,1]. An alpha of 0 results in a Ridge regression, a value of 1 in a LASSO, and a value between 0 and 1 in an Elastic Net. If a sequence of possible alphas is passed to the alpha argument, the alpha of the best-performing model (based on the selected method and perf.metric) is chosen - default is 1.
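
As a sketch of how alpha mixes the two penalties (this helper is our own illustration, not part of glmnetSE), glmnet's Elastic Net penalty can be written in base R:

```r
# Elastic Net penalty in glmnet's parameterization:
# lambda * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1)
enet_penalty <- function(beta, alpha, lambda) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

enet_penalty(c(1, -2), alpha = 0, lambda = 1)  # pure Ridge: 2.5
enet_penalty(c(1, -2), alpha = 1, lambda = 1)  # pure LASSO: 3
```

Alpha values strictly between 0 and 1 blend the two terms, which is what the automated alpha selection searches over.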

method

A character string defining whether 10-fold cross-validation is used. Possible methods are none: no cross-validation is applied and the coefficients for lambda = 0.1 are selected; 10CVoneSE: 10-fold cross-validation is applied and the lambda of the least complex model with an MSE within one standard error of the smallest MSE is selected; 10CVmin: 10-fold cross-validation is applied and the lambda at which the MSE is smallest is selected - default is 10CVoneSE.
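
The two cross-validation rules can be sketched in base R with hypothetical CV results (the lambda grid, MSEs, and SEs below are made up for illustration):

```r
# Hypothetical cross-validation results: one MSE and its SE per lambda.
# Larger lambda means a less complex (more heavily penalized) model.
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05)
cv.mse <- c(4.2,  3.1,  2.6,  2.5,  2.7)
cv.se  <- c(0.3,  0.3,  0.2,  0.2,  0.3)

# 10CVmin: lambda at which the CV error is smallest
i.min      <- which.min(cv.mse)
lambda.min <- lambda[i.min]                      # 0.10

# 10CVoneSE: largest lambda whose CV error is within
# one standard error of the smallest CV error
within     <- cv.mse <= cv.mse[i.min] + cv.se[i.min]
lambda.1se <- max(lambda[within])                # 0.25
```

The one-SE rule trades a small amount of CV error for a simpler model, which is why it is the default.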

test

A data frame, tibble, or matrix object with the same outcome and feature variables as supplied to data, containing test observations not used for the training of the model - default is none.

r

Number of nonparametric bootstrap repetitions - default is 250.

nlambda

Number of tested lambda values - default is 100.

seed

Seed set for the cross-validation and bootstrap sampling - default is 0, which means no seed is set.

family

A character string representing the model family used, either gaussian or binomial - default is gaussian.

type

A character string indicating the type of bootstrap interval calculated. It can be norm, basic, perc, or bca. For more information see the boot.ci function in the boot package - default is basic.
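
These values correspond to the type argument of boot.ci() in the boot package (which ships with R). A minimal sketch bootstrapping the mean of a toy vector, unrelated to glmnetSE's internals:

```r
library(boot)  # recommended package, ships with R

set.seed(1)
x <- rnorm(100, mean = 5)

# The statistic must accept the data and an index vector of the resample
boot.mean <- function(d, i) mean(d[i])

b  <- boot(x, statistic = boot.mean, R = 500)
ci <- boot.ci(b, conf = 0.95, type = "basic")
ci$basic  # matrix row: conf level, endpoint indices, lower bound, upper bound
```

The same type strings ("norm", "basic", "perc", "bca") select the interval construction used for the unshrunk coefficients.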

conf

Indicates the confidence interval level - default is 0.95.

perf.metric

A character string indicating the performance metric used to evaluate different lambdas and the final model. Can be either mse (mean squared error), mae (mean absolute error), class (classification error), or auc (area under the curve). Not applied when method none is used - default is mse.
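
The first three metrics are straightforward to compute by hand; the base-R sketch below uses toy values and helper names of our own (auc additionally requires ranking the predicted probabilities, so it is omitted here):

```r
mse       <- function(y, yhat) mean((y - yhat)^2)       # mean squared error
mae       <- function(y, yhat) mean(abs(y - yhat))      # mean absolute error
class.err <- function(y, p) mean(y != (p >= 0.5))       # misclassification rate

y    <- c(1.0, 2.0, 3.0)
yhat <- c(1.5, 2.0, 2.0)
mse(y, yhat)  # ~0.4167
mae(y, yhat)  # 0.5

class.err(c(0, 1, 1, 0), c(0.2, 0.8, 0.4, 0.9))  # 0.5
```

mse and mae apply to the gaussian family; class and auc apply to the binomial family.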

ncore

A numerical value indicating the number of clusters to build and cores to use in the computation. If not defined, the maximum available number of OS cores minus one is used (mx.core). It is not possible to use more than 32 cores, because efficiency decreases rapidly beyond that point (Sloan et al. 2014) - default is mx.core.
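
The mx.core rule can be emulated with the base parallel package (a sketch of the stated rule, not glmnetSE's internal code):

```r
library(parallel)  # base R package

# mx.core rule: all available cores minus one, capped at 32
mx.core <- min(max(detectCores() - 1, 1), 32)
mx.core
# A cluster of that size would then be built with makeCluster(mx.core)
# and released with stopCluster() once the bootstrap finishes.
```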

Value

A glmnetSE object whose output can be displayed using summary() or summary.glmnetSE(). If the binomial family and the performance metric auc are used, the ROC curve can be plotted with plot() or plot.glmnetSE().

Author(s)

Sebastian Bahr, sebastian.bahr@unibe.ch

References

Friedman J., Hastie T. and Tibshirani R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. https://www.jstatsoft.org/v33/i01/.

Simon N., Friedman J., Hastie T. and Tibshirani R. (2011). Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent. Journal of Statistical Software, 39(5), 1-13. https://www.jstatsoft.org/v39/i05/.

Efron, B. and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman & Hall. https://cds.cern.ch/record/526679/files/0412042312_TOC.pdf

Sloan T.M., Piotrowski M., Forster T. and Ghazal P. (2014) Parallel Optimization of Bootstrapping in R. https://arxiv.org/ftp/arxiv/papers/1401/1401.6389.pdf

See Also

summary.glmnetSE and plot.glmnetSE methods.

Examples


# LASSO model with gaussian function, no cross-validation, a seed of 123, and
# the coefficient of interest is Education. Two cores are used for the computation.

glmnetSE(data=swiss, cf.no.shrnkg = c("Education"), alpha=1, method="none", seed = 123, ncore = 2)


# Ridge model with binomial function, 10-fold cross-validation selecting the lambda
# at which the smallest MSE is achieved, 500 bootstrap repetitions, no seed, the
# misclassification error used as performance metric, and the coefficients of
# interest are Education and Catholic. Two cores are used for the computation.

# Generate dichotomous outcome variable
swiss$Fertility <- ifelse(swiss$Fertility >= median(swiss$Fertility), 1, 0)

glmnetSE(data=swiss, cf.no.shrnkg = c("Education", "Catholic"), alpha=0, method="10CVmin", r=500,
         seed = 0, family="binomial", perf.metric = "class", ncore = 2)


# Elastic Net with gaussian function, automated alpha selection, selecting the lambda
# within one standard error of the best model, test data to obtain the performance
# metric on it, a seed of 123, bias-corrected and accelerated confidence intervals, a
# level of 0.9, the performance metric MAE, and the coefficient of interest is Education.
# Two cores are used for the computation.

# Generate a train and test set
set.seed(123)
train_sample <- sample(nrow(swiss), 0.8*nrow(swiss))

swiss.train <- swiss[train_sample, ]
swiss.test  <- swiss[-train_sample, ]

glmnetSE(data=swiss.train, cf.no.shrnkg = c("Education"), alpha=seq(0.1,0.9,0.1),
method="10CVoneSE", test = swiss.test, seed = 123, family = "gaussian", type = "bca",
conf = 0.9, perf.metric = "mae", ncore = 2)


[Package glmnetSE version 0.0.1 Index]