R: fit a lasso regression and use standard BIC for variable...

lasso_bic {adapt4pv}

R Documentation

fit a lasso regression and use standard BIC for variable selection

Description

Fit a lasso regression and use the Bayesian Information Criterion (BIC) to select a subset of selected covariates. Can deal with very large sparse data matrices. Intended for binary reponse only (option family = "binomial" is forced). Depends on the glmnet and relax.glmnet functions from the package glmnet.

Usage

lasso_bic(x, y, maxp = 50, path = TRUE, betaPos = TRUE, ...)

Arguments

`x`	Input matrix, of dimension nobs x nvars. Each row is an observation vector. Can be in sparse matrix format (inherit from class `"sparseMatrix"` as in package `Matrix`).
`y`	Binary response variable, numeric.
`maxp`	A limit on how many relaxed coefficients are allowed. Default is 50, in `glmnet` option default is 'n-3', where 'n' is the sample size.
`path`	Since `glmnet` does not do stepsize optimization, the Newton algorithm can get stuck and not converge, especially with relaxed fits. With `path=TRUE`, each relaxed fit on a particular set of variables is computed pathwise using the original sequence of lambda values (with a zero attached to the end). Default is `path=TRUE`.
`betaPos`	Should the covariates selected by the procedure be positively associated with the outcome ? Default is `TRUE`.
`...`	Other arguments that can be passed to `glmnet` from package `glmnet` other than `family`, `maxp` and `path`.

Details

For each tested penalisation parameter \lambda, a standard version of the BIC is implemented.

BIC_\lambda = - 2 l_\lambda + df(\lambda) * ln (N)

where l_\lambda is the log-likelihood of the non-penalized multiple logistic regression model that includes the set of covariates with a non-zero coefficient in the penalised regression coefficient vector associated to \lambda, and df(\lambda) is the number of covariates with a non-zero coefficient in the penalised regression coefficient vector associated to \lambda, The optimal set of covariates according to this approach is the one associated with the classical multiple logistic regression model which minimizes the BIC.

Value

An object with S3 class "log.lasso".

beta

Numeric vector of regression coefficients in the lasso. In lasso_bic function, the regression coefficients are UNPENALIZED. Length equal to nvars.

selected_variables

Character vector, names of variable(s) selected with the lasso-bic approach. If betaPos = TRUE, this set is the covariates with a positive regression coefficient in beta. Else this set is the covariates with a non null regression coefficient in beta. Covariates are ordering according to the p-values (two-sided if betaPos = FALSE , one-sided if betaPos = TRUE) in the classical multiple logistic regression model that minimzes the BIC.

Author(s)

Emeline Courtois
Maintainer: Emeline Courtois emeline.courtois@inserm.fr

Examples


set.seed(15)
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs",1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
lb <- lasso_bic(x = drugs, y = ae, maxp = 20)

[Package adapt4pv version 0.2-3 Index]