R: fit an adaptive lasso with adaptive weights derived from CISL

adapt_cisl {adapt4pv}

R Documentation

fit an adaptive lasso with adaptive weights derived from CISL

Description

Runs the CISL procedure (see cisl for more details) to determine adaptive penalty weights, then fits an adaptive lasso with these weights. BIC is used for variable selection in the adaptive lasso. Can deal with very large sparse data matrices. Intended for binary response only (option family = "binomial" is forced). Depends on the glmnet function from the package glmnet.

Usage

adapt_cisl(
  x,
  y,
  cisl_nB = 100,
  cisl_dfmax = 50,
  cisl_nlambda = 250,
  cisl_ncore = 1,
  maxp = 50,
  path = TRUE,
  betaPos = TRUE,
  ...
)

Arguments

`x`	Input matrix, of dimension nobs x nvars. Each row is an observation vector. Can be in sparse matrix format (inherit from class `"sparseMatrix"` as in package `Matrix`).
`y`	Binary response variable, numeric.
`cisl_nB`	`nB` option in `cisl` function. Default is 100.
`cisl_dfmax`	`dfmax` option in `cisl` function. Default is 50.
`cisl_nlambda`	`nlambda` option in `cisl` function. Default is 250.
`cisl_ncore`	`ncore` option in `cisl` function. Default is 1.
`maxp`	A limit on how many relaxed coefficients are allowed. Default is 50; the default in `glmnet` is 'n-3', where 'n' is the sample size.
`path`	Since `glmnet` does not do stepsize optimization, the Newton algorithm can get stuck and not converge, especially with relaxed fits. With `path=TRUE`, each relaxed fit on a particular set of variables is computed pathwise using the original sequence of lambda values (with a zero attached to the end). Default is `path=TRUE`.
`betaPos`	Should the covariates selected by the procedure be positively associated with the outcome? Default is `TRUE`.
`...`	Other arguments that can be passed to `glmnet` from package `glmnet` other than `penalty.factor`, `family`, `maxp` and `path`.

Details

The CISL procedure is first run with its default values, except for dfmax and nlambda, which are set through the parameters cisl_dfmax and cisl_nlambda. In addition, the betaPos parameter is set to FALSE in cisl. For each covariate i, cisl_nB values of the CISL quantity \tau_i are estimated. The adaptive weight for a given covariate i is defined by

w_i = 1 - \frac{1}{cisl_nB} \sum_{b=1}^{cisl_nB} 1[\tau^b_i > 0]

If \tau_i is the null vector, the associated adaptive weight is infinite. If \tau_i is always positive, the formula gives a weight of 0; rather than "forcing" the variable into the model, we set the corresponding adaptive weight to 1/cisl_nB.
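
The weight rule above can be illustrated with a minimal base R sketch. The matrix `tau` and the covariate names are hypothetical, made-up values (one row per covariate, one column per subsample b); they are not produced by `cisl`, and the package's internal code may differ.

```r
# Hypothetical CISL quantities: rows are covariates, columns are the
# cisl_nB subsamples. Values are invented for illustration only.
cisl_nB <- 4
tau <- rbind(
  drugA = c(0.2, 0.0, 0.1, 0.0),  # positive in 2 of 4 subsamples
  drugB = c(0.0, 0.0, 0.0, 0.0),  # null vector
  drugC = c(0.3, 0.1, 0.2, 0.4)   # positive in every subsample
)

# w_i = 1 - (1 / cisl_nB) * sum over b of 1[tau_i^b > 0]
aws <- 1 - rowMeans(tau > 0)

# Null vector: the weight is set to infinity.
aws[rowSums(tau != 0) == 0] <- Inf

# Always positive: the formula gives 0, which is replaced by 1 / cisl_nB.
aws[rowSums(tau > 0) == cisl_nB] <- 1 / cisl_nB

aws
```

In this toy case `drugA` gets weight 0.5, `drugB` gets `Inf`, and `drugC` gets 0.25 instead of 0.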

Value

An object with S3 class "adaptive".

`aws`	Numeric vector of penalty weights derived from CISL. Length equal to nvars.
`criterion`	Character, indicates which criterion is used with the adaptive lasso for variable selection. For `adapt_cisl` function, `criterion` is "bic".
`beta`	Numeric vector of regression coefficients in the adaptive lasso. If `criterion` = "cv" the regression coefficients are PENALIZED, if `criterion` = "bic" the regression coefficients are UNPENALIZED. Length equal to nvars. Could be NA if adaptive weights are all equal to infinity.
`selected_variables`	Character vector, names of variable(s) selected with this adaptive approach. If `betaPos = TRUE`, this set is the covariates with a positive regression coefficient in `beta`. Otherwise, this set is the covariates with a non-null regression coefficient in `beta`. Covariates are ordered according to the p-values (two-sided if `betaPos = FALSE`, one-sided if `betaPos = TRUE`) in the classical multiple logistic regression model that minimizes the BIC in the adaptive lasso.

Author(s)

Emeline Courtois
Maintainer: Emeline Courtois emeline.courtois@inserm.fr

Examples


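library(adapt4pv)

set.seed(15)
# Simulated binary drug exposure matrix: 100 observations, 20 covariates
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
# Simulated binary outcome (adverse event)
ae <- rbinom(100, 1, 0.3)
acisl <- adapt_cisl(x = drugs, y = ae, cisl_nB = 50, maxp = 10)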
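
The components listed under Value can then be inspected on the fitted object. The following is a minimal sketch, assuming the adapt4pv package is installed; it repeats the simulated data so that it runs on its own, and the `requireNamespace` guard is only there so the snippet does nothing when the package is missing.

```r
if (requireNamespace("adapt4pv", quietly = TRUE)) {
  set.seed(15)
  drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
  colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
  ae <- rbinom(100, 1, 0.3)

  acisl <- adapt4pv::adapt_cisl(x = drugs, y = ae, cisl_nB = 50, maxp = 10)

  # Criterion used for variable selection in the adaptive lasso
  print(acisl$criterion)

  # First few CISL-derived penalty weights (one per covariate)
  print(head(acisl$aws))

  # Names of the covariates retained by the procedure (may be empty)
  print(acisl$selected_variables)
}
```

On purely random data such as this, the set of selected variables may well be empty.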

[Package adapt4pv version 0.2-3 Index]