adapt_univ {adapt4pv}

R Documentation

Fit an adaptive lasso with adaptive weights derived from univariate coefficients

Description

Computes the odds ratio between each covariate of x and y, then derives adaptive weights from these odds ratios to incorporate in an adaptive lasso. Either BIC or cross-validation can be used for variable selection with the adaptive lasso. Two implementations of cross-validation for the adaptive lasso are available through the type_cv parameter (see below). Can deal with very large sparse data matrices. Intended for binary response only (option family = "binomial" is forced). The cross-validation criterion used is deviance. Depends on the glmnet and relax.glmnet functions from the package glmnet.

Usage

adapt_univ(
  x,
  y,
  gamma = 1,
  criterion = "bic",
  maxp = 50,
  path = TRUE,
  nfolds = 5,
  foldid = NULL,
  type_cv = "proper",
  betaPos = TRUE,
  ...
)

Arguments

`x`	Input matrix, of dimension nobs x nvars. Each row is an observation vector. Can be in sparse matrix format (inherit from class `"sparseMatrix"` as in package `Matrix`).
`y`	Binary response variable, numeric.
`gamma`	Tuning parameter used to define the penalty weights. See details below. Default is set to 1.
`criterion`	Character, indicates which criterion is used with the adaptive lasso for variable selection. Could be either "bic" or "cv". Default is "bic".
`maxp`	Used only if `criterion` = "bic", ignored if `criterion` = "cv". A limit on how many relaxed coefficients are allowed. Default is 50; the default in `glmnet` is 'n-3', where 'n' is the sample size.
`path`	Used only if `criterion` = "bic", ignored if `criterion` = "cv". Since `glmnet` does not do stepsize optimization, the Newton algorithm can get stuck and not converge, especially with relaxed fits. With `path=TRUE`, each relaxed fit on a particular set of variables is computed pathwise using the original sequence of lambda values (with a zero attached to the end). Default is `path=TRUE`.
`nfolds`	Used only if `criterion` = "cv", ignored if `criterion` = "bic". Number of folds - default is 5. Although `nfolds` can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is `nfolds=3`.
`foldid`	Used only if `criterion` = "cv", ignored if `criterion` = "bic". An optional vector of values between 1 and `nfolds` identifying what fold each observation is in. If supplied, `nfolds` can be missing.
`type_cv`	Used only if `criterion` = "cv", ignored if `criterion` = "bic". Character, indicates which implementation of cross-validation is performed for the adaptive lasso: a "naive" one, where the adaptive weights obtained on the full data are used in every fold, or a "proper" one, where the adaptive weights are recalculated on each training set. Could be either "naive" or "proper". Default is "proper".
`betaPos`	Should the covariates selected by the procedure be positively associated with the outcome? Default is `TRUE`.
`...`	Other arguments that can be passed to `glmnet` from package `glmnet`, other than `family`, `maxp`, `standardize` and `intercept`.
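
The difference between the two `type_cv` options can be sketched in base R. This is only an illustration under stated assumptions: `univ_weights` is a hypothetical helper (it is not part of adapt4pv), the log odds ratios are estimated with one-covariate logistic regressions, and the data are simulated. The package's internal computation may differ.

```r
# Hypothetical helper (not part of adapt4pv): w_i = 1 / |log(OR_i)|^gamma,
# with log(OR_i) estimated by a one-covariate logistic regression.
univ_weights <- function(x, y, gamma = 1) {
  beta_univ <- apply(x, 2, function(col) {
    unname(coef(glm(y ~ col, family = binomial()))[2])
  })
  1 / abs(beta_univ)^gamma
}

# Simulated data, for illustration only
set.seed(15)
n <- 150
x <- matrix(rbinom(n * 4, 1, 0.3), nrow = n, ncol = 4)
colnames(x) <- paste0("drug", seq_len(ncol(x)))
y <- rbinom(n, 1, 0.3)

nfolds <- 3
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# "naive": one weight vector from the full data, reused in every fold
w_naive <- univ_weights(x, y)

# "proper": weights recomputed on each training set (all folds except k)
w_proper <- lapply(seq_len(nfolds), function(k) {
  train <- foldid != k
  univ_weights(x[train, , drop = FALSE], y[train])
})

print(round(w_naive, 2))
print(round(do.call(rbind, w_proper), 2))
```

In the "proper" scheme the held-out fold never contributes to the weights used to fit the model that is evaluated on it, which is why the weights differ from one row of the printed matrix to the next.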

Details

The adaptive weight for a given covariate i is defined by

w_i = 1/|\beta^{univ}_i|^\gamma

where \beta^{univ}_i = log(OR_i), and OR_i is the odds ratio between covariate i and the outcome.
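
As a minimal sketch of this formula, the weights can be computed by hand in base R. The assumptions here are the simulated data, the variable names, and the use of one-covariate logistic regressions to estimate log(OR_i) (for a binary covariate the fitted slope is the log of the sample odds ratio). Passing the weights to `glmnet` through its `penalty.factor` argument is shown as one common way to fit an adaptive lasso, not as a statement of what `adapt_univ` does internally.

```r
# Simulated data, for illustration only
set.seed(15)
x <- matrix(rbinom(100 * 5, 1, 0.3), nrow = 100, ncol = 5)
colnames(x) <- paste0("drug", seq_len(ncol(x)))
y <- rbinom(100, 1, 0.3)

gamma <- 1  # plays the same role as the `gamma` argument

# beta_univ[i] = log(OR_i), taken as the slope of a one-covariate logistic model
beta_univ <- apply(x, 2, function(col) {
  unname(coef(glm(y ~ col, family = binomial()))[2])
})

# w_i = 1 / |beta_univ_i|^gamma
w <- 1 / abs(beta_univ)^gamma
print(round(w, 3))

# Optional: use the weights as per-covariate penalty factors in a lasso fit
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x, y, family = "binomial", penalty.factor = w)
  print(dim(coef(fit)))  # (nvars + 1) rows, one column per lambda value
}
```

A covariate with a univariate log odds ratio close to zero receives a large weight and is therefore penalized more heavily; larger values of `gamma` accentuate this contrast.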

Value

An object with S3 class "adaptive".

`aws`	Numeric vector of penalty weights derived from odds-ratios. Length equal to nvars.
`criterion`	Character, same as input. Could be either "bic" or "cv".
`beta`	Numeric vector of regression coefficients in the adaptive lasso. If `criterion` = "cv" the regression coefficients are PENALIZED, if `criterion` = "bic" the regression coefficients are UNPENALIZED. Length equal to nvars. Could be NA if adaptive weights are all equal to infinity.
`selected_variables`	Character vector, names of the variable(s) selected with this adaptive approach. If `betaPos = TRUE`, this set is the covariates with a positive regression coefficient in `beta`. Otherwise, this set is the covariates with a nonzero regression coefficient in `beta`. If `criterion` = "cv", covariates are ordered according to the magnitude of the absolute value of their regression coefficients in the adaptive lasso. If `criterion` = "bic", covariates are ordered according to their p-values (two-sided if `betaPos = FALSE`, one-sided if `betaPos = TRUE`) in the classical multiple logistic regression model that minimizes the BIC in the adaptive lasso.
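
The components listed above can be inspected as in the following sketch. It assumes the adapt4pv package is installed (the code is skipped otherwise) and uses simulated data; no particular output is implied.

```r
if (requireNamespace("adapt4pv", quietly = TRUE)) {
  # Simulated data, for illustration only
  set.seed(15)
  drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
  colnames(drugs) <- paste0("drugs", seq_len(ncol(drugs)))
  ae <- rbinom(100, 1, 0.3)

  fit <- adapt4pv::adapt_univ(x = drugs, y = ae, criterion = "cv", nfolds = 3)

  print(class(fit))              # S3 class of the returned object
  print(fit$criterion)           # criterion used, same as input
  print(head(fit$aws))           # penalty weights, one per covariate
  print(head(fit$beta))          # coefficients (penalized, since criterion = "cv")
  print(fit$selected_variables)  # names of the selected covariates, possibly empty
}
```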

Author(s)

Emeline Courtois
Maintainer: Emeline Courtois emeline.courtois@inserm.fr

Examples


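library(adapt4pv)
# Simulate 100 observations: 20 binary drug exposures and a binary outcome
set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
ae <- rbinom(100, 1, 0.3)
# Adaptive lasso with univariate weights, tuned by 3-fold cross-validation
au <- adapt_univ(x = drugs, y = ae, criterion = "cv", nfolds = 3)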

[Package adapt4pv version 0.2-3 Index]