cisl {adapt4pv}    R Documentation

Class Imbalanced Subsampling Lasso

Description

Implementation of CISL and of stability selection, depending on the subsampling options.

Usage

cisl(
  x,
  y,
  r = 4,
  nB = 100,
  dfmax = 50,
  nlambda = 250,
  nMin = 0,
  replace = TRUE,
  betaPos = TRUE,
  ncore = 1
)

Arguments

x

Input matrix, of dimension nobs x nvars. Each row is an observation vector. Can be in sparse matrix format (inheriting from class "sparseMatrix" as in package Matrix).

y

Binary response variable, numeric.

r

Number of controls per case in the CISL sampling. Default is 4. See Details below for other sampling schemes.

nB

Number of sub-samples. Default is 100.

dfmax

Maximum size of the models visited along the lasso path (E in the paper). Default is 50.

nlambda

Number of lambda values, as in the glmnet documentation. Default is 250.

nMin

Minimum number of events for a covariate to be considered. Default is 0: all the covariates in x are considered.

replace

Should sampling be with replacement? Default is TRUE.

betaPos

If betaPos = TRUE, variable selection is based on positive regression coefficients. Otherwise, variable selection is based on non-zero regression coefficients. Default is TRUE.

ncore

Number of computing units (cores) used for parallel computing. Must be set to 1 if the parallel package is not available. Default is 1. WARNING: parallel computing is not supported on Windows machines!
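A minimal sketch, assuming the nB subsample fits are distributed with parallel::mclapply; this is an illustration, not the adapt4pv source. The helper run_subsamples and the toy fitting function toy_fit are hypothetical. mclapply relies on forking, which is unavailable on Windows (mc.cores > 1 is rejected there), so the sketch falls back to a serial lapply in that case.

```r
library(parallel)

# Hypothetical helper: apply fit_one() to each of nB subsamples, using forked
# workers when ncore > 1 (forking is not available on Windows).
run_subsamples <- function(nB, fit_one, ncore = 1) {
  if (ncore > 1 && .Platform$OS.type == "windows") {
    warning("forking is not supported on Windows; falling back to ncore = 1")
    ncore <- 1
  }
  if (ncore > 1) {
    res <- mclapply(seq_len(nB), fit_one, mc.cores = ncore)
  } else {
    res <- lapply(seq_len(nB), fit_one)
  }
  # One column per subsample, mirroring the nvars x nB layout of 'prob'
  do.call(cbind, res)
}

# Toy stand-in for one subsample fit: returns one value per "covariate"
toy_fit <- function(b) c(b, 2 * b, 3 * b)
out <- run_subsamples(nB = 5, fit_one = toy_fit, ncore = 1)
dim(out)  # [1] 3 5
```

For reproducible random draws across forked workers, the parallel package documentation recommends the "L'Ecuyer-CMRG" generator, selected with RNGkind().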

Details

CISL is a variation of the stability selection method adapted to the characteristics of pharmacovigilance databases. Setting r = 4 and replace = TRUE implements our CISL sampling. Alternatively, r = NULL and replace = FALSE can be used to implement the n/2 sampling of stability selection.
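The two sampling schemes can be contrasted with a short base-R sketch. The function draw_subsample is hypothetical, and its CISL branch assumes that each subsample redraws all n1 cases plus r controls per case (capped at the number of available controls when sampling without replacement); the paper remains the authority on the exact scheme. The r = NULL branch is assumed here to draw floor(n/2) observations regardless of class.

```r
# Hypothetical sketch of the two subsampling schemes; not the adapt4pv code.
draw_subsample <- function(y, r = 4, replace = TRUE) {
  n <- length(y)
  if (is.null(r)) {
    # Stability-selection style: floor(n/2) observations, no class balancing
    return(sample.int(n, size = floor(n / 2), replace = replace))
  }
  cases    <- which(y == 1)
  controls <- which(y == 0)
  # Assumption: all cases are redrawn, plus r controls per case,
  # capped at the number of controls when sampling without replacement
  n_ctrl <- r * length(cases)
  if (!replace) n_ctrl <- min(n_ctrl, length(controls))
  c(cases[sample.int(length(cases), length(cases), replace = replace)],
    controls[sample.int(length(controls), n_ctrl, replace = replace)])
}

set.seed(15)
y <- rbinom(100, 1, 0.1)
idx_cisl <- draw_subsample(y, r = 4, replace = TRUE)
idx_half <- draw_subsample(y, r = NULL, replace = FALSE)
table(y[idx_cisl])  # the count of 0s is 4 times the count of 1s
length(idx_half)    # 50
```

Indexing through sample.int avoids the sample() pitfall where a length-one vector is treated as a range 1:x.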

Value

An object with S3 class "cisl".

prob

Matrix of dimension nvars x nB. Quantity computed by CISL for each covariate and each subsample.

q05

5% quantile of the CISL quantity for each covariate. Numeric vector of length nvars.

q10

10% quantile of the CISL quantity for each covariate. Numeric vector of length nvars.

q15

15% quantile of the CISL quantity for each covariate. Numeric vector of length nvars.

q20

20% quantile of the CISL quantity for each covariate. Numeric vector of length nvars.
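To make these components concrete, here is a hedged base-R sketch. It assumes, as a simplification of the paper, that the quantity for one covariate in one subsample is the fraction of lasso models with 1 to dfmax variables in which the covariate is selected (positive coefficient when betaPos = TRUE, non-zero otherwise), averaged over lambda values rather than over model sizes. The helper cisl_quantity and the toy coefficient path are hypothetical; in practice the path would come from a lasso fit such as glmnet.

```r
# Hypothetical helper (not adapt4pv code): CISL-like quantity for one subsample.
# 'beta' is an nvars x nlambda lasso coefficient path, one column per lambda.
cisl_quantity <- function(beta, dfmax = 50, betaPos = TRUE) {
  sel <- if (betaPos) beta > 0 else beta != 0  # selection indicator per lambda
  df <- colSums(beta != 0)                      # model size at each lambda
  visited <- df >= 1 & df <= dfmax              # keep models of size 1..dfmax
  if (!any(visited)) return(setNames(numeric(nrow(beta)), rownames(beta)))
  # Simplification: average over visited lambdas rather than over model sizes
  rowMeans(sel[, visited, drop = FALSE])
}

# Toy coefficient path for 3 drugs over 4 lambda values
beta <- rbind(d1 = c(0, 0.5,  0.8,  1.0),
              d2 = c(0, 0.0, -0.2, -0.4),
              d3 = c(0, 0.0,  0.0,  0.3))
cisl_quantity(beta, dfmax = 2)                   # d1 = 1, d2 = 0, d3 = 0
cisl_quantity(beta, dfmax = 2, betaPos = FALSE)  # d1 = 1, d2 = 0.5, d3 = 0

# Summarise a toy nvars x nB 'prob' matrix into the q05..q20 quantiles
set.seed(1)
prob <- matrix(runif(3 * 100), nrow = 3,
               dimnames = list(c("d1", "d2", "d3"), NULL))
qs <- apply(prob, 1, quantile, probs = c(0.05, 0.10, 0.15, 0.20))
q05 <- qs["5%", ]  # named vector, one value per covariate
```

Because q05 is a lower quantile, q05 > 0 means the covariate's quantity is positive in roughly 95% or more of the subsamples.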

Author(s)

Ismail Ahmed

References

Ahmed, I., Pariente, A., & Tubert-Bitter, P. (2018). "Class-imbalanced subsampling lasso algorithm for discovering adverse drug reactions". Statistical Methods in Medical Research. 27(3), 785–797, doi:10.1177/0962280216643116

Examples


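library(adapt4pv)
set.seed(15)
# 100 simulated observations with 20 binary drug exposures
drugs <- matrix(rbinom(100*20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
# Binary adverse event indicator
ae <- rbinom(100, 1, 0.3)
# Run CISL with 50 subsamples (default r = 4, replace = TRUE)
lcisl <- cisl(x = drugs, y = ae, nB = 50)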


[Package adapt4pv version 0.2-3 Index]