## Class Imbalanced Subsampling Lasso

### Description

Implements CISL and, depending on the subsampling options, standard stability selection.

### Usage

cisl(
x,
y,
r = 4,
nB = 100,
dfmax = 50,
nlambda = 250,
nMin = 0,
replace = TRUE,
betaPos = TRUE,
ncore = 1
)


### Arguments

| Argument | Description |
| --- | --- |
| x | Input matrix, of dimension nobs x nvars. Each row is an observation vector. Can be in sparse matrix format (inheriting from class "sparseMatrix" as in package Matrix). |
| y | Binary response variable, numeric. |
| r | Number of controls in the CISL sampling. Default is 4. See details below for other implementations. |
| nB | Number of sub-samples. Default is 100. |
| dfmax | Maximum size of the models visited with the lasso (E in the paper). Default is 50. |
| nlambda | Number of lambda values, as in the glmnet documentation. Default is 250. |
| nMin | Minimum number of events for a covariate to be considered. Default is 0, in which case all the covariates from x are considered. |
| replace | Should sampling be with replacement? Default is TRUE. |
| betaPos | If betaPos = TRUE, variable selection is based on positive regression coefficients. Otherwise, variable selection is based on non-zero regression coefficients. Default is TRUE. |
| ncore | Number of cores used for parallel computing. This has to be set to 1 if the parallel package is not available. Default is 1. WARNING: parallel computing is not supported on Windows machines! |

### Details

CISL is a variation of the stability selection method adapted to the characteristics of pharmacovigilance databases. Setting r = 4 and replace = TRUE implements the CISL sampling. Alternatively, setting r = NULL and replace = FALSE implements the n/2 sampling used in Stability Selection.
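
The two sampling schemes mentioned above can be sketched in base R. This is an illustration only, not the package's internal code: the helper names `half_sample` and `ratio_sample` are hypothetical, and the reading of r as "controls drawn per sampled case" is an assumption, since the exact sub-sample sizes are not spelled out here.

```r
# Hypothetical helpers for illustration; not part of the package.

# n/2 sampling: floor(n / 2) observations drawn without replacement.
half_sample <- function(y) {
  n <- length(y)
  sample.int(n, size = floor(n / 2), replace = FALSE)
}

# Class-ratio sampling (ASSUMED reading of r): resample the cases with
# replacement, then draw r controls per sampled case, also with replacement.
ratio_sample <- function(y, r = 4) {
  cases <- which(y == 1)
  controls <- which(y == 0)
  n_cases <- length(cases)
  c(
    cases[sample.int(n_cases, n_cases, replace = TRUE)],
    controls[sample.int(length(controls), r * n_cases, replace = TRUE)]
  )
}

set.seed(1)
y <- rbinom(200, 1, 0.1)             # rare binary outcome
idx_half <- half_sample(y)           # row indices for one n/2 sub-sample
idx_ratio <- ratio_sample(y, r = 4)  # row indices for one class-ratio sub-sample
table(y[idx_ratio])                  # controls outnumber cases 4 to 1 by construction
```

Under this assumption the class balance of each sub-sample is fixed by r, whereas the n/2 scheme inherits whatever imbalance the full data set has.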

### Value

An object with S3 class "cisl".

| Component | Description |
| --- | --- |
| prob | Matrix of dimension nvars x nB. Quantity computed by CISL for each covariate, for each sub-sample. |
| q05 | 5% quantile of the CISL quantity for each covariate. Numeric, length equal to nvars. |
| q10 | 10% quantile of the CISL quantity for each covariate. Numeric, length equal to nvars. |
| q15 | 15% quantile of the CISL quantity for each covariate. Numeric, length equal to nvars. |
| q20 | 20% quantile of the CISL quantity for each covariate. Numeric, length equal to nvars. |
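
To make the relation between prob and the quantile components concrete, the following sketch computes row-wise quantiles of a simulated nvars x nB matrix. The matrix here is random and purely illustrative, and the use of the default `quantile()` type is an assumption; the quantile definition actually used by the function is not stated above.

```r
set.seed(42)
nvars <- 5
nB <- 50

# Stand-in for the prob component: one row per covariate, one column per sub-sample.
prob <- matrix(runif(nvars * nB), nrow = nvars, ncol = nB,
               dimnames = list(paste0("drug", 1:nvars), NULL))

# Row-wise 5%, 10%, 15% and 20% quantiles (default quantile type, assumed).
q <- t(apply(prob, 1, quantile, probs = c(0.05, 0.10, 0.15, 0.20)))
colnames(q) <- c("q05", "q10", "q15", "q20")
q
```

Each column of `q` has length nvars, matching the shape documented for q05 through q20.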

### Author(s)

Ismail Ahmed

### References

Ahmed, I., Pariente, A., & Tubert-Bitter, P. (2018). "Class-imbalanced subsampling lasso algorithm for discovering adverse drug reactions". Statistical Methods in Medical Research. 27(3), 785–797, doi: 10.1177/0962280216643116

### Examples


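# Simulate 100 observations of 20 binary drug exposures
set.seed(15)
drugs <- matrix(rbinom(100 * 20, 1, 0.2), nrow = 100, ncol = 20)
colnames(drugs) <- paste0("drugs", 1:ncol(drugs))
# Simulate a binary adverse event indicator
ae <- rbinom(100, 1, 0.3)
# Run CISL with 50 sub-samples
lcisl <- cisl(x = drugs, y = ae, nB = 50)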