stable.clr.g {penalizedclr}    R Documentation
Stability selection based on penalized conditional logistic regression
Description
Performs stability selection for conditional logistic regression models with L1 and L2 penalties, allowing different penalties for different blocks (groups) of covariates, for example covariates coming from different data sources.
Usage
stable.clr.g(
response,
stratum,
penalized,
unpenalized = NULL,
p = NULL,
lambda.list,
alpha = 1,
B = 100,
parallel = TRUE,
standardize = TRUE,
event
)
Arguments
response: The response variable, either a 0/1 vector or a factor with two levels.
stratum: A numeric vector giving the stratum membership of each observation.
penalized: A matrix of penalized covariates.
unpenalized: A matrix of additional unpenalized covariates.
p: The sizes of the blocks of covariates: a numeric vector whose length equals the number of blocks and whose elements sum to the number of penalized covariates. If NULL (the default), all covariates are treated the same and a single penalty is applied.
lambda.list: A list of vectors of penalties to be applied to the different blocks of covariates. Each vector must have length equal to the number of blocks.
alpha: The elastic net mixing parameter, a number between 0 and 1. alpha = 1 gives the lasso, and alpha = 0 would give pure ridge; however, a pure ridge penalty is never obtained in this implementation, since alpha must be positive.
B: A single positive integer giving the number of complementary subsample pairs; 2B subsamples are used in total (see Details).
parallel: Logical. Should the computation be parallelized?
standardize: Logical. Should the covariates be standardized?
event: If response is a factor, the level that should be considered a success in the logistic regression.
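The arguments p and lambda.list work together: each penalty vector in lambda.list assigns one penalty per block, and p gives the number of covariates in each block. The hypothetical helper below (expand_block_penalties is not part of penalizedclr) is a minimal sketch of that correspondence, assuming that blocks are contiguous column ranges of penalized in the order given by p, as in the example further down.

```r
# Hypothetical helper (not part of penalizedclr).
# Assumption: block j occupies p[j] consecutive columns of `penalized`,
# with blocks in the order given by `p`.
# Checks that `p` and each element of `lambda.list` are consistent with the
# penalized matrix and expands each per-block penalty vector to one penalty
# per covariate.
expand_block_penalties <- function(penalized, p, lambda.list) {
  if (is.null(p)) p <- ncol(penalized)  # single block: one common penalty
  stopifnot(sum(p) == ncol(penalized))
  lapply(lambda.list, function(lambda) {
    stopifnot(length(lambda) == length(p))
    rep(lambda, times = p)  # block j's penalty repeated p[j] times
  })
}

# usage: 5 covariates split into blocks of sizes 2 and 3
X <- matrix(0, nrow = 3, ncol = 5)
expand_block_penalties(X, c(2, 3), list(c(0.5, 1), c(2, 0.9)))
```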
Details
This function implements stability selection (Meinshausen and Bühlmann, 2010) for
conditional logistic regression. The implementation is based on the modification of Shah and
Samworth (2013), which uses complementary pairs of subsamples; as a result, 2B subsamples
are used instead of B. For each vector of per-block penalties, the model is fitted on each of
the 2B subsamples, yielding a vector of selection frequencies (for each covariate, the
proportion of fits in which its coefficient estimate is non-zero). The final selection
probability Pistab of each covariate is obtained by taking the maximum of its selection
frequencies over all considered vectors of penalties.
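The procedure described above can be sketched in base R as follows. This is a simplified, hypothetical illustration and not the package's implementation: the model fit is abstracted into a user-supplied function select_fun (hypothetical), and we assume that subsamples are formed from whole strata, so that each matched set stays together, with each drawn half paired with its complement.

```r
# Hypothetical sketch (not penalizedclr's code) of complementary-pairs
# stability selection with a maximum over penalty vectors.
# select_fun(idx, lambda): hypothetical function that fits a penalized model
# on rows `idx` with per-block penalties `lambda` and returns a logical
# vector, TRUE for covariates with a non-zero coefficient estimate.
stability_sketch <- function(stratum, lambda.list, B, select_fun) {
  strata <- unique(stratum)
  n_half <- floor(length(strata) / 2)
  # one column of selection frequencies per penalty vector
  freq <- do.call(cbind, lapply(lambda.list, function(lambda) {
    counts <- 0
    for (b in seq_len(B)) {
      # assumption: draw half of the strata; the remaining strata form
      # the complementary subsample
      half <- strata[sample.int(length(strata), n_half)]
      in_half <- stratum %in% half
      counts <- counts + select_fun(which(in_half), lambda) +
        select_fun(which(!in_half), lambda)
    }
    counts / (2 * B)  # proportion of the 2B fits selecting each covariate
  }))
  # Pistab: maximum selection frequency over the penalty vectors
  apply(freq, 1, max)
}

# usage with a toy selector: covariate 1 always selected, covariate 2 never,
# covariate 3 only when the first block's penalty is below 1
toy <- function(idx, lambda) c(TRUE, FALSE, lambda[1] < 1)
stability_sketch(rep(1:10, each = 2), list(c(0.5, 1), c(2, 0.9)), B = 5, toy)
# returns c(1, 0, 1)
```

Taking the maximum over penalty vectors means a covariate is considered stable if it is selected consistently under at least one of the candidate per-block penalty settings.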
Value
A list containing a numeric vector Pistab, giving the selection probabilities of all penalized covariates, together with lambda.list and p as provided in the input arguments.
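In stability selection (Meinshausen and Bühlmann, 2010), covariates are typically selected by keeping those whose selection probability exceeds a threshold. The hypothetical helper select_stable below (not part of penalizedclr) sketches this for a Pistab vector; the default threshold of 0.6 is an arbitrary illustrative choice.

```r
# Hypothetical helper (not part of penalizedclr): returns the indices of
# covariates whose selection probability exceeds `threshold`.
# The default threshold of 0.6 is an arbitrary illustrative choice; the
# error bound of Meinshausen and Buehlmann (2010) assumes a threshold in (0.5, 1].
select_stable <- function(Pistab, threshold = 0.6) {
  stopifnot(is.numeric(Pistab), threshold > 0.5, threshold <= 1)
  which(Pistab > threshold)
}

# usage with made-up selection probabilities
select_stable(c(0.95, 0.10, 0.72, 0.40))
# returns c(1, 3)
```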
References
Meinshausen, N., & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473.
Shah, R. D., & Samworth, R. J. (2013). Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1), 55-80.
Examples
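library(penalizedclr)
set.seed(123)
# simulate covariates (pure noise in two blocks of 20 and 80 variables)
X <- cbind(matrix(rnorm(4000, 0, 1), ncol = 20), matrix(rnorm(16000, 2, 0.6), ncol = 80))
p <- c(20, 80)
# stratum membership: 100 strata with 2 observations each
stratum <- sort(rep(1:100, 2))
# the response: one case (1) and one control (0) in each stratum
Y <- rep(c(1, 0), 100)
# list of per-block L1 penalties (alpha = 1 by default, i.e. lasso)
lambda.list <- list(c(0.5, 1), c(2, 0.9))
# perform stability selection
stable.g1 <- stable.clr.g(response = Y, penalized = X, stratum = stratum,
                          p = p, lambda.list = lambda.list)
# selection probabilities of the first penalized covariates
head(stable.g1$Pistab)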