SRanalysis {StabilizedRegression} | R Documentation |
Stability analysis
Description
Stability analysis based on stabilized regression, used to analyze the trade-off between stability and predictiveness of individual predictors.
Usage
SRanalysis(
X,
Y,
A,
num_reps = 100,
pred_scores = c("mse", "mse_env"),
prescreen_types = c("correlation", "correlation_env"),
pars_SR = list(m = ncol(X), B = 100, alpha_stab = 0.05, alpha_pred = 0.05,
size_weight = "linear", use_resampling = FALSE, prescreen_size = NA,
stab_test = "exact", variable_importance = "scaled_coefficient"),
threshold = 0,
cores = 1,
verbose = 0,
seed = NA
)
Arguments
X |
predictor matrix. Numeric matrix of size n times d, where columns correspond to individual predictors. |
Y |
response variable. Numeric vector of length n. |
A |
stabilizing variable. Numeric vector of length n which can be interpreted as a factor. |
num_reps |
number of resamples to use in stability selection. |
pred_scores |
character vector of length 2, specifying the
|
prescreen_types |
character vector of length 2, specifying
the |
pars_SR |
list of all remaining parameters going into
StabilizedRegression. |
threshold |
numeric value between 0 and 1, specifying the value at which variables are selected in stability selection. |
cores |
number of cores used in mclapply. |
verbose |
0 for no output, 1 for text output and 2 for text and diagnostic plots. |
seed |
fix the seed value at the beginning of the function. |
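The size and type requirements listed above can be checked before calling the function. The following is a minimal base-R sketch; the helper name check_SR_inputs is hypothetical and not part of the package, and the checks only mirror what the argument descriptions state.

```r
# Hypothetical helper (not part of StabilizedRegression): checks that
# X, Y and A have the shapes described in the Arguments section.
check_SR_inputs <- function(X, Y, A) {
  stopifnot(
    is.matrix(X), is.numeric(X),          # n x d numeric predictor matrix
    is.numeric(Y), length(Y) == nrow(X),  # response of length n
    length(A) == nrow(X)                  # stabilizing variable of length n
  )
  # Report how many observations fall into each level of A
  table(as.factor(A))
}

set.seed(1)
X <- cbind(X1 = rnorm(200), X2 = rnorm(200))
Y <- X[, "X1"] + rnorm(200)
A <- rep(c(0, 1), each = 100)
env_sizes <- check_SR_inputs(X, Y, A)
print(env_sizes)
```

Inspecting the level counts of A before the analysis is a cheap way to spot a stabilizing variable with very small or unexpected groups.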
Details
This function performs two versions of StabilizedRegression: SR, which selects a stable and predictive model, and SRpred, which fits a plain predictive model. Stability selection is then performed using the variable importance measures from both of these methods, and from their difference SRdiff, as the variable selection criterion. This makes it possible to distinguish which predictive variables are stable and which are unstable with respect to the stabilizing variable A. The results can be visualized by plotting the resulting object using the plot() function.
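To illustrate the general idea of stability selection with a variable importance measure, here is a minimal base-R sketch. It is not the implementation used by SRanalysis: plain least squares stands in for StabilizedRegression, and the importance function, the extra noise predictor X3, the half-sample scheme and the per-subsample selection size q are all assumptions made for the illustration.

```r
# Illustrative only: plain OLS stands in for StabilizedRegression.
set.seed(1)
n <- 200
X1 <- rnorm(n)
Y <- X1 + rnorm(n)
X2 <- 0.5 * X1 + Y + 0.2 * c(rnorm(100), rnorm(100) + 3)
X3 <- rnorm(n)  # hypothetical pure-noise predictor
X <- cbind(X1 = X1, X2 = X2, X3 = X3)

# Hypothetical importance measure: absolute OLS coefficient
# computed on standardized predictors.
importance <- function(X, Y) {
  fit <- lm.fit(cbind(1, scale(X)), Y)
  abs(fit$coefficients[-1])  # drop the intercept
}

num_reps <- 50  # number of subsamples
q <- 1          # hypothetical: variables selected per subsample

selected <- replicate(num_reps, {
  idx <- sample(n, floor(n / 2))  # subsample half of the rows
  imp <- importance(X[idx, , drop = FALSE], Y[idx])
  rank(-imp) <= q                 # TRUE for the top-q variables
})

# Selection frequency of each variable across subsamples
sel_freq <- rowMeans(selected)
names(sel_freq) <- colnames(X)
print(sel_freq)
```

Variables whose selection frequency stays high across subsamples are the ones a stability selection procedure would retain; SRanalysis applies this kind of criterion to the importance measures of SR, SRpred and SRdiff.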
Because of the resampling, this function can be computationally
expensive. We therefore recommend using the cores
parameter for parallel computation.
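One way to choose a value for cores is sketched below, assuming the standard parallel package that ships with R. The variable name n_cores and the cap of two workers are arbitrary choices for the illustration. Since mclapply relies on forking, which is not available on Windows, the sketch falls back to a single core there.

```r
library(parallel)

# Number of cores reported by the system (may be NA on some platforms)
avail <- detectCores()
if (is.na(avail)) avail <- 1L

# Leave one core free, cap at 2 for this illustration, and use a
# single core on Windows, where mclapply cannot fork.
n_cores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  max(1L, min(2L, avail - 1L))
}

# Small demonstration of mclapply with the chosen number of cores
res <- mclapply(1:4, function(i) i^2, mc.cores = n_cores)
print(unlist(res))

# The same value could then be passed on, e.g.
# SRanalysis(X, Y, A, cores = n_cores)
```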
Value
Object of class 'SRanalysis' consisting of the following elements
results |
List of stability selection results for SR, SRpred and SRdiff. |
varnames |
Vector of variable names taken from the column names of X. |
avgcoefsign_SR |
Vector of average coefficient signs for SR |
avgcoefsign_SRpred |
Vector of average coefficient signs for SRpred |
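The elements listed above can be inspected directly on the returned object. The sketch below reuses the simulated data from the Examples section and assumes the StabilizedRegression package is installed; it is skipped otherwise. The internal layout of results is not described here, so str() is used to look at it rather than assuming element names.

```r
# Runs only if the package is installed; otherwise the block is skipped.
pkg_available <- requireNamespace("StabilizedRegression", quietly = TRUE)

if (pkg_available) {
  set.seed(1)
  X1 <- rnorm(200)
  Y <- X1 + rnorm(200)
  X2 <- 0.5 * X1 + Y + 0.2 * c(rnorm(100), rnorm(100) + 3)
  X <- cbind(X1, X2)
  A <- as.factor(rep(c(0, 1), each = 100))
  obj <- StabilizedRegression::SRanalysis(X, Y, A, 10,
                                          pars_SR = list(B = NA))

  print(inherits(obj, "SRanalysis"))  # class documented above
  str(obj$results, max.level = 1)     # top-level structure of the results list
  print(obj$varnames)                 # taken from colnames(X)
  print(obj$avgcoefsign_SR)           # average coefficient signs for SR
  print(obj$avgcoefsign_SRpred)       # average coefficient signs for SRpred
}
```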
Author(s)
Niklas Pfister
References
Pfister, N., E. Williams, R. Aebersold, J. Peters and P. Bühlmann (2019). Stabilizing Variable Selection and Regression. arXiv preprint arXiv:1911.01850.
Examples
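## Example
library(StabilizedRegression)
set.seed(1)
# Y is generated from X1; X2 depends on X1 and Y, and its noise mean
# shifts in the second half of the sample
X1 <- rnorm(200)
Y <- X1 + rnorm(200)
X2 <- 0.5 * X1 + Y + 0.2 * c(rnorm(100), rnorm(100) + 3)
X <- cbind(X1, X2)
# Stabilizing variable: two groups of 100 observations each
A <- as.factor(rep(c(0, 1), each = 100))
# Stability analysis with num_reps = 10 resamples
obj <- SRanalysis(X, Y, A, 10,
                  pars_SR = list(B = NA))
plot(obj, varnames = c("X1", "X2"), labels = TRUE)
print(obj$results)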