SRanalysis {StabilizedRegression}    R Documentation

Stability analysis

Description

Stability analysis based on stabilized regression, used to analyze the trade-off between stability and predictiveness of individual predictors.

Usage

SRanalysis(
  X,
  Y,
  A,
  num_reps = 100,
  pred_scores = c("mse", "mse_env"),
  prescreen_types = c("correlation", "correlation_env"),
  pars_SR = list(m = ncol(X), B = 100, alpha_stab = 0.05, alpha_pred = 0.05,
    size_weight = "linear", use_resampling = FALSE, prescreen_size = NA, stab_test =
    "exact", variable_importance = "scaled_coefficient"),
  threshold = 0,
  cores = 1,
  verbose = 0,
  seed = NA
)

Arguments

X

predictor matrix. Numeric matrix of size n times d, where columns correspond to individual predictors.

Y

response variable. Numeric vector of length n.

A

stabilizing variable. Numeric vector of length n which can be interpreted as a factor.

num_reps

number of resamples to use in stability selection.

pred_scores

character vector of length 2, specifying the pred_score for SR and SRpred.

prescreen_types

character vector of length 2, specifying the prescreen_type for SR and SRpred.

pars_SR

list of all remaining parameters going into StabilizedRegression. compute_predictive, pred_score and prescreen_type are ignored.

threshold

numeric value between 0 and 1, specifying the value at which variables are selected in stability selection.

cores

number of cores used in mclapply.

verbose

0 for no output, 1 for text output and 2 for text and diagnostic plots.

seed

fix the seed value at the beginning of the function.
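
As a minimal sketch of inputs with the shapes described above, the following base R code builds a predictor matrix, a response and a stabilizing variable, and checks their dimensions. The simulated data and the names n and d are assumptions made for illustration only.

```r
# Illustrative inputs only; the data-generating choices are assumptions.
set.seed(1)
n <- 200   # number of observations
d <- 3     # number of predictors

# Predictor matrix of size n times d with named columns
X <- matrix(rnorm(n * d), nrow = n, ncol = d)
colnames(X) <- paste0("X", seq_len(d))

# Response vector of length n
Y <- X[, 1] + rnorm(n)

# Stabilizing variable: numeric labels that can be interpreted as a factor
A <- rep(c(0, 1), each = n / 2)

# Basic consistency checks before passing the inputs on
stopifnot(is.matrix(X), is.numeric(X), nrow(X) == n, ncol(X) == d)
stopifnot(is.numeric(Y), length(Y) == n)
stopifnot(length(A) == n, nlevels(as.factor(A)) == 2)
```

Naming the columns of X is useful because the returned varnames element is taken from them.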

Details

This function performs two versions of StabilizedRegression: SR, which selects a stable and predictive model, and SRpred, which fits a plain predictive model. Stability selection is then performed using the variable importance measures from both methods, and from their difference SRdiff, as variable selection criteria. This makes it possible to distinguish which predictive variables are stable and which are unstable with respect to the stabilizing variable A. The results can be visualized by plotting the resulting object using the plot() function.

Because of the resampling, this function can be computationally expensive; we therefore recommend using the cores parameter for parallel computation.
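
The resampling idea behind stability selection can be sketched in base R as follows. This is not the package's implementation: ordinary least squares with an absolute-coefficient score stands in for the StabilizedRegression variable importance, and the helper names importance_fun and one_rep, the half-sample scheme, the simulated data and the threshold value 0.5 are all assumptions chosen for illustration.

```r
library(parallel)

# Simulated data (assumption): only X1 influences Y
set.seed(1)
n <- 200
X <- cbind(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
Y <- 2 * X[, "X1"] + rnorm(n)

# Stand-in importance measure (assumption): absolute OLS coefficients
importance_fun <- function(X, Y) {
  abs(coef(lm(Y ~ X))[-1])   # drop the intercept
}

# One resample: draw half of the rows without replacement, score the
# variables, and mark those whose score exceeds the threshold as selected
one_rep <- function(i, threshold) {
  idx <- sample(n, size = floor(n / 2))
  importance_fun(X[idx, , drop = FALSE], Y[idx]) > threshold
}

num_reps <- 50
cores <- 1   # mclapply supports more than one core only on non-Windows systems
selected <- mclapply(seq_len(num_reps), one_rep, threshold = 0.5,
                     mc.cores = cores)

# Selection frequency of each variable across the resamples
sel_freq <- colMeans(do.call(rbind, selected))
names(sel_freq) <- colnames(X)
print(sel_freq)
```

Variables that are selected in most resamples are the ones a stability selection procedure would retain. Because the resamples are independent of each other, they can be distributed over several cores.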

Value

Object of class 'SRanalysis' consisting of the following elements

results

List of stability selection results for SR, SRpred and SRdiff.

varnames

Vector of variable names taken from the column names of X.

avgcoefsign_SR

Vector of average coefficient signs for SR.

avgcoefsign_SRpred

Vector of average coefficient signs for SRpred.
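
A short sketch of inspecting the returned object, using only the elements listed above. The simulated data mirror the package example; the requireNamespace guard is an addition so that the snippet is skipped when the package is not installed.

```r
if (requireNamespace("StabilizedRegression", quietly = TRUE)) {
  library(StabilizedRegression)

  # Same simulated data as in the package example
  set.seed(1)
  X1 <- rnorm(200)
  Y <- X1 + rnorm(200)
  X2 <- 0.5 * X1 + Y + 0.2 * c(rnorm(100), rnorm(100) + 3)
  X <- cbind(X1, X2)
  A <- as.factor(rep(c(0, 1), each = 100))

  obj <- SRanalysis(X, Y, A, num_reps = 10, pars_SR = list(B = NA))

  # Check the class and look at each documented element in turn
  print(inherits(obj, "SRanalysis"))
  print(obj$varnames)             # names taken from colnames(X)
  print(obj$avgcoefsign_SR)       # average coefficient signs for SR
  print(obj$avgcoefsign_SRpred)   # average coefficient signs for SRpred
  str(obj$results, max.level = 1) # results for SR, SRpred and SRdiff
}
```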

Author(s)

Niklas Pfister

References

Pfister, N., E. Williams, R. Aebersold, J. Peters and P. Bühlmann (2019). Stabilizing Variable Selection and Regression. arXiv preprint arXiv:1911.01850.

Examples

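## Example
set.seed(1)
# X1 is a cause of Y; X2 depends on Y and has a mean shift in the second half
X1 <- rnorm(200)
Y <- X1 + rnorm(200)
X2 <- 0.5 * X1 + Y + 0.2 * c(rnorm(100), rnorm(100)+3)

X <- cbind(X1, X2)
# Stabilizing variable: two groups of 100 observations each
A <- as.factor(rep(c(0, 1), each=100))

# Run the analysis with 10 resamples
obj <- SRanalysis(X, Y, A, num_reps = 10,
                  pars_SR=list(B=NA))
plot(obj, varnames = c("X1", "X2"), labels=TRUE)
print(obj$results)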

[Package StabilizedRegression version 1.1 Index]