R: Stability score

StabilityScore {sharp}

R Documentation

Stability score

Description

Computes the stability score from the selection proportions of models fitted with a given sparsity-controlling parameter, for different thresholds in selection proportions. The score measures how unlikely it is that the selection procedure is uniform (i.e. uninformative) for a given combination of parameters.

Usage

StabilityScore(
  selprop,
  pi_list = seq(0.6, 0.9, by = 0.01),
  K,
  n_cat = 3,
  group = NULL
)

Arguments

`selprop`	array of selection proportions.
`pi_list`	vector of thresholds in selection proportions. If `n_cat=NULL` or `n_cat=2`, these values must be `>0` and `<1`. If `n_cat=3`, these values must be `>0.5` and `<1`.
`K`	number of resampling iterations.
`n_cat`	computation options for the stability score. Use `n_cat=NULL` for the score based on a z test, or `n_cat=2` or `n_cat=3` for the score based on the negative log-likelihood. As shown in Usage, the default in this function is `n_cat=3`.
`group`	vector encoding the grouping structure among predictors. This argument indicates the number of variables in each group and only needs to be provided for group (but not sparse group) penalisation.
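
As an illustration of the `group` encoding described above, the sketch below uses a hypothetical set of 10 predictors split into three groups. The group sizes, the `group_id` vector and the assumption that predictors are ordered by group are illustrative choices, not taken from the package.

```r
# Hypothetical grouping: 10 predictors in three groups of sizes 2, 3 and 5
group <- c(2, 3, 5)

# Group membership of each predictor, assuming predictors are ordered by group
group_id <- rep(seq_along(group), times = group)

# The group sizes should add up to the total number of predictors
n_predictors <- sum(group)
```

Here `group` is the vector that would be passed to the function, while `group_id` is only shown to make the encoding explicit.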

Details

The stability score is derived from the likelihood under the assumption of uniform (uninformative) selection.

We classify the features into three categories: the stably selected ones (selection proportions \ge \pi), the stably excluded ones (selection proportions \le 1-\pi), and the unstable ones (selection proportions strictly between 1-\pi and \pi).

Under the hypothesis of equiprobability of selection (instability), the likelihood of observing stably selected, stably excluded and unstable features can be expressed as:

L_{\lambda, \pi} = \prod_{j=1}^N [ ( 1 - F( K \pi - 1 ) )^{1_{H_{\lambda} (j) \ge K \pi}} \times ( F( K \pi - 1 ) - F( K ( 1 - \pi ) ) )^{1_{ (1-\pi) K < H_{\lambda} (j) < K \pi }} \times F( K ( 1 - \pi ) )^{1_{ H_{\lambda} (j) \le K (1-\pi) }} ]

where N is the number of features, H_{\lambda} (j) is the selection count of feature j, and F(x) is the cumulative distribution function of the binomial distribution with parameters K and the average proportion of selected features over resampling iterations.

The stability score is computed as the negative log-likelihood under the assumption of equiprobability of selection:

S_{\lambda, \pi} = -\log(L_{\lambda, \pi})

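The stability score increases with stability: the less compatible the observed selection counts are with uniform selection, the higher the score.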
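
The three-category score defined above can be sketched in base R as follows. This is an illustration of the formula, not the package's implementation: `score_three_cat()` is a hypothetical helper, the binomial probability is assumed to be the mean of `selprop`, and the thresholds K \pi and K (1-\pi) are rounded to integers. The selection proportions are made up for the example.

```r
# Illustrative sketch only; score_three_cat() is a hypothetical helper, not a sharp function
score_three_cat <- function(selprop, pi, K) {
  stopifnot(pi > 0.5, pi < 1)
  H <- round(selprop * K)       # selection counts H(j)
  q <- mean(selprop)            # assumed binomial probability (average selection proportion)
  upper <- round(K * pi)        # assumes K * pi is (close to) an integer
  lower <- round(K * (1 - pi))  # assumes K * (1 - pi) is (close to) an integer

  # Category probabilities under uniform selection
  p_selected <- pbinom(upper - 1, size = K, prob = q, lower.tail = FALSE) # 1 - F(K pi - 1)
  p_excluded <- pbinom(lower, size = K, prob = q)                         # F(K (1 - pi))
  p_unstable <- pbinom(upper - 1, size = K, prob = q) - p_excluded

  # Probability of the category each feature falls into
  p <- ifelse(H >= upper, p_selected, ifelse(H <= lower, p_excluded, p_unstable))

  # Negative log-likelihood
  -sum(log(p))
}

# Made-up selection proportions for 10 features
selprop <- c(0.98, 0.95, 0.91, 0.05, 0.02, 0.03, 0.50, 0.04, 0.01, 0.99)
scores <- sapply(c(0.6, 0.7, 0.8), score_three_cat, selprop = selprop, K = 100)
```

Using `lower.tail = FALSE` for the upper tail avoids computing `1 - pbinom(...)`, which can round to zero when the tail probability is very small.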

Alternatively, the stability score can be computed by considering only two sets of features: stably selected (selection proportions \ge \pi) or not (selection proportions < \pi). This can be done using `n_cat=2`.
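
A two-category version can be sketched in the same way. The likelihood used below is an assumption made by analogy with the three-category formula (probability 1 - F( K \pi - 1 ) for stably selected features and F( K \pi - 1 ) for all others); it is not stated in this page and may differ from what the package computes. `score_two_cat()` is a hypothetical helper and the selection proportions are made up.

```r
# Illustrative sketch only; score_two_cat() is a hypothetical helper, not a sharp function
score_two_cat <- function(selprop, pi, K) {
  stopifnot(pi > 0, pi < 1)
  H <- round(selprop * K)  # selection counts H(j)
  q <- mean(selprop)       # assumed binomial probability (average selection proportion)
  upper <- round(K * pi)   # assumes K * pi is (close to) an integer

  # Log-probabilities of the two categories under uniform selection
  log_p_selected <- pbinom(upper - 1, size = K, prob = q, lower.tail = FALSE, log.p = TRUE)
  log_p_other <- pbinom(upper - 1, size = K, prob = q, log.p = TRUE)

  # Negative log-likelihood (assumed two-category form)
  n_selected <- sum(H >= upper)
  -(n_selected * log_p_selected + (length(H) - n_selected) * log_p_other)
}

# Made-up selection proportions for 10 features
selprop <- c(0.98, 0.95, 0.91, 0.05, 0.02, 0.03, 0.50, 0.04, 0.01, 0.99)
scores_two <- sapply(c(0.6, 0.7, 0.8), score_two_cat, selprop = selprop, K = 100)
```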

Value

A vector of stability scores obtained with the different thresholds in selection proportions.

References

Bodinier B, Filippi S, Nøst TH, Chiquet J, Chadeau-Hyam M (2023). “Automated calibration for stability selection in penalised regression and graphical models.” Journal of the Royal Statistical Society Series C: Applied Statistics, qlad058. ISSN 0035-9254, doi:10.1093/jrsssc/qlad058, https://academic.oup.com/jrsssc/advance-article-pdf/doi/10.1093/jrsssc/qlad058/50878777/qlad058.pdf.

Examples

# Simulating a set of selection proportions
set.seed(1)
selprop <- round(runif(n = 20), digits = 2)

# Computing stability scores for different thresholds
score <- StabilityScore(selprop, pi_list = c(0.6, 0.7, 0.8), K = 100)

[Package sharp version 1.4.6 Index]