R: Stability selection metrics

StabilityMetrics {sharp}

R Documentation

Stability selection metrics

Description

Computes the stability score (see StabilityScore) and upper-bounds of the PFER and FDP from selection proportions of models with a given parameter controlling the sparsity of the underlying algorithm and for different thresholds in selection proportions.

Usage

StabilityMetrics(
  selprop,
  pk = NULL,
  pi_list = seq(0.6, 0.9, by = 0.01),
  K = 100,
  n_cat = NULL,
  PFER_method = "MB",
  PFER_thr_blocks = Inf,
  FDP_thr_blocks = Inf,
  Sequential_template = NULL,
  graph = TRUE,
  group = NULL
)

Arguments

`selprop`	array of selection proportions.
`pk`	optional vector encoding the grouping structure. Only used for multi-block stability selection where `pk` indicates the number of variables in each group. If `pk=NULL`, single-block stability selection is performed.
`pi_list`	vector of thresholds in selection proportions. If `n_cat=NULL` or `n_cat=2`, these values must be `>0` and `<1`. If `n_cat=3`, these values must be `>0.5` and `<1`.
`K`	number of resampling iterations.
`n_cat`	computation options for the stability score. Default is `NULL` to use the score based on a z test. Other possible values are 2 or 3 to use the score based on the negative log-likelihood.
`PFER_method`	method used to compute the upper-bound of the expected number of False Positives (or Per Family Error Rate, PFER). If `PFER_method="MB"`, the method proposed by Meinshausen and Bühlmann (2010) is used. If `PFER_method="SS"`, the method proposed by Shah and Samworth (2013) under the assumption of unimodality is used.
`PFER_thr_blocks`	vector of block-specific thresholds in PFER for constrained calibration by error control. If `PFER_thr=Inf` and `FDP_thr=Inf`, unconstrained calibration is used.
`FDP_thr_blocks`	vector of block-specific thresholds in the expected proportion of falsely selected features (or False Discovery Proportion, FDP) for constrained calibration by error control. If `PFER_thr=Inf` and `FDP_thr=Inf`, unconstrained calibration is used.
`Sequential_template`	logical matrix encoding the type of procedure to use for data with multiple blocks in stability selection graphical modelling. For multi-block estimation, the stability selection model is constructed as the union of block-specific stable edges estimated while the others are weakly penalised (`TRUE` only for the block currently being calibrated and `FALSE` for other blocks). Other approaches with joint calibration of the blocks are allowed (all entries are set to `TRUE`).
`graph`	logical indicating if stability selection is performed in a regression (if `FALSE`) or graphical (if `TRUE`) framework.
`group`	vector encoding the grouping structure among predictors. This argument indicates the number of variables in each group and only needs to be provided for group (but not sparse group) penalisation.

Value

A list with:

`S`	a matrix of the best (block-specific) stability scores for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.
`Lambda`	a matrix of (block-specific) penalty parameters. In multi-block stability selection, rows correspond to sets of penalty parameters and columns correspond to different blocks.
`Q`	a matrix of average numbers of (block-specific) edges selected by the underlying algorithm for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.
`Q_s`	a matrix of calibrated numbers of (block-specific) stable edges for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.
`P`	a matrix of calibrated (block-specific) thresholds in selection proportions for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.
`PFER`	a matrix of computed (block-specific) upper-bounds in PFER of calibrated graphs for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.
`FDP`	a matrix of computed (block-specific) upper-bounds in FDP of calibrated stability selection models for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.
`S_2d`	an array of (block-specific) stability scores obtained with different combinations of parameters. Rows correspond to different (sets of) penalty parameters and columns correspond to different thresholds in selection proportions. In multi-block stability selection, indices along the third dimension correspond to different blocks.
`PFER_2d`	an array of computed upper-bounds of PFER obtained with different combinations of parameters. Rows correspond to different penalty parameters and columns correspond to different thresholds in selection proportions. Not available in multi-block stability selection graphical modelling.
`FDP_2d`	an array of computed upper-bounds of FDP obtained with different combinations of parameters. Rows correspond to different penalty parameters and columns correspond to different thresholds in selection proportions. Not available in multi-block stability selection graphical modelling.

References

Bodinier B, Filippi S, Nøst TH, Chiquet J, Chadeau-Hyam M (2023). “Automated calibration for stability selection in penalised regression and graphical models.” Journal of the Royal Statistical Society Series C: Applied Statistics, qlad058. ISSN 0035-9254, doi:10.1093/jrsssc/qlad058, https://academic.oup.com/jrsssc/advance-article-pdf/doi/10.1093/jrsssc/qlad058/50878777/qlad058.pdf.

Meinshausen N, Bühlmann P (2010). “Stability selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473. doi:10.1111/j.1467-9868.2010.00740.x.

Shah RD, Samworth RJ (2013). “Variable selection with error control: another look at stability selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1), 55-80. doi:10.1111/j.1467-9868.2011.01034.x.

Examples

## Sparse or sparse group penalisation

# Simulating set of selection proportions
set.seed(1)
selprop <- matrix(round(runif(n = 20), digits = 2), nrow = 2)

# Computing stability scores for different thresholds
metrics <- StabilityMetrics(
  selprop = selprop, pi = c(0.6, 0.7, 0.8),
  K = 100, graph = FALSE
)


## Group penalisation

# Simulating set of selection proportions
set.seed(1)
selprop <- matrix(round(runif(n = 6), digits = 2), nrow = 2)
selprop <- cbind(
  selprop[, 1], selprop[, 1],
  selprop[, 2], selprop[, 2],
  matrix(rep(selprop[, 3], each = 6), nrow = 2, byrow = TRUE)
)

# Computing stability scores for different thresholds
metrics <- StabilityMetrics(
  selprop = selprop, pi = c(0.6, 0.7, 0.8),
  K = 100, graph = FALSE, group = c(2, 2, 6)
)

[Package sharp version 1.4.6 Index]