StabilityMetrics {sharp}R Documentation

Stability selection metrics

Description

Computes the stability score (see StabilityScore) and upper-bounds of the PFER and FDP from selection proportions of models with a given parameter controlling the sparsity of the underlying algorithm and for different thresholds in selection proportions.

Usage

StabilityMetrics(
  selprop,
  pk = NULL,
  pi_list = seq(0.6, 0.9, by = 0.01),
  K = 100,
  n_cat = NULL,
  PFER_method = "MB",
  PFER_thr_blocks = Inf,
  FDP_thr_blocks = Inf,
  Sequential_template = NULL,
  graph = TRUE,
  group = NULL
)

Arguments

selprop

array of selection proportions.

pk

optional vector encoding the grouping structure. Only used for multi-block stability selection where pk indicates the number of variables in each group. If pk=NULL, single-block stability selection is performed.

pi_list

vector of thresholds in selection proportions. If n_cat=NULL or n_cat=2, these values must be >0 and <1. If n_cat=3, these values must be >0.5 and <1.

K

number of resampling iterations.

n_cat

computation options for the stability score. Default is NULL to use the score based on a z test. Other possible values are 2 or 3 to use the score based on the negative log-likelihood.

PFER_method

method used to compute the upper-bound of the expected number of False Positives (or Per Family Error Rate, PFER). If PFER_method="MB", the method proposed by Meinshausen and Bühlmann (2010) is used. If PFER_method="SS", the method proposed by Shah and Samworth (2013) under the assumption of unimodality is used.

PFER_thr_blocks

vector of block-specific thresholds in PFER for constrained calibration by error control. If PFER_thr=Inf and FDP_thr=Inf, unconstrained calibration is used.

FDP_thr_blocks

vector of block-specific thresholds in the expected proportion of falsely selected features (or False Discovery Proportion, FDP) for constrained calibration by error control. If PFER_thr=Inf and FDP_thr=Inf, unconstrained calibration is used.

Sequential_template

logical matrix encoding the type of procedure to use for data with multiple blocks in stability selection graphical modelling. For multi-block estimation, the stability selection model is constructed as the union of block-specific stable edges estimated while the others are weakly penalised (TRUE only for the block currently being calibrated and FALSE for other blocks). Other approaches with joint calibration of the blocks are allowed (all entries are set to TRUE).

graph

logical indicating if stability selection is performed in a regression (if FALSE) or graphical (if TRUE) framework.

group

vector encoding the grouping structure among predictors. This argument indicates the number of variables in each group and only needs to be provided for group (but not sparse group) penalisation.

Value

A list with:

S

a matrix of the best (block-specific) stability scores for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.

Lambda

a matrix of (block-specific) penalty parameters. In multi-block stability selection, rows correspond to sets of penalty parameters and columns correspond to different blocks.

Q

a matrix of average numbers of (block-specific) edges selected by the underlying algorithm for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.

Q_s

a matrix of calibrated numbers of (block-specific) stable edges for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.

P

a matrix of calibrated (block-specific) thresholds in selection proportions for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.

PFER

a matrix of computed (block-specific) upper-bounds in PFER of calibrated graphs for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.

FDP

a matrix of computed (block-specific) upper-bounds in FDP of calibrated stability selection models for different (sets of) penalty parameters. In multi-block stability selection, rows correspond to different sets of penalty parameters, (values are stored in the output "Lambda") and columns correspond to different blocks.

S_2d

an array of (block-specific) stability scores obtained with different combinations of parameters. Rows correspond to different (sets of) penalty parameters and columns correspond to different thresholds in selection proportions. In multi-block stability selection, indices along the third dimension correspond to different blocks.

PFER_2d

an array of computed upper-bounds of PFER obtained with different combinations of parameters. Rows correspond to different penalty parameters and columns correspond to different thresholds in selection proportions. Not available in multi-block stability selection graphical modelling.

FDP_2d

an array of computed upper-bounds of FDP obtained with different combinations of parameters. Rows correspond to different penalty parameters and columns correspond to different thresholds in selection proportions. Not available in multi-block stability selection graphical modelling.

References

Bodinier B, Filippi S, Nøst TH, Chiquet J, Chadeau-Hyam M (2023). “Automated calibration for stability selection in penalised regression and graphical models.” Journal of the Royal Statistical Society Series C: Applied Statistics, qlad058. ISSN 0035-9254, doi:10.1093/jrsssc/qlad058, https://academic.oup.com/jrsssc/advance-article-pdf/doi/10.1093/jrsssc/qlad058/50878777/qlad058.pdf.

Meinshausen N, Bühlmann P (2010). “Stability selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473. doi:10.1111/j.1467-9868.2010.00740.x.

Shah RD, Samworth RJ (2013). “Variable selection with error control: another look at stability selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1), 55-80. doi:10.1111/j.1467-9868.2011.01034.x.

See Also

Other stability metric functions: ConsensusScore(), FDP(), PFER(), StabilityScore()

Examples

## Sparse or sparse group penalisation

# Simulating set of selection proportions
set.seed(1)
selprop <- matrix(round(runif(n = 20), digits = 2), nrow = 2)

# Computing stability scores for different thresholds
metrics <- StabilityMetrics(
  selprop = selprop, pi = c(0.6, 0.7, 0.8),
  K = 100, graph = FALSE
)


## Group penalisation

# Simulating set of selection proportions
set.seed(1)
selprop <- matrix(round(runif(n = 6), digits = 2), nrow = 2)
selprop <- cbind(
  selprop[, 1], selprop[, 1],
  selprop[, 2], selprop[, 2],
  matrix(rep(selprop[, 3], each = 6), nrow = 2, byrow = TRUE)
)

# Computing stability scores for different thresholds
metrics <- StabilityMetrics(
  selprop = selprop, pi = c(0.6, 0.7, 0.8),
  K = 100, graph = FALSE, group = c(2, 2, 6)
)

[Package sharp version 1.4.6 Index]