R: Simultaneous Band

sim_bound {bandsfdp}

R Documentation

Simultaneous Band

Description

This function computes upper prediction bounds on the FDP among the target wins in the top k hypotheses of TDC (target-decoy competition), for each k = 1,\ldots,n, where n is the total number of hypotheses.

Usage

sim_bound(
  labels,
  gamma,
  type,
  d_max = NULL,
  max_fdp = 0.5,
  c = 0.5,
  lambda = 0.5
)

simband(
  labels,
  gamma,
  type,
  d_max = NULL,
  max_fdp = 0.5,
  c = 0.5,
  lambda = 0.5
)

Arguments

`labels`	A vector of (ordered) labels. See details below.
`gamma`	The confidence parameter of the band. Typical values include `gamma = 0.05` or `gamma = 0.01`.
`type`	A character string specifying which band to use. Must be one of `"stband"` or `"uniband"`.
`d_max`	An optional positive integer specifying the maximum number of decoy wins considered in calculating the bands.
`max_fdp`	A number specifying the maximum FDP considered by the user in calculating the bands. Used to compute `d_max` if `d_max` is set to `NULL`.
`c`	Determines the ranks of the target score that are considered winning. Defaults to `c = 0.5` for (single-decoy) TDC.
`lambda`	Determines the ranks of the target score that are considered losing. Defaults to `lambda = 0.5` for (single-decoy) TDC.

Details

In (single-decoy) TDC, each hypothesis is associated with a winning score and a label (1 for a target win, -1 for a decoy win). This function assumes that the hypotheses are sorted in decreasing order of winning score (with ties broken at random). The argument labels must therefore follow this order.
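The ordering requirement can be sketched in base R as follows. This is a minimal illustration, not part of the package: the helper name make_tdc_labels and the toy scores are hypothetical, and it assumes one decoy score per target score, with higher scores being better.

```r
# Hypothetical helper (not part of bandsfdp): build ordered single-decoy
# TDC labels from paired target and decoy scores (higher score = better).
make_tdc_labels <- function(target, decoy) {
  stopifnot(length(target) == length(decoy))
  n <- length(target)
  # Break target/decoy ties at random with a fair coin.
  coin <- sample(c(TRUE, FALSE), size = n, replace = TRUE)
  target_wins <- target > decoy | (target == decoy & coin)
  hyp_labels <- ifelse(target_wins, 1, -1)
  # The winning score is the larger of the two scores.
  winning <- pmax(target, decoy)
  # Sort by decreasing winning score; the random secondary key
  # breaks ties between hypotheses at random.
  ord <- order(-winning, runif(n))
  hyp_labels[ord]
}

set.seed(1)
target <- c(2.1, 0.3, 5.0, 1.2)  # toy scores, for illustration only
decoy  <- c(1.0, 0.9, 0.2, 1.5)
ordered_labels <- make_tdc_labels(target, decoy)
ordered_labels
```

The resulting vector is in the form expected by the labels argument.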

This function also supports the extension of TDC that uses multiple decoys. In that setup, the target score competes against multiple decoy scores, and the rank of the target score after competition determines whether the hypothesis is a target win (label = 1), a decoy win (label = -1) or uncounted (label = 0). The top c proportion of ranks are considered winning, the bottom 1-lambda proportion losing, and all the rest uncounted.
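One way to read this rule is sketched below. The helper label_from_rank is hypothetical, and the ranking convention is an assumption rather than the package's confirmed behavior: rank 1 is the highest score, ties are broken at random, and the rank cut-offs are c and lambda times the number of competing scores.

```r
# Hypothetical helper (not part of bandsfdp): label one hypothesis from its
# target score and its decoy scores.
# Assumed convention: rank 1 = highest score among target and decoys.
label_from_rank <- function(target, decoys, c = 0.5, lambda = 0.5) {
  stopifnot(c <= lambda)
  n_scores <- length(decoys) + 1
  scores <- c(target, decoys)
  # Rank of the target among all scores, ties broken at random.
  r <- rank(-scores, ties.method = "random")[1]
  if (r <= c * n_scores) {
    1    # target win: rank falls in the top c proportion
  } else if (r > lambda * n_scores) {
    -1   # decoy win: rank falls in the bottom 1 - lambda proportion
  } else {
    0    # uncounted
  }
}

# Three decoys, c = 0.25, lambda = 0.5: rank 1 wins, rank 2 is uncounted,
# ranks 3 and 4 lose.
decoys <- c(1, 2, 3)
label_from_rank(10,  decoys, c = 0.25, lambda = 0.5)
label_from_rank(2.5, decoys, c = 0.25, lambda = 0.5)
label_from_rank(0.5, decoys, c = 0.25, lambda = 0.5)
```

With a single decoy and the defaults c = lambda = 0.5, this reduces to the single-decoy labels 1 and -1.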

The threshold of TDC is given by the formula (assuming hypotheses are ordered):

\max\{k : \frac{D_k + 1}{T_k \vee 1} \cdot \frac{c}{1-\lambda} \leq \alpha\}

where T_k is the number of target wins among the top k hypotheses, D_k is the number of decoy wins among the top k hypotheses, and \alpha is the target FDR level.
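The threshold formula can be evaluated directly from an ordered label vector. The sketch below is illustrative only: tdc_threshold is a hypothetical helper, not a function exported by this package, and it returns 0 when no k satisfies the inequality.

```r
# Hypothetical helper (not part of bandsfdp): largest k such that
# (D_k + 1) / max(T_k, 1) * c / (1 - lambda) <= alpha.
tdc_threshold <- function(labels, alpha, c = 0.5, lambda = 0.5) {
  t_k <- cumsum(labels == 1)   # target wins among the top k
  d_k <- cumsum(labels == -1)  # decoy wins among the top k
  est <- (d_k + 1) / pmax(t_k, 1) * c / (1 - lambda)
  ok <- which(est <= alpha)
  if (length(ok) == 0) 0L else max(ok)
}

# Toy ordered labels, for illustration only.
toy_labels <- c(1, 1, 1, 1, -1, 1, 1, -1, -1)
tdc_threshold(toy_labels, alpha = 0.35)
```

Labels equal to 0 (uncounted hypotheses) contribute to neither T_k nor D_k in this sketch.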

The argument gamma sets a confidence level of 1-gamma. Both the uniform and standardized bands require pre-computed Monte Carlo statistics, so only certain values of gamma are available to use. Commonly used confidence levels, like 0.95 and 0.99, are available. We refer the reader to the README of this package for more details.

The argument d_max controls the rate at which the returned bounds increase: a larger d_max results in a more conservative bound. If, however, D_k + 1 exceeds d_max for some index k, each target win thereafter is counted as a false discovery when computing the bound. It is therefore important that d_max, which must be chosen a priori, is large enough. Provided it is sufficiently large, the precise value of d_max does not have a significant effect on the resulting bounds (see https://arxiv.org/abs/2302.11837 for more details).

We recommend setting d_max = NULL so that it is computed automatically from max_fdp. The value computed this way ensures that D_k + 1 never exceeds d_max whenever the (non-interpolated) FDP bound on the top k hypotheses is less than max_fdp.

Value

A vector of upper prediction bounds on the FDP among the target wins in the top k hypotheses, with one entry for each k = 1,\ldots,n, where n is the total number of hypotheses.

References

Ebadi et al. (2023), Bounding the FDP in competition-based control of the FDR. https://arxiv.org/abs/2302.11837

Examples

if (requireNamespace("fdpbandsdata", quietly = TRUE)) {
  set.seed(123)
  labels <- c(
    rep(1, 250),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.9, 0.1)),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.5, 0.5)),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.1, 0.9))
  )
  gamma <- 0.05
  head(sim_bound(labels, gamma, "stband"))
}

[Package bandsfdp version 1.1.0 Index]