gen_bound {bandsfdp} | R Documentation |
Generalized band
Description
Computes an upper prediction bound on the false discovery proportion (FDP)
among the target wins in any set R of hypotheses from target-decoy
competition (TDC). See Details for more information.
Usage
gen_bound(
  labels,
  indices,
  gamma,
  type,
  d_max = NULL,
  max_fdp = 0.5,
  c = 0.5,
  lambda = 0.5
)
genband(
  labels,
  indices,
  gamma,
  type,
  d_max = NULL,
  max_fdp = 0.5,
  c = 0.5,
  lambda = 0.5
)
Arguments
labels |
A vector of (ordered) labels. See details below. |
indices |
A vector specifying the indices of hypotheses for which an upper prediction bound on the FDP is computed. |
gamma |
The confidence parameter of the band. Typical values include
|
type |
A character string specifying which band to use. Must be one of
|
d_max |
An optional positive integer specifying the maximum number of decoy wins considered in calculating the bands. |
max_fdp |
A number specifying the maximum FDP considered by the user in
calculating the bands. Used to compute d_max when d_max = NULL. |
c |
Determines the ranks of the target score that are considered
winning. Defaults to 0.5. |
lambda |
Determines the ranks of the target score that are
considered losing. Defaults to 0.5. |
Details
In (single-decoy) TDC, each hypothesis is associated with a
winning score and a label (1 for a target win, -1 for a decoy win). This
function assumes that the hypotheses are sorted in decreasing order of
winning score (with ties broken at random). The argument labels must
therefore be ordered according to this rule.
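The ordering rule above can be sketched in base R as follows. This is only an illustration, not code from the package: the vectors scores and raw_labels are made-up data, and breaking ties with a random secondary sort key is one possible way to implement "ties broken at random".

```r
# Hypothetical input: winning scores and their labels, in arbitrary order.
set.seed(1)
scores <- c(3.2, 5.1, 5.1, 0.7, 4.4)
raw_labels <- c(1, -1, 1, 1, -1)

# Sort by decreasing score; the random second key breaks ties at random.
ord <- order(scores, runif(length(scores)), decreasing = TRUE)

# Labels in the order this function expects.
labels <- raw_labels[ord]
```

Any indices passed to the function then refer to positions in this sorted vector.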
This function also supports the extension of TDC that uses multiple
decoys. In that setup, the target score competes against multiple decoy
scores, and the rank of the target score after competition determines whether the
hypothesis is a target win (label = 1), a decoy win (-1), or uncounted (0).
The top c proportion of ranks are considered winning, the bottom
1-lambda proportion losing, and all the rest uncounted.
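A rough sketch of this labeling rule is given below. The helper label_from_rank is hypothetical and not part of the package. It assumes that rank 1 is the best of the n_decoys + 1 competing scores, and the exact rank convention used by the package may differ.

```r
# Hypothetical helper: map the rank of the target score (1 = best) among
# n_decoys + 1 competing scores to a label.
label_from_rank <- function(rank, n_decoys, c = 0.5, lambda = 0.5) {
  n <- n_decoys + 1
  ifelse(rank <= c * n, 1L,            # top c proportion: target win
         ifelse(rank > lambda * n, -1L, # bottom 1 - lambda proportion: decoy win
                0L))                    # everything in between: uncounted
}

# Three decoys, c = 0.25, lambda = 0.5: one label for each possible rank.
label_from_rank(1:4, n_decoys = 3, c = 0.25, lambda = 0.5)
```

With a single decoy and the defaults c = lambda = 0.5, the rule reduces to ordinary TDC: rank 1 is a target win and rank 2 is a decoy win.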
The threshold of TDC is given by the formula (assuming hypotheses are ordered):
\max\left\{k : \frac{D_k + 1}{T_k \vee 1} \cdot \frac{c}{1-\lambda} \leq \alpha\right\}
where T_k is the number of target wins among the top k hypotheses,
D_k is the number of decoy wins among the same top k hypotheses,
and \alpha is the FDR threshold.
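The threshold formula can be evaluated directly from an ordered label vector, as in the sketch below. The helper tdc_threshold is hypothetical and not part of the package, and returning 0 when no k satisfies the inequality is an assumption made here for convenience.

```r
# Hypothetical helper: TDC threshold for ordered labels at FDR level alpha.
tdc_threshold <- function(labels, alpha, c = 0.5, lambda = 0.5) {
  T_k <- cumsum(labels == 1)   # target wins among the top k
  D_k <- cumsum(labels == -1)  # decoy wins among the top k
  est <- (D_k + 1) / pmax(T_k, 1) * c / (1 - lambda)
  ok <- which(est <= alpha)
  if (length(ok) == 0) 0L else max(ok)  # assumption: 0 means no rejections
}

labels <- c(rep(1, 10), -1, 1, -1, -1)
tdc_threshold(labels, alpha = 0.15)
```

Uncounted hypotheses (label 0) add to neither T_k nor D_k in this sketch, which matches the definitions of the two counts.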
The argument gamma sets a confidence level of 1-gamma. Both
the uniform and standardized bands require pre-computed Monte Carlo
statistics, so only certain values of gamma are available.
Commonly used confidence levels, like 0.95 and 0.99, are available.
We refer the reader to the README of this package for more details.
The argument d_max controls the rate at which the returned bounds
increase: a larger d_max results in a more conservative bound.
If, however, D_k + 1 exceeds d_max for some index k, each target
win thereafter is considered a false discovery when computing the bound.
Thus it is important that d_max, chosen a priori, is large enough. Given that
it is sufficiently large, the precise value of d_max does not have a
significant effect on the resulting bounds (see https://arxiv.org/abs/2302.11837 for more details).
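To check whether a manually chosen d_max is large enough for a given label vector, one can look for the first index k at which D_k + 1 exceeds d_max. The helper below, first_exceed, is hypothetical and not part of the package. It only locates that index and does not compute any bound.

```r
# Hypothetical helper: first index k with D_k + 1 > d_max, or NA if none.
first_exceed <- function(labels, d_max) {
  D_k <- cumsum(labels == -1)      # decoy wins among the top k
  hit <- which(D_k + 1 > d_max)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

labels <- c(1, -1, 1, -1, 1, -1)
first_exceed(labels, d_max = 2)   # some index is returned: d_max is too small
first_exceed(labels, d_max = 10)  # NA: d_max is never exceeded
```

If the returned index falls inside the range of hypotheses of interest, target wins beyond it are counted as false discoveries, so a larger d_max would be needed.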
We recommend setting d_max = NULL so that it is computed automatically
using max_fdp. Doing so ensures that D_k + 1 never exceeds d_max
when the (non-interpolated) FDP bound on the top k hypotheses is
less than max_fdp.
Value
An upper prediction bound on the FDP among target wins in the set of
hypotheses whose indices are given as input.
References
Ebadi et al. (2023). Bounding the FDP in competition-based control of the FDR. arXiv:2302.11837. https://arxiv.org/abs/2302.11837
Examples
if (requireNamespace("fdpbandsdata", quietly = TRUE)) {
  set.seed(123)
  labels <- c(
    rep(1, 250),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.9, 0.1)),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.5, 0.5)),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.1, 0.9))
  )
  indices <- c(1:100, 300:400, 600:650)
  gamma <- 0.05
  gen_bound(labels, indices, gamma, "stband")
}