tdc_ub {bandsfdp} | R Documentation |
Uniform Band
Description
This function computes an upper prediction bound, derived from the uniform band, on the FDP in TDC's list of discoveries.
Usage
tdc_ub(
  thresholds,
  labels,
  alpha,
  gamma,
  c = 0.5,
  lambda = 0.5,
  n = length(labels),
  interpolate = TRUE
)

uniband(
  thresholds,
  labels,
  alpha,
  gamma,
  c = 0.5,
  lambda = 0.5,
  n = length(labels),
  interpolate = TRUE
)
Arguments
thresholds | The rejection threshold of TDC. If given as a vector, an upper prediction bound is returned for each element. |
labels | A vector of (ordered) labels. See Details below. |
alpha | The FDR threshold. |
gamma | The confidence parameter of the bound; the confidence level is 1-gamma. Typical values include 0.05 and 0.01. |
c | Determines the ranks of the target score that are considered winning. Defaults to 0.5. |
lambda | Determines the ranks of the target score that are considered losing. Defaults to 0.5. |
n | The number of hypotheses. Defaults to the length of labels. |
interpolate | A boolean indicating whether the bands should be interpolated. Interpolation gives a slightly tighter bound at the cost of additional computation. Defaults to TRUE. |
Details
In (single-decoy) TDC, each hypothesis is associated with a
winning score and a label (1 for a target win, -1 for a decoy win). This
function assumes that the hypotheses are sorted in decreasing order of
winning score (with ties broken at random). The argument labels must
therefore be ordered according to this rule.
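The ordering rule above can be sketched in a few lines of base R. This is only an illustration: the vectors scores and labels_raw are hypothetical inputs, not objects provided by the package.

```r
# Hypothetical inputs: winning scores and their labels, in arbitrary order.
set.seed(1)
scores     <- c(3.2, 5.1, 5.1, 0.7, 4.4)
labels_raw <- c(1, -1, 1, -1, 1)

# Sort by decreasing score; a random secondary key breaks ties at random.
ord <- order(-scores, runif(length(scores)))
sorted_scores <- scores[ord]
labels <- labels_raw[ord]   # this is the vector to pass as `labels`
```

Using a random secondary key in order() is one simple way to break ties at random; any other unbiased tie-breaking scheme should serve equally well.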
This function also supports the extension of TDC that uses multiple
decoys. In that setup, the target score competes against multiple decoy
scores, and the rank of the target score after competition determines whether the
hypothesis is a target win (label = 1), a decoy win (-1) or uncounted (0).
The top c proportion of ranks are considered winning, the bottom 1-lambda
proportion losing, and all the rest uncounted.
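As a rough illustration of the labelling rule above, the following sketch assigns a label to a single hypothesis from its target score and its decoy scores. The helper label_from_rank is hypothetical and not part of the package. It assumes that rank 1 is the best score, that ranks are expressed as a proportion of the number of competing scores, and that ties are broken at random; the package's exact conventions may differ.

```r
# Hypothetical helper (not part of the package): label one hypothesis from
# its target score and a vector of decoy scores.
label_from_rank <- function(target, decoys, c = 0.5, lambda = 0.5) {
  all_scores <- c(target, decoys)
  # Rank 1 = best score; ties are broken at random (assumed convention).
  target_rank <- rank(-all_scores, ties.method = "random")[1]
  p <- target_rank / length(all_scores)  # rank as a proportion in (0, 1]
  if (p <= c) {
    1L    # top c proportion of ranks: target win
  } else if (p > lambda) {
    -1L   # bottom 1 - lambda proportion of ranks: decoy win
  } else {
    0L    # everything in between: uncounted
  }
}

# Three decoys, c = 0.25, lambda = 0.75: only the best rank wins and
# only the worst rank loses.
label_from_rank(5.0, c(1, 2, 3), c = 0.25, lambda = 0.75)  # 1
label_from_rank(2.5, c(1, 2, 3), c = 0.25, lambda = 0.75)  # 0
label_from_rank(0.0, c(1, 2, 3), c = 0.25, lambda = 0.75)  # -1
```

With a single decoy and the defaults c = 0.5 and lambda = 0.5, this reduces to the usual rule: the higher of the two scores wins and no hypothesis is uncounted.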
The threshold of TDC is given by the formula:
\max\{k : \frac{D_k + 1}{T_k \vee 1} \cdot \frac{c}{1-\lambda} \leq \alpha\}
where T_k is the number of target wins among the top k hypotheses, and D_k
is the number of decoy wins among the same k hypotheses.
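The threshold formula above translates directly into cumulative sums over the ordered labels. The helper tdc_threshold below is a hypothetical sketch, not the package's implementation; returning 0 when no k satisfies the inequality is an assumption made here for convenience.

```r
# Hypothetical helper (not the package's implementation): evaluate the
# TDC threshold formula on a vector of ordered labels.
tdc_threshold <- function(labels, alpha, c = 0.5, lambda = 0.5) {
  T_k <- cumsum(labels == 1)    # target wins among the top k hypotheses
  D_k <- cumsum(labels == -1)   # decoy wins among the top k hypotheses
  fdr_est <- (D_k + 1) / pmax(T_k, 1) * c / (1 - lambda)
  ok <- which(fdr_est <= alpha)
  # Assumption: report 0 (no discoveries) when no k meets the bound.
  if (length(ok) == 0) 0L else max(ok)
}

labels <- c(1, 1, 1, 1, -1, 1)
tdc_threshold(labels, alpha = 0.5)  # 6
tdc_threshold(labels, alpha = 0.3)  # 4
tdc_threshold(labels, alpha = 0.1)  # 0
```

Labels equal to 0 (uncounted hypotheses) contribute to neither T_k nor D_k, which matches the description of the multiple-decoy setup.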
The argument gamma sets a confidence level of 1-gamma. Since
the uniform band requires pre-computed Monte Carlo statistics, only
certain values of gamma are available. Commonly used
confidence levels, such as 0.95 and 0.99, are supported. We refer the reader
to the README of this package for more details.
The argument alpha, used to compute the threshold of TDC, is also
used in this function. It serves to compute an appropriate d_max
for a non-trivial bound. In particular, if the user inputs a vector of
thresholds, a bound is returned for each element of thresholds
using the same d_max. For more details, see:
https://arxiv.org/abs/2302.11837.
We recommend the use of interpolate = TRUE (default), as it generally
results in a tighter bound. This comes at the cost of computation time: the bound
for each threshold is computed in O(n) time with interpolation and O(1)
without.
Value
An upper prediction bound on the FDP in TDC's list of discoveries.
If thresholds is a vector, returns an upper prediction bound for each
element of thresholds.
References
Ebadi et al. (2022), Bounding the FDP in competition-based control of the FDR https://arxiv.org/abs/2302.11837.
Examples
if (requireNamespace("fdpbandsdata", quietly = TRUE)) {
  set.seed(123)
  thresholds <- c(250, 500, 750, 1000)
  labels <- c(
    rep(1, 250),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.9, 0.1)),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.5, 0.5)),
    sample(c(1, -1), size = 250, replace = TRUE, prob = c(0.1, 0.9))
  )
  alpha <- 0.05
  gamma <- 0.05
  tdc_ub(thresholds, labels, alpha, gamma)
}