adapt {adaptMT} | R Documentation |
Adaptive P-value Thresholding
Description
adapt
is a framework allowing for arbitrary exponential families for computing E-steps and arbitrary algorithms for fitting M-steps.
Usage
adapt(x, pvals, models, dist = beta_family(), s0 = rep(0.45, length(pvals)),
alphas = seq(0.01, 1, 0.01), params0 = list(pix = NULL, mux = NULL),
nfits = 20, nms = 1, niter_fit = 10, tol = 1e-04, niter_ms = 20,
cr = "BIC", verbose = list(print = TRUE, fit = FALSE, ms = TRUE))
Arguments
x |
covariates (i.e. side-information). Should be compatible to |
pvals |
a vector of values in [0, 1]. P-values |
models |
an object of class " |
dist |
an object of class " |
s0 |
a vector of values in [0, 0.5). Initial threshold. |
alphas |
a vector of values in (0, 1). Target FDR levels. |
params0 |
a list in the form of list(pix = , mux = ). Initial guess of pi(x) and mu(x). NULL as default |
nfits |
a positive integer. Number of model-fitting steps. See Details |
nms |
a non-negative integer. Number of model selection steps. See Details |
niter_fit |
a positive integer. Number of EM iterations in model fitting |
tol |
a positive scalar. EM algorithm stops when pi(x) and mu(x) in consecutive steps differ by at most 'tol' for each element |
niter_ms |
a positive integer. Number of EM iterations in model selection |
cr |
a string. The criterion for model selection with BIC as default. Also support AIC, AICC and HIC |
verbose |
a list of logical values in the form of list(print = , fit = , ms = ). Each element indicates whether the relevant information is outputted to the console. See Details |
Details
x
should have a type compatible to the fitting functions in models
. For GLM and GAM, x
should be a data.frame. For glmnet, x
should be a matrix.
models
could either be an adapt_model
object, if a single model is used, or a list of adapt_model
objects, each of which corresponding to a model. Each element should be generated by gen_adapt_model
. For glm/gam/glmnet, one can use the shortcut by running gen_adapt_model
with name = "glm" or "gam" or "glmnet" but without specifying pifun
, mufun
, pifun_init
and mufun_init
. See examples below.
nfits
is the number of model fitting steps plus nms
, the model selection steps, if models
contains multiple adapt_model
objects. Suppose M is the number of masked p-values at the initial step, then the model is updated at the initial step and at every time when [M/nfits
] more p-values are revealed. If nms > 0
, model selection is performed at the initial step an at every time when [M/nms
] more p-values are revealed. Between two consecutive model selection steps, the model selected from the last step is used for model fitting. For example, when M = 10000, nfits = 10 and nms = 2, model selection will be performed at the initial step and when 5000 p-values are revealed, while the model fitting will be performed when 1000, 2000, 3000, 4000, 6000, 7000, 8000, 9000 p-values are revealed.
verbose
has three elements: print
, fit
and ms
. If print = TRUE
, the progress of the main procedure is outputted to the console, in the form of "alpha = 0.05: FDPhat 0.0333, Number of Rej. 30" (where the numbers are made up for illustration). If fit = TRUE
, a progress bar for the model fitting is outputted to the console. Similarly, if ms = TRUE
, a progress bar for the model selection is outputted to the console.
For ultra-large scale problems (n > 10^5), it is recommended to keep alphas
short because the output s
is of size n x length(alphas)
.
is length(alphas)
.
The output qvals
gives the q-values of each hypothesis. qvals[i]
is defined as the minimum target FDR level such that pvals[i]
is rejected. For hypotheses with p-values above s0, the q-values are set to be Inf because they are never rejected by AdaPT for whatever alpha.
The output order
gives the order of (the indices of) p-values being revealed, i.e. being in the region (s, 1-s). The latter hypotheses appeared in order
have smaller q-values (i.e. are more likely to be rejected).
Value
nrejs |
a vector of integers. Number of rejections for each alpha |
rejs |
a list of vector of integers. The set of indices of rejections for each alpha |
s |
a matrix of size |
params |
a list. Each element is a list in the form of |
qvals |
a vector of values in [0, 1]UInf. Q-values. See Details |
order |
a permutation of |
alphas |
same as the input |
dist |
same as the input |
models |
a list of |
info |
a list of length |
args |
a list including the other inputs |
.
Examples
# Load estrogen data
data(estrogen)
pvals <- as.numeric(estrogen$pvals)
x <- data.frame(x = as.numeric(estrogen$ord_high))
dist <- beta_family()
# Subsample the data for convenience
inds <- (x$x <= 5000)
pvals <- pvals[inds]
x <- x[inds,,drop = FALSE]
# Generate models for function adapt
library("splines")
formulas <- paste0("ns(x, df = ", 6:10, ")")
models <- lapply(formulas, function(formula){
piargs <- muargs <- list(formula = formula)
gen_adapt_model(name = "glm", piargs = piargs, muargs = muargs)
})
# Run adapt
res <- adapt(x = x, pvals = pvals, models = models,
dist = dist, nfits = 10)