adapt {adaptMT}R Documentation

Adaptive P-value Thresholding

Description

adapt is a framework allowing for arbitrary exponential families for computing E-steps and arbitrary algorithms for fitting M-steps.

Usage

adapt(x, pvals, models, dist = beta_family(), s0 = rep(0.45, length(pvals)),
  alphas = seq(0.01, 1, 0.01), params0 = list(pix = NULL, mux = NULL),
  nfits = 20, nms = 1, niter_fit = 10, tol = 1e-04, niter_ms = 20,
  cr = "BIC", verbose = list(print = TRUE, fit = FALSE, ms = TRUE))

Arguments

x

covariates (i.e. side-information). Should be compatible to models. See Details

pvals

a vector of values in [0, 1]. P-values

models

an object of class "adapt_model" or a list of objects of class "adapt_model". See Details

dist

an object of class "gen_exp_family". beta_family() as default

s0

a vector of values in [0, 0.5). Initial threshold.

alphas

a vector of values in (0, 1). Target FDR levels.

params0

a list in the form of list(pix = , mux = ). Initial guess of pi(x) and mu(x). NULL as default

nfits

a positive integer. Number of model-fitting steps. See Details

nms

a non-negative integer. Number of model selection steps. See Details

niter_fit

a positive integer. Number of EM iterations in model fitting

tol

a positive scalar. EM algorithm stops when pi(x) and mu(x) in consecutive steps differ by at most 'tol' for each element

niter_ms

a positive integer. Number of EM iterations in model selection

cr

a string. The criterion for model selection with BIC as default. Also support AIC, AICC and HIC

verbose

a list of logical values in the form of list(print = , fit = , ms = ). Each element indicates whether the relevant information is outputted to the console. See Details

Details

x should have a type compatible to the fitting functions in models. For GLM and GAM, x should be a data.frame. For glmnet, x should be a matrix.

models could either be an adapt_model object, if a single model is used, or a list of adapt_model objects, each of which corresponding to a model. Each element should be generated by gen_adapt_model. For glm/gam/glmnet, one can use the shortcut by running gen_adapt_model with name = "glm" or "gam" or "glmnet" but without specifying pifun, mufun, pifun_init and mufun_init. See examples below.

nfits is the number of model fitting steps plus nms, the model selection steps, if models contains multiple adapt_model objects. Suppose M is the number of masked p-values at the initial step, then the model is updated at the initial step and at every time when [M/nfits] more p-values are revealed. If nms > 0, model selection is performed at the initial step an at every time when [M/nms] more p-values are revealed. Between two consecutive model selection steps, the model selected from the last step is used for model fitting. For example, when M = 10000, nfits = 10 and nms = 2, model selection will be performed at the initial step and when 5000 p-values are revealed, while the model fitting will be performed when 1000, 2000, 3000, 4000, 6000, 7000, 8000, 9000 p-values are revealed.

verbose has three elements: print, fit and ms. If print = TRUE, the progress of the main procedure is outputted to the console, in the form of "alpha = 0.05: FDPhat 0.0333, Number of Rej. 30" (where the numbers are made up for illustration). If fit = TRUE, a progress bar for the model fitting is outputted to the console. Similarly, if ms = TRUE, a progress bar for the model selection is outputted to the console.

For ultra-large scale problems (n > 10^5), it is recommended to keep alphas short because the output s is of size n x length(alphas). is length(alphas).

The output qvals gives the q-values of each hypothesis. qvals[i] is defined as the minimum target FDR level such that pvals[i] is rejected. For hypotheses with p-values above s0, the q-values are set to be Inf because they are never rejected by AdaPT for whatever alpha.

The output order gives the order of (the indices of) p-values being revealed, i.e. being in the region (s, 1-s). The latter hypotheses appeared in order have smaller q-values (i.e. are more likely to be rejected).

Value

nrejs

a vector of integers. Number of rejections for each alpha

rejs

a list of vector of integers. The set of indices of rejections for each alpha

s

a matrix of size length(pvals) X length(alphas). Threshold curves for each alpha

params

a list. Each element is a list in the form of list(pix = , mux = , alpha = , nmasks =), recording the parameter estimates, the achieved alpha and the number of masked p-values. To avoid massive storage cost, it only contains the information when a new target FDR level is achieved. As a result, it might be shorter than nfits.

qvals

a vector of values in [0, 1]UInf. Q-values. See Details

order

a permutation of 1:length(pvals). Indices of hypotheses arranged in the order of reveal. See Details

alphas

same as the input alphas

dist

same as the input dist

models

a list of adapt_model objects of length params. The model used in each fitting step. As in params, it only contains the model when a new target FDR level is achieved and each element corresponds to an element of params.

info

a list of length nfits. Each element is a list recording extra information in each fitting step, e.g. degree of freedom (df) and variable importance (vi). As in params, it only contains the model information when a new target FDR level is achieved and each element corresponds to an element of params.

args

a list including the other inputs nfits, nms, niter_fit, niter_ms, tol, cr

.

Examples


# Load estrogen data
data(estrogen)
pvals <- as.numeric(estrogen$pvals)
x <- data.frame(x = as.numeric(estrogen$ord_high))
dist <- beta_family()

# Subsample the data for convenience
inds <- (x$x <= 5000)
pvals <- pvals[inds]
x <- x[inds,,drop = FALSE]

# Generate models for function adapt
library("splines")
formulas <- paste0("ns(x, df = ", 6:10, ")")
models <- lapply(formulas, function(formula){
    piargs <- muargs <- list(formula = formula)
    gen_adapt_model(name = "glm", piargs = piargs, muargs = muargs)
})

# Run adapt
res <- adapt(x = x, pvals = pvals, models = models,
             dist = dist, nfits = 10)



[Package adaptMT version 1.0.0 Index]