R: Adaptive P-value Thresholding

adapt {adaptMT}

R Documentation

Adaptive P-value Thresholding

Description

adapt is a framework allowing for arbitrary exponential families for computing E-steps and arbitrary algorithms for fitting M-steps.

Usage

adapt(x, pvals, models, dist = beta_family(), s0 = rep(0.45, length(pvals)),
  alphas = seq(0.01, 1, 0.01), params0 = list(pix = NULL, mux = NULL),
  nfits = 20, nms = 1, niter_fit = 10, tol = 1e-04, niter_ms = 20,
  cr = "BIC", verbose = list(print = TRUE, fit = FALSE, ms = TRUE))

Arguments

`x`	covariates (i.e. side-information). Should be compatible to `models`. See Details
`pvals`	a vector of values in [0, 1]. P-values
`models`	an object of class "`adapt_model`" or a list of objects of class "adapt_model". See Details
`dist`	an object of class "`gen_exp_family`". `beta_family()` as default
`s0`	a vector of values in [0, 0.5). Initial threshold.
`alphas`	a vector of values in (0, 1). Target FDR levels.
`params0`	a list in the form of list(pix = , mux = ). Initial guess of pi(x) and mu(x). NULL as default
`nfits`	a positive integer. Number of model-fitting steps. See Details
`nms`	a non-negative integer. Number of model selection steps. See Details
`niter_fit`	a positive integer. Number of EM iterations in model fitting
`tol`	a positive scalar. EM algorithm stops when pi(x) and mu(x) in consecutive steps differ by at most 'tol' for each element
`niter_ms`	a positive integer. Number of EM iterations in model selection
`cr`	a string. The criterion for model selection with BIC as default. Also support AIC, AICC and HIC
`verbose`	a list of logical values in the form of list(print = , fit = , ms = ). Each element indicates whether the relevant information is outputted to the console. See Details

Details

x should have a type compatible to the fitting functions in models. For GLM and GAM, x should be a data.frame. For glmnet, x should be a matrix.

models could either be an adapt_model object, if a single model is used, or a list of adapt_model objects, each of which corresponding to a model. Each element should be generated by gen_adapt_model. For glm/gam/glmnet, one can use the shortcut by running gen_adapt_model with name = "glm" or "gam" or "glmnet" but without specifying pifun, mufun, pifun_init and mufun_init. See examples below.

nfits is the number of model fitting steps plus nms, the model selection steps, if models contains multiple adapt_model objects. Suppose M is the number of masked p-values at the initial step, then the model is updated at the initial step and at every time when [M/nfits] more p-values are revealed. If nms > 0, model selection is performed at the initial step an at every time when [M/nms] more p-values are revealed. Between two consecutive model selection steps, the model selected from the last step is used for model fitting. For example, when M = 10000, nfits = 10 and nms = 2, model selection will be performed at the initial step and when 5000 p-values are revealed, while the model fitting will be performed when 1000, 2000, 3000, 4000, 6000, 7000, 8000, 9000 p-values are revealed.

verbose has three elements: print, fit and ms. If print = TRUE, the progress of the main procedure is outputted to the console, in the form of "alpha = 0.05: FDPhat 0.0333, Number of Rej. 30" (where the numbers are made up for illustration). If fit = TRUE, a progress bar for the model fitting is outputted to the console. Similarly, if ms = TRUE, a progress bar for the model selection is outputted to the console.

For ultra-large scale problems (n > 10^5), it is recommended to keep alphas short because the output s is of size n x length(alphas). is length(alphas).

The output qvals gives the q-values of each hypothesis. qvals[i] is defined as the minimum target FDR level such that pvals[i] is rejected. For hypotheses with p-values above s0, the q-values are set to be Inf because they are never rejected by AdaPT for whatever alpha.

The output order gives the order of (the indices of) p-values being revealed, i.e. being in the region (s, 1-s). The latter hypotheses appeared in order have smaller q-values (i.e. are more likely to be rejected).

Value

`nrejs`	a vector of integers. Number of rejections for each alpha
`rejs`	a list of vector of integers. The set of indices of rejections for each alpha
`s`	a matrix of size `length(pvals) X length(alphas)`. Threshold curves for each alpha
`params`	a list. Each element is a list in the form of `list(pix = , mux = , alpha = , nmasks =)`, recording the parameter estimates, the achieved alpha and the number of masked p-values. To avoid massive storage cost, it only contains the information when a new target FDR level is achieved. As a result, it might be shorter than `nfits`.
`qvals`	a vector of values in [0, 1]UInf. Q-values. See Details
`order`	a permutation of `1:length(pvals)`. Indices of hypotheses arranged in the order of reveal. See Details
`alphas`	same as the input `alphas`
`dist`	same as the input `dist`
`models`	a list of `adapt_model` objects of length `params`. The model used in each fitting step. As in `params`, it only contains the model when a new target FDR level is achieved and each element corresponds to an element of `params`.
`info`	a list of length `nfits`. Each element is a list recording extra information in each fitting step, e.g. degree of freedom (df) and variable importance (vi). As in `params`, it only contains the model information when a new target FDR level is achieved and each element corresponds to an element of `params`.
`args`	a list including the other inputs `nfits`, `nms`, `niter_fit`, `niter_ms`, `tol`, `cr`

Examples


# Load estrogen data
data(estrogen)
pvals <- as.numeric(estrogen$pvals)
x <- data.frame(x = as.numeric(estrogen$ord_high))
dist <- beta_family()

# Subsample the data for convenience
inds <- (x$x <= 5000)
pvals <- pvals[inds]
x <- x[inds,,drop = FALSE]

# Generate models for function adapt
library("splines")
formulas <- paste0("ns(x, df = ", 6:10, ")")
models <- lapply(formulas, function(formula){
    piargs <- muargs <- list(formula = formula)
    gen_adapt_model(name = "glm", piargs = piargs, muargs = muargs)
})

# Run adapt
res <- adapt(x = x, pvals = pvals, models = models,
             dist = dist, nfits = 10)

[Package adaptMT version 1.0.0 Index]