R: Fitting Conditional Two-Groups Models on Unmasked P-Values

ctgm_lfdr {adaptMT}

R Documentation

Fitting Conditional Two-Groups Models on Unmasked P-Values

Description

ctgm_lfdr computes the oracle local FDR estimates using all p-values, without masking.

Usage

ctgm_lfdr(x, pvals, models, dist = beta_family(), type = c("over", "raw"),
  params0 = list(pix = NULL, mux = NULL), niter = 50, cr = "BIC",
  verbose = TRUE)

Arguments

`x`	covariates (i.e. side-information). Should be compatible with `models`. See Details
`pvals`	a vector of values in [0, 1]. P-values
`models`	an object of class "`adapt_model`" or a list of objects of class "adapt_model". See Details
`dist`	an object of class "`gen_exp_family`". `beta_family()` as default
`type`	a character. Either "over" or "raw" indicating the type of local FDR estimates. See Details
`params0`	a list in the form of list(pix = , mux = ). Initial values of pi(x) and mu(x). Both can be set as NULL
`niter`	a positive integer. Number of EM iterations.
`cr`	a string. The criterion for model selection, with BIC as default. AIC, AICC and HIC are also supported
`verbose`	logical values in the form of list(fit = , ms = ). Indicates whether the progress of model fitting and model selection is displayed
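
As an illustration of how `cr` could rank candidate models, the sketch below scores hypothetical fits with textbook forms of the four criteria. The helper `info_criterion`, the log-likelihoods and the degrees of freedom are made up for the example, and HIC is read here as the Hannan-Quinn criterion; none of this is taken from the adaptMT source.

```r
# Textbook information criteria (smaller is better).
# Assumed forms, not copied from adaptMT; HIC is taken to be Hannan-Quinn.
info_criterion <- function(loglik, df, n,
                           cr = c("BIC", "AIC", "AICC", "HIC")) {
  cr <- match.arg(cr)
  switch(cr,
    BIC  = -2 * loglik + log(n) * df,
    AIC  = -2 * loglik + 2 * df,
    AICC = -2 * loglik + 2 * df + 2 * df * (df + 1) / (n - df - 1),
    HIC  = -2 * loglik + 2 * log(log(n)) * df
  )
}

# Hypothetical log-likelihoods and degrees of freedom of three candidates
logliks <- c(-1520.4, -1498.7, -1497.9)
dfs <- c(6, 8, 10)

scores <- mapply(info_criterion, loglik = logliks, df = dfs,
                 MoreArgs = list(n = 5000, cr = "BIC"))

# Index of the candidate with the smallest score
best <- which.min(scores)
```
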

Details

ctgm_lfdr implements the EM algorithm to fit pi(x) and mu(x) on unmasked p-values. Although it is not related to FDR control of AdaPT, it provides useful measures for post-hoc justification and other purposes. For instance, one can use these local FDR estimates for prioritizing the hypotheses if strict FDR control is not required.
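
The EM idea can be sketched on a much simpler model in which pi and mu are constants rather than functions of x. The code below is a stand-in written for this page, not the package's implementation: the function `em_two_groups`, the starting values and the simulated data are all hypothetical, and the alternative density f1(p) = (1/mu) p^(1/mu - 1) is assumed to be the beta-type family with a uniform null.

```r
# Simplified stand-in: constant pi1 and mu (no covariates).
# Not adaptMT's code; names and defaults are hypothetical.
em_two_groups <- function(pvals, pi1 = 0.5, mu = 2, niter = 50) {
  pvals <- pmax(pvals, 1e-15)   # guard against log(0)
  y <- -log(pvals)              # exponential with mean mu under f1
  for (iter in seq_len(niter)) {
    # E-step: posterior probability that each hypothesis is non-null
    f1 <- (1 / mu) * pvals^(1 / mu - 1)
    h <- pi1 * f1 / (1 - pi1 + pi1 * f1)
    # M-step: weighted maximum-likelihood updates (mu kept at or above 1)
    pi1 <- mean(h)
    mu <- max(1, sum(h * y) / sum(h))
  }
  # "raw" local FDR under the fitted parameters
  f1 <- (1 / mu) * pvals^(1 / mu - 1)
  lfdr <- (1 - pi1) / (1 - pi1 + pi1 * f1)
  list(pi1 = pi1, mu = mu, lfdr = lfdr)
}

# Simulated p-values: 30% non-null with mu = 4, the rest uniform
set.seed(1)
n <- 2000
is_alt <- runif(n) < 0.3
pvals_sim <- ifelse(is_alt, rbeta(n, shape1 = 1 / 4, shape2 = 1), runif(n))

fit <- em_two_groups(pvals_sim, niter = 200)
```

In the conditional model, the two M-step updates would be replaced by regressions of the same quantities on x, which is where the fitting functions supplied in models come in.
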

In contrast to adapt, ctgm_lfdr does not guarantee FDR control unless the model is correctly specified. It is recommended to use ctgm_lfdr only when FDR control is not required.

x should have a type compatible with the fitting functions in models. For GLM and GAM, x should be a data.frame. For glmnet, x should be a matrix.
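
A minimal base-R sketch of the two shapes of x described above; the covariate values and variable names are made up for illustration.

```r
# Hypothetical covariate values
covariate <- c(1.2, 3.4, 5.6, 7.8)

# For glm / gam based models: a data.frame with named columns
x_df <- data.frame(x = covariate)

# For glmnet based models: a numeric matrix
x_mat <- as.matrix(x_df)
```
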

models can be either an adapt_model object, if a single model is used, or a list of adapt_model objects, each of which corresponds to a model. Each element should be generated by gen_adapt_model. For glm/gam/glmnet, one can use the shortcut of running gen_adapt_model with name = "glm", "gam" or "glmnet" without specifying pifun, mufun, pifun_init and mufun_init. See examples below.

When type = "over", it yields a conservative estimate of local FDR:

lfdr(p) = (1 - \pi_{1} + \pi_{1}f_{1}(1)) / (1 - \pi_{1} + \pi_{1}f_{1}(p)).

When type = "raw", it yields the original local FDR:

lfdr(p) = (1 - \pi_{1}) / (1 - \pi_{1} + \pi_{1}f_{1}(p)).

The former is shown to be more stable and reliable because of the weak identifiability in conditional mixture models.
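
The two formulas can be compared numerically with a small base-R sketch. The helper `lfdr_two_groups` is hypothetical and not part of adaptMT; it assumes a constant pi_1 and the alternative density f_1(p) = (1/mu) p^(1/mu - 1), chosen here only for illustration.

```r
# Hypothetical helper, not part of adaptMT.
# Assumes f1(p) = (1/mu) * p^(1/mu - 1) with mu >= 1 and a constant pi1.
lfdr_two_groups <- function(p, pi1, mu, type = c("over", "raw")) {
  type <- match.arg(type)
  f1 <- function(q) (1 / mu) * q^(1 / mu - 1)
  # "over" adds pi1 * f1(1) to the numerator, which makes it conservative
  num <- if (type == "over") 1 - pi1 + pi1 * f1(1) else 1 - pi1
  num / (1 - pi1 + pi1 * f1(p))
}

p_grid <- c(0.001, 0.01, 0.1, 0.5, 1)
lfdr_over <- lfdr_two_groups(p_grid, pi1 = 0.2, mu = 3, type = "over")
lfdr_raw  <- lfdr_two_groups(p_grid, pi1 = 0.2, mu = 3, type = "raw")

# Side-by-side view; the "over" column is never below the "raw" column
cbind(p = p_grid, over = lfdr_over, raw = lfdr_raw)
```

Because f_1 is decreasing under this assumption, the "over" estimate reaches 1 at p = 1, while the "raw" estimate stays below 1 there.
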

Value

`lfdr`	a vector of values in [0, 1]. Local FDR estimates of each hypothesis.
`model`	an adapt_model object. The selected model if multiple models are provided.

Examples


# Load estrogen data
data(estrogen)
pvals <- as.numeric(estrogen$pvals)
x <- data.frame(x = as.numeric(estrogen$ord_high))
dist <- beta_family()

# Subsample the data for convenience
inds <- (x$x <= 5000)
pvals <- pvals[inds]
x <- x[inds,,drop = FALSE]

# Generate models for function adapt
library("splines")
formulas <- paste0("ns(x, df = ", 6:10, ")")
models <- lapply(formulas, function(formula){
    piargs <- muargs <- list(formula = formula)
    gen_adapt_model(name = "glm", piargs = piargs, muargs = muargs)
})

# Run ctgm_lfdr with two types of lfdr estimates
res_over <- ctgm_lfdr(x, pvals, models, type = "over")
res_raw <- ctgm_lfdr(x, pvals, models, type = "raw")

# Compare two estimates
par(mfrow = c(2, 1))
hist(res_over$lfdr)
hist(res_raw$lfdr)

[Package adaptMT version 1.0.0 Index]