ctgm_lfdr {adaptMT} | R Documentation |
Fitting Conditional Two-Groups Models on Unmasked P-Values
Description
ctgm_lfdr computes the oracle local FDR estimate using all p-values without masking.
Usage
ctgm_lfdr(x, pvals, models, dist = beta_family(), type = c("over", "raw"),
params0 = list(pix = NULL, mux = NULL), niter = 50, cr = "BIC",
verbose = TRUE)
Arguments
x |
covariates (i.e. side-information). Should be compatible to |
pvals |
a vector of values in [0, 1]. P-values |
models |
an object of class " |
dist |
an object of class " |
type |
a character string. Either "over" or "raw", indicating the type of local FDR estimates. See Details |
params0 |
a list in the form of list(pix = , mux = ). Initial values of pi(x) and mu(x). Both can be set as NULL |
niter |
a positive integer. Number of EM iterations. |
cr |
a string. The criterion for model selection, with BIC as the default. AIC, AICC and HIC are also supported
verbose |
logical values in the form of list(fit = , ms = ). Indicates whether the progress of model fitting and model selection is displayed |
Details
ctgm_lfdr implements the EM algorithm to fit pi(x) and mu(x) on unmasked p-values. Although it plays no part in the FDR control of AdaPT, it provides useful measures for post-hoc justification and other purposes.
For instance, one can use these local FDR estimates for prioritizing the hypotheses if strict FDR control is not required.
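A minimal base-R sketch of such a prioritization, assuming the estimates are available as a numeric vector like the lfdr element returned by ctgm_lfdr. The values below are made up for illustration and are not output of the package.

```r
# Made-up local FDR estimates standing in for the lfdr element of the result
lfdr <- c(0.80, 0.05, 0.40, 0.01, 0.95)

# Indices of the hypotheses, ordered from smallest to largest local FDR,
# i.e. from most to least promising
priority <- order(lfdr)
print(priority)
```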
In contrast to adapt, ctgm_lfdr does not guarantee FDR control unless the model is correctly specified. It is recommended to use ctgm_lfdr only when FDR control is not required.
x should have a type compatible with the fitting functions in models. For GLM and GAM, x should be a data.frame. For glmnet, x should be a matrix.
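The two covariate layouts described above can be sketched in base R as follows. The column names x1 and x2 and the simulated values are hypothetical and serve only to show the two types.

```r
# Hypothetical side information: two numeric covariates for five hypotheses
set.seed(1)
x_df <- data.frame(x1 = runif(5), x2 = rnorm(5))  # layout for GLM and GAM models

# as.matrix() turns the all-numeric data.frame into a numeric matrix,
# the layout expected when glmnet models are used
x_mat <- as.matrix(x_df)

str(x_df)
str(x_mat)
```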
models can be either an adapt_model object, if a single model is used, or a list of adapt_model objects, each corresponding to a model. Each element should be generated by gen_adapt_model. For glm/gam/glmnet, one can use the shortcut of running gen_adapt_model with name = "glm", "gam" or "glmnet" without specifying pifun, mufun, pifun_init and mufun_init. See examples below.
When type = "over", it yields a conservative estimate of the local FDR:
lfdr(p) = (1 - \pi_{1} + \pi_{1}f_{1}(1)) / (1 - \pi_{1} + \pi_{1}f_{1}(p)).
When type = "raw", it yields the original local FDR:
lfdr(p) = (1 - \pi_{1}) / (1 - \pi_{1} + \pi_{1}f_{1}(p)).
The former is shown to be more stable and reliable because of the weak identifiability in conditional mixture models.
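To see how the two formulas relate, the base-R sketch below evaluates both at a few p-values. The alternative density f1(p) = a * p^(a - 1), the values of pi1 and a, and the helper names lfdr_raw and lfdr_over are illustrative assumptions. They are not functions of adaptMT or quantities fitted by ctgm_lfdr.

```r
# Assumed alternative density: f1(p) = a * p^(a - 1) with 0 < a < 1,
# which is decreasing in p and satisfies f1(1) = a
f1 <- function(p, a) a * p^(a - 1)

# "raw" formula: (1 - pi1) / (1 - pi1 + pi1 * f1(p))
lfdr_raw <- function(p, pi1, a) {
  (1 - pi1) / (1 - pi1 + pi1 * f1(p, a))
}

# "over" formula: the numerator additionally includes pi1 * f1(1)
lfdr_over <- function(p, pi1, a) {
  (1 - pi1 + pi1 * f1(1, a)) / (1 - pi1 + pi1 * f1(p, a))
}

p <- c(0.001, 0.01, 0.1, 0.5, 1)
raw <- lfdr_raw(p, pi1 = 0.2, a = 0.3)   # pi1 and a are made-up values
over <- lfdr_over(p, pi1 = 0.2, a = 0.3)
print(cbind(p, raw, over))
```

Because the two expressions share a denominator and f1(1) is nonnegative, the "over" value is never smaller than the "raw" value, and under this assumed density it equals 1 at p = 1.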
Value
lfdr |
a vector of values in [0, 1]. Local FDR estimates of each hypothesis.
model |
an adapt_model object. The selected model if multiple models are provided.
Examples
# Load estrogen data
data(estrogen)
pvals <- as.numeric(estrogen$pvals)
x <- data.frame(x = as.numeric(estrogen$ord_high))
dist <- beta_family()
# Subsample the data for convenience
inds <- (x$x <= 5000)
pvals <- pvals[inds]
x <- x[inds,,drop = FALSE]
# Generate models for function adapt
library("splines")
formulas <- paste0("ns(x, df = ", 6:10, ")")
models <- lapply(formulas, function(formula){
    piargs <- muargs <- list(formula = formula)
    gen_adapt_model(name = "glm", piargs = piargs, muargs = muargs)
})
# Run ctgm_lfdr with two types of lfdr estimates
res_over <- ctgm_lfdr(x, pvals, models, type = "over")
res_raw <- ctgm_lfdr(x, pvals, models, type = "raw")
# Compare two estimates
par(mfrow = c(2, 1))
hist(res_over$lfdr)
hist(res_raw$lfdr)