fitMix3 {NewmanOmics} | R Documentation |
Compute FDR from Three-Component Beta Mixture
Description
Provides functions to fit a beta-mixture model to a set of p-values that has peaks at both zero and one, and to estimate false discovery rates.
Usage
fitMix3(datavec, forever=100, epsilon=0.001, relative = 0.001, print.level=0)
computeFDR(object, alpha)
computeCutoff(object, fdr)
Arguments
datavec |
A numeric vector containing p-values. |
forever |
An integer; maximum number of iterations while fitting the mixture model. |
epsilon |
A real number; change in the log likelihood that should be used to terminate the model-fitting loop. |
relative |
A real number; change in the relative log likelihood that should be used to terminate the model-fitting loop. |
print.level |
An integer; how much detail should |
object |
An object of the |
alpha |
A real number between 0 and 1; the cutoff on the nominal p-value where the FDR should be computed. |
fdr |
A real number beteen 0 and 1; the targeted FDR value. |
Details
We have observed empirically that the set of p-values obtained when computing the Newman paired test statistic often has peaks both at zero (representing genes of interest) and at one (representing "boring" genes that change much less than expected). We attribute the latter phenomenon to the fact that we use locally smoothed instead of gene-by-gene estimates of the standard deviation; genes whose SD is increased by the smoothing process contribute to the boring peak near one.
To estimate p-values in this context, we fit a three-component beta mixture model, combining (1) a right-peaked distribution Beta(L,1), (2) a left-peaked dfistribution Beta(1,M), and (3) a uniform distribution. Specfically, we look for models of the form
alpha*Beta(L,1) + beta*Beta(1, M) + gamma*Beta(1,1)
.
Model-fitting uses an expectation-maximization (EM) algorithm. In
addition to the parameters mle=c(L,M)
and psi=c(alpha,
beta, gamma)
, we introduce a matrix Z
of latent variables that
indicate which distribution each point is likely to arise
form. Z
has three columns (one for each mixture component) and
one row for each p-value; the entries in each row are nonegative and
sum to one. The M-step of the algorithm uses the nlm
optimization function to compute the maximum-likelihood mle
values given psi
and Z
. The E-step first updates
psi
from the Z
-matrix, and then updates the values of
Z
based on the current mle
.
We are able to use the mixture distribution to compute the relationship between a cutoff on the nominal p-values and the false discovery rate (FDR).
Value
The model-fitting function, fitMix3
, returns an object of the
MixOf3Beta
class.
The computeFDR
function returns a real number in [0,1], the
false discovery rate assiociated with the nominal cutoff.
The computeCutoff
function returns a real number in [0,1], the
cutoff required to achieve the desired FDR.
Examples
set.seed(98765)
ds <- c(rbeta(3000, 20, 1),
rbeta(1000, 1, 40),
runif(6000))
fit <- fitMix3(ds)
computeFDR(fit, 0.01)
computeCutoff(fit, 0.01)
computeFDR(fit, 0.0016438)
computeCutoff(fit, 0.05)
computeFDR(fit, 0.00702114)