estimateD0Robust {MAnorm2} | R Documentation |
Estimate Number of Prior Degrees of Freedom in a Robust Manner
Description
estimateD0Robust
underlies other interface functions for estimating
the number of prior degrees of freedom associated with an unadjusted
mean-variance curve (or a set of unadjusted mean-variance curves)
in a robust manner.
Usage
estimateD0Robust(
z,
m,
p_low = 0.01,
p_up = 0.1,
d0_low = 0.001,
d0_up = 1e+06,
eps = d0_low,
nw = gauss.quad(128, kind = "legendre")
)
Arguments
z |
A list of which each element is a vector of FZ statistics
corresponding to a |
m |
A vector of numbers of replicates in |
p_low , p_up |
Lower- and upper-tail probabilities for Winsorizing the
FZ statistics associated with each |
d0_low , d0_up |
Positive reals specifying the lower and upper bounds
of estimated During the estimation process, if |
eps |
The required numeric precision for estimating |
nw |
A list containing |
Details
For each bioCond
object with replicate samples, a vector of
FZ statistics can be deduced from the unadjusted mean-variance curve
associated with it. More specifically, for each genomic interval in a
bioCond
with replicate samples, its FZ statistic is defined to be
log(t_hat / v0)
, where t_hat
is the observed variance of signal
intensities of the interval, and v0
is the interval's prior variance
read from the corresponding mean-variance curve.
Theoretically, each FZ statistic follows a scaled Fisher's Z distribution
plus a constant (since the mean-variance curve is not adjusted yet),
and we derive a robust estimation of d0
(i.e., number of prior
degrees of freedom) by
Winsorizing the FZ statistics of each bioCond
and matching the
resulting sample variance with the theoretical variance of the Winsorized
distribution, which is calculated by using numerical integration (see
also "References"). Since the theoretical variance has no compact forms
regarding d0
, the matching procedure is achieved by using the method
of bisection.
Inspired by the ordinary (non-robust) routine for estimating d0
, we
derive the final estimate of d0
by separately applying the function
trigamma(x / 2)
to the estimated d0
from each
bioCond
, taking a weighted average across the results, and applying
the inverse of the function (achieved by using Newton iteration;
see also trigamma
). Here the
weights are the numbers of genomic intervals (in the bioCond
s) minus
1 that are used to calculate FZ statistics.
Value
The estimated number of prior degrees of freedom. Note that the
function returns NA
if there are not sufficient genomic intervals
for estimating it.
References
Phipson, B., et al., Robust Hyperparameter Estimation Protects against Hypervariable Genes and Improves Power to Detect Differential Expression. Annals of Applied Statistics, 2016. 10(2): p. 946-963.
See Also
bioCond
for creating a bioCond
object;
fitMeanVarCurve
for fitting a mean-variance curve;
estimatePriorDfRobust
for an interface to robustly
estimating the number of prior degrees of freedom on bioCond
objects; varRatio
for a description of variance ratio
factor; scaleMeanVarCurveRobust
for robustly
estimating the variance ratio factor
for adjusting a mean-variance curve (or a set of curves).
estimateD0
and scaleMeanVarCurve
for the ordinary (non-robust) routines for estimating number of prior
degrees of freedom and variance ratio factor, respectively.
Examples
## Not run:
## Private functions involved.
# For generating random FZ statistics with outliers. Note that the argument
# scaling controls how extreme outliers are.
rFZ <- function(n, var.ratio, m, d0, p_low, p_up, scaling) {
z <- list()
p_low <- p_low * 0.9
p_up <- p_up * 0.9
for (i in 1:length(n)) {
x <- rf(n[i], m[i] - 1, d0)
q_low <- qf(p_low, m[i] - 1, d0, lower.tail = TRUE)
q_up <- qf(p_up, m[i] - 1, d0, lower.tail = FALSE)
f <- x < q_low
x[f] <- x[f] / runif(sum(f), 1, scaling)
f <- x > q_up
x[f] <- x[f] * runif(sum(f), 1, scaling)
z[[i]] <- log(var.ratio[i]) + log(x)
}
z
}
# Settings.
n <- c(30000, 40000)
var.ratio <- c(1.2, 2.5)
m <- c(2, 3)
d0 <- 17
p_low <- 0.01
p_up <- 0.1
# Compare estimation results from ordinary (non-robust) and robust routines.
# Case 1: no outliers.
set.seed(100)
scaling <- 1
z <- rFZ(n, var.ratio, m, d0, p_low, p_up, scaling)
res1 <- estimateD0(z, m)
res1
scaleMeanVarCurve(z[1], m[1], res1)
scaleMeanVarCurve(z[2], m[2], res1)
res2 <- estimateD0Robust(z, m, p_low, p_up)
res2
scaleMeanVarCurveRobust(z[1], m[1], res2, p_low, p_up)
scaleMeanVarCurveRobust(z[2], m[2], res2, p_low, p_up)
# Case 2: moderate outliers.
scaling <- 3
z <- rFZ(n, var.ratio, m, d0, p_low, p_up, scaling)
res1 <- estimateD0(z, m)
res1
scaleMeanVarCurve(z[1], m[1], res1)
scaleMeanVarCurve(z[2], m[2], res1)
res2 <- estimateD0Robust(z, m, p_low, p_up)
res2
scaleMeanVarCurveRobust(z[1], m[1], res2, p_low, p_up)
scaleMeanVarCurveRobust(z[2], m[2], res2, p_low, p_up)
# Case 3: extreme outliers.
scaling <- 10
z <- rFZ(n, var.ratio, m, d0, p_low, p_up, scaling)
res1 <- estimateD0(z, m)
res1
scaleMeanVarCurve(z[1], m[1], res1)
scaleMeanVarCurve(z[2], m[2], res1)
res2 <- estimateD0Robust(z, m, p_low, p_up)
res2
scaleMeanVarCurveRobust(z[1], m[1], res2, p_low, p_up)
scaleMeanVarCurveRobust(z[2], m[2], res2, p_low, p_up)
## End(Not run)