estimateD0Robust {MAnorm2}R Documentation

Estimate Number of Prior Degrees of Freedom in a Robust Manner

Description

estimateD0Robust underlies other interface functions for estimating the number of prior degrees of freedom associated with an unadjusted mean-variance curve (or a set of unadjusted mean-variance curves) in a robust manner.

Usage

estimateD0Robust(
  z,
  m,
  p_low = 0.01,
  p_up = 0.1,
  d0_low = 0.001,
  d0_up = 1e+06,
  eps = d0_low,
  nw = gauss.quad(128, kind = "legendre")
)

Arguments

z

A list of which each element is a vector of FZ statistics corresponding to a bioCond object (see also "Details").

m

A vector of numbers of replicates in bioCond objects. Must correspond to z one by one in the same order.

p_low, p_up

Lower- and upper-tail probabilities for Winsorizing the FZ statistics associated with each bioCond.

d0_low, d0_up

Positive reals specifying the lower and upper bounds of estimated d0 (i.e., number of prior degrees of freedom). Inf is not allowed.

During the estimation process, if d0 is sure to be less than or equal to d0_low, it will be considered as 0, and if it is sure to be larger than or equal to d0_up, it will be considered as positive infinity.

eps

The required numeric precision for estimating d0.

nw

A list containing nodes and weights variables for calculating the definite integral of a function f over the interval [-1, 1], which is approximated by sum(nw$weights * f(nw$nodes)). By default, a set of Gauss-Legendre nodes along with the corresponding weights calculated by gauss.quad is used.

Details

For each bioCond object with replicate samples, a vector of FZ statistics can be deduced from the unadjusted mean-variance curve associated with it. More specifically, for each genomic interval in a bioCond with replicate samples, its FZ statistic is defined to be log(t_hat / v0), where t_hat is the observed variance of signal intensities of the interval, and v0 is the interval's prior variance read from the corresponding mean-variance curve.

Theoretically, each FZ statistic follows a scaled Fisher's Z distribution plus a constant (since the mean-variance curve is not adjusted yet), and we derive a robust estimation of d0 (i.e., number of prior degrees of freedom) by Winsorizing the FZ statistics of each bioCond and matching the resulting sample variance with the theoretical variance of the Winsorized distribution, which is calculated by using numerical integration (see also "References"). Since the theoretical variance has no compact forms regarding d0, the matching procedure is achieved by using the method of bisection.

Inspired by the ordinary (non-robust) routine for estimating d0, we derive the final estimate of d0 by separately applying the function trigamma(x / 2) to the estimated d0 from each bioCond, taking a weighted average across the results, and applying the inverse of the function (achieved by using Newton iteration; see also trigamma). Here the weights are the numbers of genomic intervals (in the bioConds) minus 1 that are used to calculate FZ statistics.

Value

The estimated number of prior degrees of freedom. Note that the function returns NA if there are not sufficient genomic intervals for estimating it.

References

Phipson, B., et al., Robust Hyperparameter Estimation Protects against Hypervariable Genes and Improves Power to Detect Differential Expression. Annals of Applied Statistics, 2016. 10(2): p. 946-963.

See Also

bioCond for creating a bioCond object; fitMeanVarCurve for fitting a mean-variance curve; estimatePriorDfRobust for an interface to robustly estimating the number of prior degrees of freedom on bioCond objects; varRatio for a description of variance ratio factor; scaleMeanVarCurveRobust for robustly estimating the variance ratio factor for adjusting a mean-variance curve (or a set of curves).

estimateD0 and scaleMeanVarCurve for the ordinary (non-robust) routines for estimating number of prior degrees of freedom and variance ratio factor, respectively.

Examples

## Not run: 
## Private functions involved.

# For generating random FZ statistics with outliers. Note that the argument
# scaling controls how extreme outliers are.
rFZ <- function(n, var.ratio, m, d0, p_low, p_up, scaling) {
    z <- list()
    p_low <- p_low * 0.9
    p_up <- p_up * 0.9
    for (i in 1:length(n)) {
        x <- rf(n[i], m[i] - 1, d0)
        q_low <- qf(p_low, m[i] - 1, d0, lower.tail = TRUE)
        q_up <- qf(p_up, m[i] - 1, d0, lower.tail = FALSE)
        f <- x < q_low
        x[f] <- x[f] / runif(sum(f), 1, scaling)
        f <- x > q_up
        x[f] <- x[f] * runif(sum(f), 1, scaling)
        z[[i]] <- log(var.ratio[i]) + log(x)
    }
    z
}

# Settings.
n <- c(30000, 40000)
var.ratio <- c(1.2, 2.5)
m <- c(2, 3)
d0 <- 17
p_low <- 0.01
p_up <- 0.1

# Compare estimation results from ordinary (non-robust) and robust routines.
# Case 1: no outliers.
set.seed(100)
scaling <- 1
z <- rFZ(n, var.ratio, m, d0, p_low, p_up, scaling)
res1 <- estimateD0(z, m)
res1
scaleMeanVarCurve(z[1], m[1], res1)
scaleMeanVarCurve(z[2], m[2], res1)
res2 <- estimateD0Robust(z, m, p_low, p_up)
res2
scaleMeanVarCurveRobust(z[1], m[1], res2, p_low, p_up)
scaleMeanVarCurveRobust(z[2], m[2], res2, p_low, p_up)

# Case 2: moderate outliers.
scaling <- 3
z <- rFZ(n, var.ratio, m, d0, p_low, p_up, scaling)
res1 <- estimateD0(z, m)
res1
scaleMeanVarCurve(z[1], m[1], res1)
scaleMeanVarCurve(z[2], m[2], res1)
res2 <- estimateD0Robust(z, m, p_low, p_up)
res2
scaleMeanVarCurveRobust(z[1], m[1], res2, p_low, p_up)
scaleMeanVarCurveRobust(z[2], m[2], res2, p_low, p_up)

# Case 3: extreme outliers.
scaling <- 10
z <- rFZ(n, var.ratio, m, d0, p_low, p_up, scaling)
res1 <- estimateD0(z, m)
res1
scaleMeanVarCurve(z[1], m[1], res1)
scaleMeanVarCurve(z[2], m[2], res1)
res2 <- estimateD0Robust(z, m, p_low, p_up)
res2
scaleMeanVarCurveRobust(z[1], m[1], res2, p_low, p_up)
scaleMeanVarCurveRobust(z[2], m[2], res2, p_low, p_up)

## End(Not run)


[Package MAnorm2 version 1.2.2 Index]