estimatePriorDf {MAnorm2} | R Documentation |
Assess the Goodness of Fit of Mean-Variance Curves
Description
Given a set of bioCond
objects of which each has been
associated with a mean-variance curve, estimatePriorDf
derives a
common number of
prior degrees of freedom assessing the overall goodness of fit of the
mean-variance curves and accordingly adjusts the variance ratio factor of
each of the bioCond
s.
Usage
estimatePriorDf(
conds,
occupy.only = TRUE,
return.d0 = FALSE,
no.rep.rv = NULL,
.call = TRUE
)
Arguments
conds |
A list of |
occupy.only |
A logical scalar. If it is |
return.d0 |
A logical scalar. If set to |
no.rep.rv |
A positive real specifying the variance ratio factor of
those |
.call |
Never care about this argument. |
Details
estimatePriorDf
borrows part of the modeling strategy implemented in
limma
(see "References"). For each bioCond
object, the
predicted variances from its mean-variance curve serve as the prior
variances associated with individual intervals.
The common number of prior degrees of freedom
of the supplied bioCond
s
quantifies the confidence we have on the associated mean-variance curves.
Intuitively, the closer the observed mean-variance points are
to the curves, the more prior degrees of freedom there will be.
See estimateD0
for technical details about the estimation of
number of prior degrees of freedom.
According to the estimated number of prior degrees of freedom,
estimatePriorDf
separately adjusts the variance ratio factor of each bioCond
.
Intrinsically, this process is to scale the mean-variance curve of each
bioCond
so that it passes the "middle" of the observed mean-variance
points. See scaleMeanVarCurve
for technical details of
scaling a mean-variance curve.
ChIP-seq signals located in non-occupied intervals result primarily from
background noise, and are therefore associated with less data regularity
than signals in occupied intervals. Involving non-occupied intervals in the
estimation process may result in an under-estimated number of prior degrees
of freedom. Thus, the recommended usage is to set occupy.only
to
TRUE
(i.e., the default).
In most cases, the estimation of number of prior degrees of freedom
is automatically
handled when fitting or setting a mean-variance curve, and you don't need to
call this function explicitly (see also fitMeanVarCurve
and
setMeanVarCurve
). See "Examples"
below for a practical application of this function. Note also that there is
a robust version of this function that uses Winsorized statistics to
protect the estimation procedure against potential outliers (see
estimatePriorDfRobust
for details).
Value
By default, estimatePriorDf
returns the argument list of
bioCond
objects,
with the estimated number of prior degrees of
freedom substituted for the "df.prior"
component of each of them.
Besides, their "ratio.var"
components have been adjusted
accordingly, and an attribute named "no.rep.rv"
is added to the
list if it's ever been used as the variance ratio factor of the
bioCond
s without replicate samples. A special case is that the
estimated number of prior degrees of freedom is 0. In this case,
estimatePriorDf
won't adjust existing variance ratio factors,
and you may want to use setPriorDfVarRatio
to
explicitly specify variance ratio factors.
If return.d0
is set to TRUE
, estimatePriorDf
simply
returns the estimated number of prior degrees of freedom.
References
Smyth, G.K., Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 2004. 3: p. Article3.
Tu, S., et al., MAnorm2 for quantitatively comparing groups of ChIP-seq samples. Genome Res, 2021. 31(1): p. 131-145.
See Also
bioCond
for creating a bioCond
object;
fitMeanVarCurve
for fitting a mean-variance curve and
using a fit.info
field to characterize it;
estimatePriorDfRobust
for a robust version of
estimatePriorDf
;
setPriorDf
for setting the number of
prior degrees of freedom and accordingly
adjusting the variance ratio factors of a set of bioCond
s;
diffTest
for calling differential
intervals between two bioCond
objects.
Examples
data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")
## Fit a mean-variance curve treating each gender as a biological condition,
## and each individual (i.e., cell line) a replicate.
# First perform the MA normalization and construct bioConds to represent
# individuals.
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)
# Group individuals into bioConds based on their genders.
female <- cmbBioCond(conds[c(1, 3)], name = "female")
male <- cmbBioCond(conds[2], name = "male")
# The dependence of variance of ChIP-seq signal intensity across individuals
# on the mean signal intensity is typically not as regular as could be well
# modeled by an explicit parametric form. Better use the local regression to
# adaptively capture the mean-variance trend.
genders <- list(female = female, male = male)
genders1 <- fitMeanVarCurve(genders, method = "local", occupy.only = TRUE)
genders2 <- fitMeanVarCurve(genders, method = "local", occupy.only = FALSE)
# Suppose the local regression is performed using only the occupied genomic
# intervals as input. Good chances are that the extrapolation algorithm
# implemented in the regression method will produce over-estimated variances
# for the non-occupied intervals.
plotMeanVarCurve(genders1, subset = "all")
plotMeanVarCurve(genders2, subset = "all")
plotMeanVarCurve(genders1, subset = "non-occupied")
plotMeanVarCurve(genders2, subset = "non-occupied")
# On the other hand, applying the local regression on all genomic intervals
# may considerably reduce the estimated number of prior degrees of freedom
# associated with the fitted mean-variance curve, as ChIP-seq signals in the
# non-occupied intervals are generally of less data regularity compared with
# those in the occupied intervals.
summary(genders1$female)
summary(genders2$female)
# To split the difference, fit the mean-variance curve on all genomic
# intervals and re-estimate the number of prior degrees of freedom using
# only the occupied intervals, which is also the most recommended strategy
# in practice.
genders3 <- estimatePriorDf(genders2, occupy.only = TRUE)
plotMeanVarCurve(genders3, subset = "all")
plotMeanVarCurve(genders3, subset = "non-occupied")
summary(genders3$female)