aovBioCond {MAnorm2}    R Documentation
Perform a Moderated Analysis of Variance on bioCond Objects
Description
Given a set of bioCond objects with which a mean-variance curve is associated, aovBioCond performs a one-way ANOVA-like analysis on them. More specifically, it tests, separately for each genomic interval, the null hypothesis that the mean signal intensity in the interval is the same across all the biological conditions.
Usage
aovBioCond(conds, min.var = 0, df.prior = NULL)
Arguments
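conds
A list of bioCond objects with which a mean-variance curve is associated.
min.var
Lower bound of variances read from the mean-variance curve. Any variance read from the curve that is less than min.var is adjusted to min.var.
df.prior
Number of prior degrees of freedom associated with the mean-variance curve. Must be non-negative. Can be set to Inf (see "Details"). By default (NULL), the number estimated for the mean-variance curve is used.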
Details
aovBioCond adopts the modeling strategy implemented in limma (see "References"), except that each interval has its own prior variance, which is read from the mean-variance curve associated with the bioCond objects. Technically, this function calculates a moderated F statistic for each genomic interval to test the null hypothesis. The moderated F statistic is simply the F statistic from an ordinary one-way ANOVA with its denominator (i.e., the sample variance) replaced by the posterior variance, which is defined as a weighted average of the sample and prior variances, with the weights proportional to their respective numbers of degrees of freedom. Incorporating the prior information in this way increases the statistical power of the tests.
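Written out for a single interval i, the statistic looks as follows. The symbols are introduced here for illustration only: MS_{b,i}, s_i^2 and s_{0,i}^2 correspond to the between.ms, within.ms and prior.var columns described under "Value", and d_b, d_w and d_0 to their degrees of freedom. The F_{d_b, d_w + d_0} null distribution is the one used in limma and is stated here as an assumption about this implementation.

```latex
\begin{aligned}
\tilde{s}_i^2 &= \frac{d_w\, s_i^2 + d_0\, s_{0,i}^2}{d_w + d_0}, \\
F_i &= \frac{\mathrm{MS}_{b,i}}{\tilde{s}_i^2}
  \;\sim\; F_{d_b,\; d_w + d_0} \quad \text{under } H_0 .
\end{aligned}
```

Setting d_0 = 0 gives the posterior variance s_i^2 and recovers the ordinary one-way ANOVA F statistic; letting d_0 grow without bound gives the posterior variance s_{0,i}^2 and d_b F_i following a chi-squared distribution with d_b degrees of freedom.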
Two extreme values can be specified for the argument df.prior (the number of degrees of freedom associated with the prior variances), representing two distinct cases. When it is set to 0, the prior information is not used at all, and the tests reduce to ordinary F tests in one-way ANOVA. When it is set to Inf, the denominators of the moderated F statistics are simply the prior variances, and the F statistics reduce to following a scaled chi-squared distribution. Other values of df.prior represent intermediate cases. Note that the number of prior degrees of freedom is automatically estimated for each mean-variance curve by a specifically designed statistical method (see also fitMeanVarCurve and setMeanVarCurve) and, by default, aovBioCond uses the estimation result to perform the tests. Specifying df.prior explicitly when calling aovBioCond is strongly discouraged unless you know exactly what you are doing. Moreover, aovBioCond does not adjust the variance ratio factors of the provided bioConds based on the specified number of prior degrees of freedom (see estimatePriorDf for a description of the variance ratio factor).
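The effect of df.prior can be seen with plain arithmetic in base R. The sketch below uses hypothetical toy degrees of freedom and a toy statistic; it assumes the limma-style reference distribution F(d_b, d_w + d_0) for the moderated F statistic and ignores the variance ratio factors that aovBioCond handles internally.

```r
# Toy setting: 2 within-condition df and a range of prior df d0.
d.within <- 2
d0 <- c(0, 2, 20, 200)

# Weight of the prior variance in the posterior variance, d0 / (d.within + d0):
# d0 = 0 gives the ordinary F test, and the weight tends to 1 as d0 grows.
prior.weight <- d0 / (d.within + d0)

# Upper-tail p-value of a moderated F statistic of 3.2 with 2 numerator df,
# referred to F(2, d.within + d0) (assumed limma-style reference).
f <- 3.2
d.between <- 2
p.finite <- pf(f, d.between, d.within + d0, lower.tail = FALSE)

# d0 = Inf limit: d.between * F follows a chi-squared distribution with
# d.between df, so the p-value comes from pchisq; pf() with df2 = Inf agrees.
p.inf <- pchisq(f * d.between, d.between, lower.tail = FALSE)
pf(f, d.between, Inf, lower.tail = FALSE)

rbind(d0, prior.weight, p.finite)
p.inf
```

As d0 grows, the prior weight approaches 1 and the p-values decrease toward the chi-squared value p.inf, which is the df.prior = Inf case described above.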
Note also that if df.prior is set to 0, at least one of the bioCond objects in conds must contain two or more ChIP-seq samples. Otherwise, there is no way to measure the variance associated with each interval, and an error is raised.
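The requirement follows from the degrees of freedom of the within-condition mean square, which in an ordinary one-way ANOVA is the sum of (n_i - 1) over conditions, n_i being the number of samples in condition i. The helper below is a hypothetical illustration of that count; it assumes the textbook ANOVA formula and is not taken from the MAnorm2 source.

```r
# Hypothetical helper: within-condition degrees of freedom of an ordinary
# one-way ANOVA, i.e. the sum over conditions of (number of samples - 1).
within_df <- function(n.samples) sum(n.samples - 1)

# Sample counts mirroring the Examples: 1, 2 and 2 replicates.
within_df(c(GM12890 = 1, GM12891 = 2, GM12892 = 2))  # 2: variance estimable
# Without replicates there are no within-condition degrees of freedom, so
# with df.prior = 0 the variance cannot be estimated.
within_df(c(1, 1, 1))                                # 0
```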
Since the practical purpose of this analysis is to select genomic intervals with differential ChIP-seq signals between at least one pair of the biological conditions, intervals not occupied by any of the bioCond objects in conds may be filtered out before the selection is made. The statistical power of the tests could then potentially be improved by re-adjusting the p-values of the remaining intervals.
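A minimal sketch of such re-adjustment, using base R's p.adjust with the same "BH" method as the padj column. The raw p-values and the occupancy matrix are made-up toy inputs. With real data, the p-values would come from the pval column, and the occupancy status from the bioCond objects in conds (assumed here to be a logical occupancy element stored in each bioCond; check ?bioCond before relying on that field name).

```r
# Toy stand-ins for aovBioCond output: raw p-values of five intervals, and a
# logical matrix (intervals x conditions) of occupancy status.
pval <- c(0.001, 0.90, 0.02, 0.30, 0.80)
occupancy <- cbind(A = c(TRUE, FALSE, TRUE, FALSE, FALSE),
                   B = c(FALSE, FALSE, TRUE, TRUE, FALSE),
                   C = c(TRUE, FALSE, FALSE, FALSE, FALSE))

# Intervals occupied by at least one condition are kept.
occupied <- rowSums(occupancy) > 0

# BH adjustment over all intervals (as in padj) ...
padj.all <- p.adjust(pval, method = "BH")
# ... versus over the occupied intervals only; filtered-out rows become NA.
padj.occupied <- rep(NA_real_, length(pval))
padj.occupied[occupied] <- p.adjust(pval[occupied], method = "BH")

cbind(pval, occupied, padj.all, padj.occupied)
```

In this toy case the three occupied intervals get smaller adjusted p-values (0.003, 0.03 and 0.30 instead of 0.005, 0.05 and 0.5) because the two discarded intervals no longer count toward the number of tests. The gain depends on the filtered-out intervals being mostly non-differential.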
Value
aovBioCond returns an object of class c("aovBioCond", "data.frame"), each row of which records the test results for one genomic interval. The data frame consists of the following variables:
conds.mean
Mean signal intensity at the interval across biological conditions.
between.ms
Between-condition mean square, as in an ordinary one-way ANOVA.
within.ms
Within-condition mean square, as in an ordinary one-way ANOVA.
prior.var
Prior variance, read from the mean-variance curve associated with the bioCond objects in conds.
posterior.var
A weighted average of within.ms and prior.var, with the weights proportional to their respective numbers of degrees of freedom.
mod.f
Moderated F statistic, i.e., the ratio of between.ms to posterior.var.
pval
P-value for the statistical significance of the moderated F statistic.
padj
P-value adjusted for multiple testing with the "BH" method (see p.adjust), which controls the false discovery rate.
Row names of the returned data frame are inherited from those of conds[[1]]$norm.signal. In addition, several attributes are added to the returned object:
bioCond.names
Names of the bioCond objects in conds.
mean.var.curve
A function representing the mean-variance curve. It accepts a vector of mean signal intensities and returns the corresponding prior variances. Note that this function has incorporated the min.var argument.
df
A length-4 vector giving the numbers of degrees of freedom of between.ms, within.ms, prior.var and posterior.var.
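The relations between the columns and the df attribute can be checked with a few lines of base R. The helper moderated_f below is hypothetical, not part of MAnorm2; it re-derives posterior.var, mod.f, pval and padj from the other columns following the descriptions above. The F(df[1], df[4]) reference distribution used for pval is an assumption borrowed from limma rather than something stated on this page.

```r
# Hypothetical helper, not part of MAnorm2: recompute posterior.var, mod.f,
# pval and padj from between.ms, within.ms, prior.var and a length-4 df
# vector ordered as (between.ms, within.ms, prior.var, posterior.var).
# Assumes the limma-style reference distribution F(df[1], df[4]) and a
# positive within-condition df.
moderated_f <- function(between.ms, within.ms, prior.var, df) {
  d.w <- df[[2]]
  d0 <- df[[3]]
  # Weighted average with weights proportional to the degrees of freedom;
  # an infinite prior df puts all the weight on the prior variance.
  posterior.var <- if (is.infinite(d0)) {
    prior.var
  } else {
    (d.w * within.ms + d0 * prior.var) / (d.w + d0)
  }
  mod.f <- between.ms / posterior.var
  pval <- pf(mod.f, df[[1]], df[[4]], lower.tail = FALSE)
  data.frame(posterior.var = posterior.var, mod.f = mod.f, pval = pval,
             padj = p.adjust(pval, method = "BH"))
}

# Toy input: two intervals, 2 between-condition df, 3 within-condition df
# and 9 prior df (hence 12 posterior df).
toy <- moderated_f(between.ms = c(2.0, 0.3), within.ms = c(0.4, 0.5),
                   prior.var = c(0.5, 0.5), df = c(2, 3, 9, 12))
toy
```

For a real result res, a call such as moderated_f(res$between.ms, res$within.ms, res$prior.var, attr(res, "df")) would be expected to reproduce the corresponding columns if these assumptions hold; a discrepancy would point to details of aovBioCond that this sketch does not capture.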
References
Smyth, G.K., Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 2004. 3: p. Article3.
Tu, S., et al., MAnorm2 for quantitatively comparing groups of ChIP-seq samples. Genome Res, 2021. 31(1): p. 131-145.
See Also
bioCond for creating a bioCond object;
fitMeanVarCurve for fitting a mean-variance curve for a set of bioCond objects;
setMeanVarCurve for setting the mean-variance curve of a set of bioConds;
estimatePriorDf for estimating the number of prior degrees of freedom as well as adjusting variance ratio factors accordingly;
plot.aovBioCond for creating a plot to demonstrate an aovBioCond object;
diffTest for calling differential intervals between two bioCond objects;
varTestBioCond for calling hypervariable and invariant intervals across ChIP-seq samples contained in a bioCond.
Examples
library(MAnorm2)
data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")
## Call differential genomic intervals among GM12890, GM12891 and GM12892
## cell lines.
# Perform MA normalization and construct bioConds to represent the cell
# lines.
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)
# Variations in ChIP-seq signals across biological replicates of a cell line
# are generally of a low level, and their relationship with the mean signal
# intensities is expected to be well modeled by the presumed parametric
# form.
conds <- fitMeanVarCurve(conds, method = "parametric", occupy.only = TRUE)
summary(conds[[1]])
plotMeanVarCurve(conds, subset = "occupied")
# Perform a moderated ANOVA on these cell lines.
res <- aovBioCond(conds)
head(res)
plot(res, padj = 1e-6)