diffTest {MAnorm2} | R Documentation |
Generic Differential Test
Description
diffTest
is a generic function used to perform a differential test
(or multiple differential tests) between two R objects, usually of the same
type. Described in this page is the method designed for comparing two
bioCond
objects. This method is typically used to call genomic
intervals with differentially represented ChIP-seq signals between two
biological conditions.
Usage
diffTest(x, y, ...)
## S3 method for class 'bioCond'
diffTest(x, y, min.var = 0, df.prior = NULL, ...)
Arguments
x , y |
|
... |
Arguments passed to specific methods or from other methods. |
min.var |
Lower bound of variances read from the mean-variance
curve. Any variance read from the curve less than |
df.prior |
Number of prior degrees of freedom associated with the
mean-variance curve.
Must be non-negative. Can be set to |
Details
This method for calling differential genomic intervals between two
bioCond
objects adopts the modeling strategy implemented in
limma
(see "References"), except that each interval has its own prior
variance, which is read from the mean-variance curve associated with the
bioCond
s. Technically, the final estimate of variance for an
individual interval is a weighted average between its prior and observed
variances, with the weights being proportional to their respective numbers
of degrees of freedom.
Two extreme values can be specified for the argument df.prior
(number of degrees of freedom associated with the prior variances),
representing two distinct cases: when it is set to 0
, the final
variance estimate for an individual interval is
simply deduced from the signal intensities observed in it, and the
statistical test reduces to the ordinary two-sample t-test; when it is set
to Inf
, the final variance estimate is simply read from the
mean-variance curve. Other values of df.prior
represent intermediate
cases. To be noted, the number of prior degrees of freedom is automatically
estimated for each mean-variance curve by a specifically designed
statistical method (see also fitMeanVarCurve
and
setMeanVarCurve
) and, by
default, diffTest
uses the estimation result to perform the
differential tests. It's highly not recommended to specify df.prior
explicitly when calling diffTest
, unless you know what you are really
doing. Besides, diffTest
won't adjust variance ratio factors of
the two bioCond
s being compared based on the specified number of
prior degrees of freedom (see estimatePriorDf
for a
description of variance ratio factor).
Note also that, if df.prior
is set to 0
, of the
two bioCond
objects being compared there must be at least one that
contains two or more samples. Otherwise, there is no way to measure the
variance associated with each interval,
and an error is raised.
Considering the practical significance of differential ChIP-seq signals, those genomic intervals not occupied by either of the conditions may be filtered out before selecting differential ones. Thus, the statistical power for detecting differential intervals could potentially be increased by re-adjusting p-values of the remaining intervals (see "Examples" below).
Value
This method returns an object of class
c("diffBioCond", "data.frame")
, recording the test results for
each genomic interval by each row. The data frame consists of the
following variables:
x.mean, y.mean
Mean signal intensities of the two conditions, respectively.
"x"
and"y"
in the variable names are replaced by the corresponding actual condition names.Mval
Difference in mean signal intensity between the two conditions. An
Mval
of1
indicates a twofold change in normalized read count.Mval.se
Standard error associated with the
Mval
.Mval.t
The ratio of
Mval
toMval.se
.pval
Two sided p-value for the statistical significance of this signal difference.
padj
P-value adjusted for multiple testing with the
"BH"
method (seep.adjust
), which controls false discovery rate.
Row names of the returned data frame inherit from those of
x$norm.signal
. Besides, an attribute named "Mval.se.df"
is
added to the returned object, which is a positive numeric giving the
total number of degrees of freedom associated with the standard errors.
References
Smyth, G.K., Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 2004. 3: p. Article3.
Tu, S., et al., MAnorm2 for quantitatively comparing groups of ChIP-seq samples. Genome Res, 2021. 31(1): p. 131-145.
See Also
bioCond
for creating a bioCond
object;
fitMeanVarCurve
for fitting a mean-variance curve for
a set of bioCond
objects; setMeanVarCurve
for
setting the mean-variance curve of a set of bioCond
s;
estimatePriorDf
for estimating number of prior degrees of
freedom as well as adjusting variance ratio factors accordingly.
MAplot.diffBioCond
for creating an MA plot on results of
comparing two bioCond
objects; aovBioCond
for
comparing multiple bioCond
objects; varTestBioCond
for calling hypervariable and invariant intervals across ChIP-seq
samples contained in a bioCond
.
Examples
data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")
## Make a comparison between GM12891 and GM12892 cell lines.
# Perform MA normalization and construct bioConds to represent the two cell
# lines.
norm <- normalize(H3K27Ac, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)
# Variations in ChIP-seq signals across biological replicates of a cell line
# are generally of a low level, and their relationship with the mean signal
# intensities is expected to be well modeled by the presumed parametric
# form.
conds <- fitMeanVarCurve(conds, method = "parametric", occupy.only = TRUE)
summary(conds[[1]])
plotMeanVarCurve(conds, subset = "occupied")
# Perform differential tests between the two cell lines.
res1 <- diffTest(conds[[1]], conds[[2]])
head(res1)
MAplot(res1, padj = 0.001)
abline(h = 0, lwd = 2, lty = 5, col = "green3")
## Make a comparison between GM12891 and GM12892 cell lines using only their
## first replicates.
# Perform MA normalization and construct bioConds to represent the two cell
# lines.
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
norm <- normalize(H3K27Ac, c(5, 7), c(10, 12),
common.peak.regions = autosome)
conds <- list(GM12891 = bioCond(norm[5], norm[10], name = "GM12891"),
GM12892 = bioCond(norm[7], norm[12], name = "GM12892"))
# Construct a "blind" bioCond that treats the two samples as replicates and
# fit a mean-variance curve accordingly. Only common peak regions of the two
# samples are considered to be occupied by the "blind" bioCond, and only
# these regions are used for fitting the mean-variance curve. This setting
# is for capturing underlying non-differential intervals as accurately as
# possible and avoiding over-estimation of prior variances (i.e., variances
# read from a mean-variance curve).
conds$blind <- bioCond(norm[c(5, 7)], norm[c(10, 12)], occupy.num = 2,
name = "blind")
conds <- fitMeanVarCurve(conds, method = "parametric",
occupy.only = TRUE, init.coef = c(0.1, 10))
summary(conds[[1]])
summary(conds[[2]])
summary(conds[[3]])
plotMeanVarCurve(conds, subset = "occupied")
# Perform differential tests between the two cell lines.
res2 <- diffTest(conds[[1]], conds[[2]])
head(res2)
MAplot(res2, pval = 0.01)
abline(h = 0, lwd = 2, lty = 5, col = "green3")
# Inspect only the test results of the genomic intervals that are occupied
# by at least one of the two bioConds having been compared. Note the
# globally increased statistical power.
res3 <- res2[conds[[1]]$occupancy | conds[[2]]$occupancy, ]
res3$padj <- p.adjust(res3$pval, method = "BH")
boxplot(list(all = res2$padj, occupied = res3$padj), ylab = "Adj. p-value")
## Examine the consistency of results between the two differential analyses.
# Theoretically, t-statistics resulting from the two differential analyses
# are not directly comparable to each other, since they have different
# numbers of degrees of freedom. Here we map these t-statistics to the
# standard normal distribution in such a manner that the resulting
# z-statistics correspond to the same p-values as do the original
# t-statistics.
z1 <- qnorm(res1$pval / 2)
z1[res1$Mval > 0] <- -z1[res1$Mval > 0]
z2 <- qnorm(res2$pval / 2)
z2[res2$Mval > 0] <- -z2[res2$Mval > 0]
# Check the correlation between z-statistics from the two differential
# analyses.
cor(z1, z2)
cor(z1, z2, method = "sp")
## Make a comparison between the male and female genders by treating each
## individual (i.e., cell line) as a replicate.
# First perform the MA normalization and construct bioConds to represent
# individuals.
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)
# Group individuals into bioConds based on their genders.
female <- cmbBioCond(conds[c(1, 3)], name = "female")
male <- cmbBioCond(conds[2], name = "male")
# The dependence of variance of ChIP-seq signal intensity across individuals
# on the mean signal intensity is not as regular as in the case for modeling
# biological replicates of cell lines. Better use the local regression to
# adaptively capture the mean-variance trend.
genders <- list(female = female, male = male)
genders <- fitMeanVarCurve(genders, method = "local", occupy.only = FALSE)
genders <- estimatePriorDf(genders, occupy.only = TRUE)
summary(genders$female)
summary(genders$male)
plotMeanVarCurve(genders, subset = "all")
# Perform differential tests between the two genders.
res <- diffTest(genders[[1]], genders[[2]])
head(res)
MAplot(res, pval = 0.01)
abline(h = 0, lwd = 2, lty = 5, col = "green3")
# Examine the distribution of p-values in Y chromosome.
hist(res$pval[H3K27Ac$chrom == "chrY"], col = "red",
main = "P-values in Y chromosome")