varTestBioCond {MAnorm2} | R Documentation |
Call Hypervariable and Invariant Intervals for a bioCond
Description
Given a bioCond
object with which a mean-variance curve is
associated (see fitMeanVarCurve
), varTestBioCond
tests
for each genomic interval if the observed variation of its signal intensity
across ChIP-seq samples in the bioCond
is significantly greater or
less than is implied by the curve. This function is typically used
in combination with estParamHyperChIP
to call hypervariable
and invariant intervals in a bioCond
(see also "Examples").
Usage
varTestBioCond(cond, min.var = 0, df.prior = NULL)
Arguments
cond |
A |
min.var |
Lower bound of variances read from the mean-variance
curve. Any variance read from the curve less than |
df.prior |
Number of prior degrees of freedom associated with the
mean-variance curve. Must be positive.
Can be set to |
Details
varTestBioCond
adopts the modeling strategy implemented in
limma
(see "References"),
except that each genomic interval has its own
prior variance, which is read from the mean-variance curve associated with
the bioCond
object. The argument df.prior
could be
used to specify the common number of degrees of freedom of all the prior
variances, which also effectively assesses the overall goodness of fit of
the mean-variance curve. Technically, varTestBioCond
uses
the ratio of the observed variance of each interval to its prior variance as
key statistic, which under the null hypothesis follows an F
distribution, with
its two numbers of degrees of freedom being those of the two variances,
respectively.
(Hence the statistic follows a scaled chi-squared distribution when the
prior df is Inf
.) To be noted, the prior df can be empirically
estimated for each
mean-variance curve by specifically designed statistical methods
(see also fitMeanVarCurve
, setMeanVarCurve
,
estimatePriorDf
, and estParamHyperChIP
)
and, by default, the function uses the
estimation result to perform the tests. It's highly not recommended to
specify df.prior
explicitly when calling varTestBioCond
,
unless you know what you are really doing. Besides, varTestBioCond
won't adjust the variance ratio factor of the provided bioCond
based
on the specified prior df (see estimatePriorDf
for a description of variance ratio factor).
Any bioCond
object passed to varTestBioCond
must contain at
least two ChIP-seq samples; the observed variances of individual
genomic intervals cannot be calculated otherwise.
Besides, a mean-variance curve must be associated with the bioCond
for deducing the prior variances before
varTestBioCond
could work. Notably, when fitting a mean-variance
curve for a bioCond
object to be passed to varTestBioCond
,
it's recommended to pass it alone to fitMeanVarCurve
(not
involving other bioCond
objects). Also, if you have set
occupy.only
to TRUE
when calling
fitMeanVarCurve
, you should accordingly inspect only the test
results of those genomic intervals that are occupied by the bioCond
,
and should re-adjust
p-values within this set of intervals (see "Examples" below).
varTestBioCond
can also be used to call hypervariable and invariant
intervals across biological conditions, by first combining multiple
bioCond
objects into a single one (see "Examples" below). Note that
ChIP-seq samples in those bioCond
s to be combined must be first
normalized to the same level (see cmbBioCond
for details).
Value
This function returns an object of class
c("varTestBioCond", "data.frame")
, recording the test results for
each genomic interval by each row. The data frame consists of the
following variables:
observed.mean
Sample mean of the observed signal intensities.
observed.var
Sample variance of the observed signal intensities.
prior.var
Prior variance corresponding to the observed mean signal intensity.
fold.change
Ratio of
observed.var
toprior.var
.pval
Two sided p-value for the statistical significance of this fold change.
padj
P-value adjusted for multiple testing with the
"BH"
method (seep.adjust
), which controls false discovery rate.
Row names of the returned data frame inherit from those of
cond$norm.signal
. Besides, several attributes are added to the
returned object:
bioCond.name
Name of the
bioCond
object.mean.var.curve
A function representing the mean-variance curve. It accepts a vector of mean signal intensities and returns the corresponding prior variances. Note that this function has incorporated variance ratio factor of the
bioCond
and themin.var
argument.df
A length-2 vector giving the numbers of degrees of freedom of the observed and prior variances.
References
Smyth, G.K., Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol, 2004. 3: p. Article3.
See Also
bioCond
for creating a bioCond
object;
fitMeanVarCurve
for fitting a mean-variance curve for
a set of bioCond
objects; setMeanVarCurve
for
setting the mean-variance curve of a set of bioCond
s;
estimatePriorDf
for estimating number of prior degrees of
freedom as well as adjusting variance ratio factors accordingly;
estParamHyperChIP
for applying the parameter estimation
framework of HyperChIP;
cmbBioCond
for combining multiple bioCond
s
into a single one.
plot.varTestBioCond
for creating a plot to demonstrate a
varTestBioCond
object; diffTest
for calling differential intervals between two bioCond
objects;
aovBioCond
for calling differential intervals across
multiple bioCond
s.
Examples
library(scales)
data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")
## Call hypervariable and invariant genomic intervals across biological
## replicates of the GM12891 cell line.
# Perform MA normalization and construct a bioCond to represent GM12891.
norm <- normalize(H3K27Ac, 5:6, 10:11)
GM12891 <- bioCond(norm[5:6], norm[10:11], name = "GM12891")
# Fit a mean-variance curve for GM12891 using the parametric method.
GM12891 <- fitMeanVarCurve(list(GM12891), method = "parametric",
occupy.only = TRUE)[[1]]
summary(GM12891)
plotMeanVarCurve(list(GM12891), subset = "occupied")
# Assess the observed variances of ChIP-seq signal intensities in GM12891.
res <- varTestBioCond(GM12891)
head(res)
# Inspect only the test results of occupied genomic intervals.
res <- res[GM12891$occupancy, ]
res$padj <- p.adjust(res$pval, method = "BH")
plot(res, padj = 0.2, col = alpha(c("black", "red"), c(0.04, 0.5)))
## Call hypervariable and invariant genomic intervals across cell lines.
# Perform MA normalization and construct bioConds to represent cell lines.
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
# Normalize the cell lines.
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)
# Combine the cell lines into a single bioCond and use local regression to
# adaptively capture the mean-variance trend. Only genomic intervals that
# are occupied by each of the cell lines are considered to be occupied by
# the combined bioCond, which is for avoiding over-estimation of the prior
# variances.
LCLs <- cmbBioCond(conds, occupy.num = length(conds),
name = "lymphoblastoid_cell_lines")
LCLs <- fitMeanVarCurve(list(LCLs), method = "local",
occupy.only = FALSE)[[1]]
LCLs <- estimatePriorDf(list(LCLs), occupy.only = TRUE)[[1]]
summary(LCLs)
plotMeanVarCurve(list(LCLs), subset = "all")
# Assess the observed variances of ChIP-seq signal intensities across these
# cell lines.
res <- varTestBioCond(LCLs)
head(res)
plot(res, pval = 0.01, col = alpha(c("black", "red"), c(0.04, 0.5)))
# Non-occupied intervals are occupied by some of the cell lines but not all
# of them. These intervals tend to be more variable across the cell lines
# and more significant in the tests than occupied intervals.
f <- !(LCLs$occupancy)
wilcox.test(res$fold.change[f], res$fold.change[!f],
alternative = "greater")
wilcox.test(res$pval[f], res$pval[!f], alternative = "less")
# Intervals in Y chromosome tend to be more variable across the cell lines
# and more significant in the tests than the other intervals, since the cell
# lines are of different genders.
f <- H3K27Ac$chrom == "chrY"
wilcox.test(res$fold.change[f], res$fold.change[!f],
alternative = "greater")
wilcox.test(res$pval[f], res$pval[!f], alternative = "less")
# Make a comparison with HyperChIP.
LCLs2 <- estParamHyperChIP(LCLs, occupy.only = FALSE, prob = 0.1)
summary(LCLs)
summary(LCLs2)
res2 <- varTestBioCond(LCLs2)
hist(res$pval, breaks = 100, col = "red")
hist(res2$pval, breaks = 100, col = "red")