| vstBioCond {MAnorm2} | R Documentation |
Apply a Variance-Stabilizing Transformation to a bioCond
Description
Given a bioCond object with which a mean-variance curve is
associated, vstBioCond deduces a variance-stabilizing transformation
(VST) based on the curve, and applies it to the signal intensities of
samples contained in the bioCond, so that variances of individual
genomic intervals are comparable between each other.
Usage
vstBioCond(x, min.var = 0, integrate.func = integrate, ...)
Arguments
x |
A |
min.var |
Lower bound of variances read from the mean-variance
curve. Any variance read from the curve less than |
integrate.func |
A function for quadrature of functions of one
variable. Any function passed to this argument must mimic the behavior
of |
... |
Additional arguments to |
Details
vstBioCond deduces the VST by applying the standard delta method to
the mean-variance curve associated with the bioCond object. To
be noted, applying the VST to the bioCond retains its structure
matrices. More specifically, the transformed signal intensities of each
genomic interval will have a covariance matrix
approximately proportional to its
structure matrix in the bioCond. See setWeight for a
detailed description of structure matrix.
Technically, applying the VST requires the quadrature of a one-variable
function, which in vstBioCond is achieved numerically. One can
specify the numerical integration routine used by vstBioCond via the
argument integrate.func, as long as the provided function mimics the
behavior of integrate. Specifically, supposing the
first three arguments to the function are f, a and b,
then ret$value should be the integral of f from a to
b, where ret is the object returned from the function. See
integrate for details.
One of the applications of applying a VST to a bioCond is for
clustering the samples contained in it. Since variances of transformed
signals are comparable across genomic intervals,
performing a clustering analysis
on the transformed data is expected to give more reliable results than those
from the original signals. Notably, to apply a clustering analysis to the
VSTed signals, one typically passes the returned object from
vstBioCond to distBioCond setting the method
argument to "none", by which you can get a dist
object recording the distance between each pair of samples of the
bioCond. This procedure is specifically designed to handle VSTed
bioConds and has considered the possibility that different genomic
intervals may be associated with different structure matrices (see
distBioCond for details). The resulting
dist object can then be passed to
hclust to perform a hierarchical clustering (see
also "Examples").
From this perspective, vstBioCond could also be used to cluster a set
of bioCond objects, by first combining them into a single
bioCond and fitting a mean-variance curve for it (see "Examples"
below and also cmbBioCond).
Value
vstBioCond returns a bioCond object with an
extra attribute named "vst.func", which represents the VST
applied to x. Signal intensities contained in the returned
bioCond are obtained by applying the VST to the signal
intensities in x.
The returned bioCond has the same biological condition name and
occupancy states of genomic intervals as x. Besides, the
structure matrix of each interval
in the returned bioCond inherits
from x as well, since performing the designed VST approximately
retains the original structure matrices (see "Details").
The vst.func attribute is a function that accepts a vector of
signal intensities and returns the VSTed signals. To be noted,
vst.func has been scaled so that the resulting transformed
signals in the returned bioCond have a similar numerical range
and variation level to the signal intensities in x.
More specifically, the sample.mean and sample.var fields
of the returned bioCond have the same arithmetic mean and
geometric mean as x$sample.mean and x$sample.var,
respectively. See bioCond for a detailed description
of these fields.
Note also that, in principle, applying the vst.func to any
bioCond object that is associated with the same mean-variance
curve as is x (i.e., has the same mvcID as that of
x; see also fitMeanVarCurve) effectively stabilizes
the variances of its signal intensities across genomic intervals.
For future reference, the vst.func itself has an
attribute named "mvcID" recording the mvcID of x.
See Also
bioCond for creating a bioCond object;
fitMeanVarCurve for fitting a mean-variance curve;
integrate for a numerical integration routine;
setWeight for a detailed description of structure matrix;
cmbBioCond for combining a set of bioCond objects
into a single one; distBioCond for robustly measuring the
distances between samples in a bioCond;
hclust for performing a hierarchical clustering on
a dist object.
Examples
data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")
## Cluster a set of ChIP-seq samples from different cell lines (i.e.,
## individuals).
# Perform MA normalization and construct a bioCond.
norm <- normalize(H3K27Ac, 4:8, 9:13)
cond <- bioCond(norm[4:8], norm[9:13], name = "all")
# Fit a mean-variance curve.
cond <- fitMeanVarCurve(list(cond), method = "local",
occupy.only = FALSE)[[1]]
plotMeanVarCurve(list(cond), subset = "all")
# Apply a variance-stabilizing transformation and associate a constant
# function with the resulting bioCond as its mean-variance curve.
vst_cond <- vstBioCond(cond)
vst_cond <- setMeanVarCurve(list(vst_cond), function(x)
rep_len(1, length(x)), occupy.only = FALSE,
method = "constant prior")[[1]]
plotMeanVarCurve(list(vst_cond), subset = "all")
# Measure the distance between each pair of samples and accordingly perform
# a hierarchical clustering. Note that biological replicates of each cell
# line are clustered together.
d1 <- distBioCond(vst_cond, method = "none")
plot(hclust(d1, method = "average"), hang = -1)
# Measure the distances using only hypervariable genomic intervals. Note the
# change of scale of the distances.
res <- varTestBioCond(vst_cond)
f <- res$fold.change > 1 & res$pval < 0.05
d2 <- distBioCond(vst_cond, subset = f, method = "none")
plot(hclust(d2, method = "average"), hang = -1)
## Cluster a set of individuals.
# Perform MA normalization and construct bioConds to represent individuals.
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
conds <- normBioCond(conds)
# Group the individuals into a single bioCond and fit a mean-variance curve
# for it.
cond <- cmbBioCond(conds, name = "all")
cond <- fitMeanVarCurve(list(cond), method = "local",
occupy.only = FALSE)[[1]]
plotMeanVarCurve(list(cond), subset = "all")
# Apply a variance-stabilizing transformation and associate a constant
# function with the resulting bioCond as its mean-variance curve.
vst_cond <- vstBioCond(cond)
vst_cond <- setMeanVarCurve(list(vst_cond), function(x)
rep_len(1, length(x)), occupy.only = FALSE,
method = "constant prior")[[1]]
plotMeanVarCurve(list(vst_cond), subset = "all")
# Measure the distance between each pair of individuals and accordingly
# perform a hierarchical clustering. Note that GM12891 and GM12892 are
# actually a couple and they are clustered together.
d1 <- distBioCond(vst_cond, method = "none")
plot(hclust(d1, method = "average"), hang = -1)
# Measure the distances using only hypervariable genomic intervals. Note the
# change of scale of the distances.
res <- varTestBioCond(vst_cond)
f <- res$fold.change > 1 & res$pval < 0.05
d2 <- distBioCond(vst_cond, subset = f, method = "none")
plot(hclust(d2, method = "average"), hang = -1)
## Perform differential analysis on bioConds that have gone through a
## variance-stabilizing transformation.
# Perform MA normalization and construct bioConds to represent cell lines
# (i.e., individuals).
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)
# Fit a mean-variance curve.
conds <- fitMeanVarCurve(conds, method = "parametric", occupy.only = TRUE)
plotMeanVarCurve(conds, subset = "occupied")
# Apply a variance-stabilizing transformation.
vst_conds <- list(GM12890 = vstBioCond(conds$GM12890))
vst.func <- attr(vst_conds$GM12890, "vst.func")
temp <- matrix(vst.func(as.numeric(conds$GM12891$norm.signal)),
nrow = nrow(norm))
vst_conds$GM12891 <- bioCond(temp, norm[10:11], name = "GM12891")
temp <- matrix(vst.func(as.numeric(conds$GM12892$norm.signal)),
nrow = nrow(norm))
vst_conds$GM12892 <- bioCond(temp, norm[12:13], name = "GM12892")
# Associate a constant function with the resulting bioConds as their
# mean-variance curve.
vst_conds <- setMeanVarCurve(vst_conds, function(x) rep_len(1, length(x)),
occupy.only = TRUE, method = "constant prior")
plotMeanVarCurve(vst_conds, subset = "occupied")
# Make a comparison between GM12891 and GM12892.
res1 <- diffTest(conds$GM12891, conds$GM12892)
res2 <- diffTest(vst_conds$GM12891, vst_conds$GM12892)
# Examine the consistency of analysis results between using ordinary and
# VSTed signal intensities. Here we map p-values together with observed
# directions of signal changes to the standard normal distribution.
z1 <- qnorm(res1$pval / 2)
z1[res1$Mval > 0] <- -z1[res1$Mval > 0]
z2 <- qnorm(res2$pval / 2)
z2[res2$Mval > 0] <- -z2[res2$Mval > 0]
plot(z1, z2, xlab = "Ordinary", ylab = "VSTed")
abline(a = 0, b = 1, lwd = 2, lty = 5, col = "red")
cor(z1, z2)
cor(z1, z2, method = "sp")
# Simultaneously compare GM12890, GM12891 and GM12892 cell lines.
res1 <- aovBioCond(conds)
res2 <- aovBioCond(vst_conds)
# Examine the consistency of analysis results between using ordinary and
# VSTed signal intensities by mapping p-values to the standard normal
# distribution.
z1 <- qnorm(res1$pval, lower.tail = FALSE)
z1[z1 == Inf] <- 39
z2 <- qnorm(res2$pval, lower.tail = FALSE)
z2[z2 == Inf] <- 39
plot(z1, z2, xlab = "Ordinary", ylab = "VSTed")
abline(a = 0, b = 1, lwd = 2, lty = 5, col = "red")
cor(z1, z2)
cor(z1, z2, method = "sp")