vstBioCond {MAnorm2} | R Documentation |
Apply a Variance-Stabilizing Transformation to a bioCond
Description
Given a bioCond
object with which a mean-variance curve is
associated, vstBioCond
deduces a variance-stabilizing transformation
(VST) based on the curve, and applies it to the signal intensities of
samples contained in the bioCond
, so that variances of individual
genomic intervals are comparable between each other.
Usage
vstBioCond(x, min.var = 0, integrate.func = integrate, ...)
Arguments
x |
A |
min.var |
Lower bound of variances read from the mean-variance
curve. Any variance read from the curve less than |
integrate.func |
A function for quadrature of functions of one
variable. Any function passed to this argument must mimic the behavior
of |
... |
Additional arguments to |
Details
vstBioCond
deduces the VST by applying the standard delta method to
the mean-variance curve associated with the bioCond
object. To
be noted, applying the VST to the bioCond
retains its structure
matrices. More specifically, the transformed signal intensities of each
genomic interval will have a covariance matrix
approximately proportional to its
structure matrix in the bioCond
. See setWeight
for a
detailed description of structure matrix.
Technically, applying the VST requires the quadrature of a one-variable
function, which in vstBioCond
is achieved numerically. One can
specify the numerical integration routine used by vstBioCond
via the
argument integrate.func
, as long as the provided function mimics the
behavior of integrate
. Specifically, supposing the
first three arguments to the function are f
, a
and b
,
then ret$value
should be the integral of f
from a
to
b
, where ret
is the object returned from the function. See
integrate
for details.
One of the applications of applying a VST to a bioCond
is for
clustering the samples contained in it. Since variances of transformed
signals are comparable across genomic intervals,
performing a clustering analysis
on the transformed data is expected to give more reliable results than those
from the original signals. Notably, to apply a clustering analysis to the
VSTed signals, one typically passes the returned object from
vstBioCond
to distBioCond
setting the method
argument to "none"
, by which you can get a dist
object recording the distance between each pair of samples of the
bioCond
. This procedure is specifically designed to handle VSTed
bioCond
s and has considered the possibility that different genomic
intervals may be associated with different structure matrices (see
distBioCond
for details). The resulting
dist
object can then be passed to
hclust
to perform a hierarchical clustering (see
also "Examples").
From this perspective, vstBioCond
could also be used to cluster a set
of bioCond
objects, by first combining them into a single
bioCond
and fitting a mean-variance curve for it (see "Examples"
below and also cmbBioCond
).
Value
vstBioCond
returns a bioCond
object with an
extra attribute named "vst.func"
, which represents the VST
applied to x
. Signal intensities contained in the returned
bioCond
are obtained by applying the VST to the signal
intensities in x
.
The returned bioCond
has the same biological condition name and
occupancy states of genomic intervals as x
. Besides, the
structure matrix of each interval
in the returned bioCond
inherits
from x
as well, since performing the designed VST approximately
retains the original structure matrices (see "Details").
The vst.func
attribute is a function that accepts a vector of
signal intensities and returns the VSTed signals. To be noted,
vst.func
has been scaled so that the resulting transformed
signals in the returned bioCond
have a similar numerical range
and variation level to the signal intensities in x
.
More specifically, the sample.mean
and sample.var
fields
of the returned bioCond
have the same arithmetic mean and
geometric mean as x$sample.mean
and x$sample.var
,
respectively. See bioCond
for a detailed description
of these fields.
Note also that, in principle, applying the vst.func
to any
bioCond
object that is associated with the same mean-variance
curve as is x
(i.e., has the same mvcID
as that of
x
; see also fitMeanVarCurve
) effectively stabilizes
the variances of its signal intensities across genomic intervals.
For future reference, the vst.func
itself has an
attribute named "mvcID"
recording the mvcID
of x
.
See Also
bioCond
for creating a bioCond
object;
fitMeanVarCurve
for fitting a mean-variance curve;
integrate
for a numerical integration routine;
setWeight
for a detailed description of structure matrix;
cmbBioCond
for combining a set of bioCond
objects
into a single one; distBioCond
for robustly measuring the
distances between samples in a bioCond
;
hclust
for performing a hierarchical clustering on
a dist
object.
Examples
data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")
## Cluster a set of ChIP-seq samples from different cell lines (i.e.,
## individuals).
# Perform MA normalization and construct a bioCond.
norm <- normalize(H3K27Ac, 4:8, 9:13)
cond <- bioCond(norm[4:8], norm[9:13], name = "all")
# Fit a mean-variance curve.
cond <- fitMeanVarCurve(list(cond), method = "local",
occupy.only = FALSE)[[1]]
plotMeanVarCurve(list(cond), subset = "all")
# Apply a variance-stabilizing transformation and associate a constant
# function with the resulting bioCond as its mean-variance curve.
vst_cond <- vstBioCond(cond)
vst_cond <- setMeanVarCurve(list(vst_cond), function(x)
rep_len(1, length(x)), occupy.only = FALSE,
method = "constant prior")[[1]]
plotMeanVarCurve(list(vst_cond), subset = "all")
# Measure the distance between each pair of samples and accordingly perform
# a hierarchical clustering. Note that biological replicates of each cell
# line are clustered together.
d1 <- distBioCond(vst_cond, method = "none")
plot(hclust(d1, method = "average"), hang = -1)
# Measure the distances using only hypervariable genomic intervals. Note the
# change of scale of the distances.
res <- varTestBioCond(vst_cond)
f <- res$fold.change > 1 & res$pval < 0.05
d2 <- distBioCond(vst_cond, subset = f, method = "none")
plot(hclust(d2, method = "average"), hang = -1)
## Cluster a set of individuals.
# Perform MA normalization and construct bioConds to represent individuals.
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
conds <- normBioCond(conds)
# Group the individuals into a single bioCond and fit a mean-variance curve
# for it.
cond <- cmbBioCond(conds, name = "all")
cond <- fitMeanVarCurve(list(cond), method = "local",
occupy.only = FALSE)[[1]]
plotMeanVarCurve(list(cond), subset = "all")
# Apply a variance-stabilizing transformation and associate a constant
# function with the resulting bioCond as its mean-variance curve.
vst_cond <- vstBioCond(cond)
vst_cond <- setMeanVarCurve(list(vst_cond), function(x)
rep_len(1, length(x)), occupy.only = FALSE,
method = "constant prior")[[1]]
plotMeanVarCurve(list(vst_cond), subset = "all")
# Measure the distance between each pair of individuals and accordingly
# perform a hierarchical clustering. Note that GM12891 and GM12892 are
# actually a couple and they are clustered together.
d1 <- distBioCond(vst_cond, method = "none")
plot(hclust(d1, method = "average"), hang = -1)
# Measure the distances using only hypervariable genomic intervals. Note the
# change of scale of the distances.
res <- varTestBioCond(vst_cond)
f <- res$fold.change > 1 & res$pval < 0.05
d2 <- distBioCond(vst_cond, subset = f, method = "none")
plot(hclust(d2, method = "average"), hang = -1)
## Perform differential analysis on bioConds that have gone through a
## variance-stabilizing transformation.
# Perform MA normalization and construct bioConds to represent cell lines
# (i.e., individuals).
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)
# Fit a mean-variance curve.
conds <- fitMeanVarCurve(conds, method = "parametric", occupy.only = TRUE)
plotMeanVarCurve(conds, subset = "occupied")
# Apply a variance-stabilizing transformation.
vst_conds <- list(GM12890 = vstBioCond(conds$GM12890))
vst.func <- attr(vst_conds$GM12890, "vst.func")
temp <- matrix(vst.func(as.numeric(conds$GM12891$norm.signal)),
nrow = nrow(norm))
vst_conds$GM12891 <- bioCond(temp, norm[10:11], name = "GM12891")
temp <- matrix(vst.func(as.numeric(conds$GM12892$norm.signal)),
nrow = nrow(norm))
vst_conds$GM12892 <- bioCond(temp, norm[12:13], name = "GM12892")
# Associate a constant function with the resulting bioConds as their
# mean-variance curve.
vst_conds <- setMeanVarCurve(vst_conds, function(x) rep_len(1, length(x)),
occupy.only = TRUE, method = "constant prior")
plotMeanVarCurve(vst_conds, subset = "occupied")
# Make a comparison between GM12891 and GM12892.
res1 <- diffTest(conds$GM12891, conds$GM12892)
res2 <- diffTest(vst_conds$GM12891, vst_conds$GM12892)
# Examine the consistency of analysis results between using ordinary and
# VSTed signal intensities. Here we map p-values together with observed
# directions of signal changes to the standard normal distribution.
z1 <- qnorm(res1$pval / 2)
z1[res1$Mval > 0] <- -z1[res1$Mval > 0]
z2 <- qnorm(res2$pval / 2)
z2[res2$Mval > 0] <- -z2[res2$Mval > 0]
plot(z1, z2, xlab = "Ordinary", ylab = "VSTed")
abline(a = 0, b = 1, lwd = 2, lty = 5, col = "red")
cor(z1, z2)
cor(z1, z2, method = "sp")
# Simultaneously compare GM12890, GM12891 and GM12892 cell lines.
res1 <- aovBioCond(conds)
res2 <- aovBioCond(vst_conds)
# Examine the consistency of analysis results between using ordinary and
# VSTed signal intensities by mapping p-values to the standard normal
# distribution.
z1 <- qnorm(res1$pval, lower.tail = FALSE)
z1[z1 == Inf] <- 39
z2 <- qnorm(res2$pval, lower.tail = FALSE)
z2[z2 == Inf] <- 39
plot(z1, z2, xlab = "Ordinary", ylab = "VSTed")
abline(a = 0, b = 1, lwd = 2, lty = 5, col = "red")
cor(z1, z2)
cor(z1, z2, method = "sp")