normBioCond {MAnorm2}    R Documentation

Perform MA Normalization on a Set of bioCond Objects

Description

Given a list of bioCond objects, normBioCond performs an MA normalization on the signal intensities stored in them so that these objects are comparable to each other.

Usage

normBioCond(conds, baseline = NULL, subset = NULL, common.peak.regions = NULL)

Arguments

conds

A list of bioCond objects to be normalized.

baseline

A positive integer or character name indexing the baseline bioCond in conds. By default, the baseline is selected automatically by estimating the size factor of each bioCond (see normalize and estimateSizeFactors for details). Note that normBioCond treats the signal intensities contained in the supplied bioConds as being on the scale of log2 read counts, which is consistent with the default behavior of normalize. baseline can also be set to "pseudo-reference", as in normalize; we recommend this setting when the number of bioConds to be normalized is large (e.g., more than 5).

subset

An optional vector specifying the subset of intervals to be used for estimating size factors and selecting the baseline. Defaults to the intervals occupied by all the bioCond objects. Ignored if baseline is specified.

common.peak.regions

An optional logical vector specifying the intervals that could possibly be considered as common peak regions for each pair of bioCond objects. See also normalize.
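The default baseline selection described under baseline and subset can be pictured with a small base-R sketch. It does not call MAnorm2. It assumes a median-ratio style size factor computed on log2 signals, and that the condition whose log2 size factor is closest to 0 becomes the baseline; both points are assumptions made for illustration, and estimateSizeFactors documents the actual procedure. The names log2_mat, occupied_by_all, log2_sf and baseline_name are hypothetical, and the data are simulated.

```r
# Simulated log2 signal intensities for three conditions (columns) over 500
# intervals (rows). Each condition is the same signal plus a constant shift.
set.seed(42)
n <- 500
signal <- rnorm(n, mean = 8, sd = 2)
log2_mat <- cbind(A = signal - 1, B = signal + 0.1, C = signal + 1.2)

# Hypothetical stand-in for the `subset` argument: intervals occupied by all
# conditions (here simply every interval).
occupied_by_all <- rep(TRUE, n)
sub_mat <- log2_mat[occupied_by_all, , drop = FALSE]

# Assumed median-ratio strategy on the log2 scale: the reference is the
# row-wise mean of log2 signals (the log of the geometric mean), and the
# log2 size factor of a condition is its median deviation from the reference.
ref <- rowMeans(sub_mat)
log2_sf <- apply(sub_mat, 2, function(x) median(x - ref))

# Assumed selection rule: the condition with log2 size factor closest to 0.
baseline_name <- names(which.min(abs(log2_sf)))
print(round(log2_sf, 3))
print(baseline_name)
```

In this simulated setting the middle condition is picked, because its shift equals the average shift of the three conditions.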

Details

Technically, normBioCond treats each bioCond object as a ChIP-seq sample. It extracts the sample.mean and occupancy variables stored in each bioCond to represent its signal intensities and occupancy indicators, respectively. See bioCond for a description of the structure of a bioCond object.

Next, MA normalization on these bioCond objects is performed exactly as described in normalize. Specifically, a linear transformation is derived for each bioCond object and is then applied to each of the ChIP-seq samples contained in it.
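The following base-R sketch illustrates the idea of deriving one linear transformation per condition and applying it to every sample in that condition. It does not use MAnorm2 and is not the package's implementation. It assumes that the transformation is chosen so that, over common peak regions, M values have zero mean and zero covariance with A values, which is equivalent to matching the mean and standard deviation of the baseline. All object names and the simulated data are hypothetical.

```r
set.seed(1)
n <- 1000

# Mean log2 signal of the baseline condition (simulated).
base_mean <- rnorm(n, mean = 8, sd = 2)

# Two replicates of a second condition with a systematic linear distortion.
truth <- 0.8 * base_mean + 1
reps <- cbind(rep1 = truth + rnorm(n, sd = 0.3),
              rep2 = truth + rnorm(n, sd = 0.3))
cond_mean <- rowMeans(reps)  # plays the role of sample.mean

# Hypothetical common peak regions: intervals with high signal in both.
common <- base_mean > 6 & cond_mean > 6

# Linear transformation derived at the condition level (assumed criterion:
# match mean and standard deviation of the baseline over common regions).
slope <- sd(base_mean[common]) / sd(cond_mean[common])
intercept <- mean(base_mean[common]) - slope * mean(cond_mean[common])

# Apply the same transformation to every replicate of the condition.
reps_norm <- intercept + slope * reps
cond_mean_norm <- rowMeans(reps_norm)

# M and A values after normalization, restricted to common regions.
M <- (cond_mean_norm - base_mean)[common]
A <- ((cond_mean_norm + base_mean) / 2)[common]
print(c(mean_M = mean(M), cov_MA = cov(M, A)))
```

Because the transformation is linear, applying it to each replicate and then averaging gives the same result as applying it to the condition mean directly.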

normBioCond is an effort to reduce potential biases introduced by the MA normalization process. The idea comes from the principle that the more similar two samples are to each other, the less bias is expected to be introduced when normalizing them. With this function, instead of performing an overall normalization on all the ChIP-seq samples involved, you may choose to first perform a normalization within each biological condition, and then normalize between the resulting bioCond objects (see "Examples" below).

Value

A list of bioCond objects with normalized signal intensities, corresponding to the argument conds. Note that information about the mean-variance dependence stored in the original bioCond objects, if any, is removed from the returned bioConds. You can re-fit a mean-variance curve for them by, for example, calling fitMeanVarCurve. Note also that the original structure matrices are retained for each bioCond in the returned list (see setWeight for a detailed description of the structure matrix).

In addition, the following attributes, which describe the MA normalization performed, are added to the list:

size.factor

Size factors of provided bioCond objects. Only present when baseline is not explicitly specified by the user.

baseline

Condition name of the bioCond object used as baseline, or "pseudo-reference" if the baseline argument is set to that value.

norm.coef

A data frame recording the MA normalization coefficients for each bioCond.

MA.cor

A real matrix recording the Pearson correlation coefficient between M and A values calculated from common peak regions of each pair of bioCond objects. The upper and lower triangles of the matrix are deduced from raw and normalized signal intensities, respectively. Note that M values are always calculated as the column bioCond minus the row one.
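Since the MA.cor matrix mixes raw correlations (upper triangle) and normalized ones (lower triangle), it can help to put them side by side for each pair. The base-R sketch below does this for any square matrix laid out that way. The helper split_ma_cor is hypothetical and not part of MAnorm2, and the toy matrix holds made-up values, not real output; its diagonal is only a placeholder and is ignored.

```r
# Hypothetical helper: pair up upper-triangle (raw) and lower-triangle
# (normalized) entries of an MA.cor-style square matrix.
split_ma_cor <- function(ma_cor) {
  stopifnot(is.matrix(ma_cor), nrow(ma_cor) == ncol(ma_cor))
  # Row/column indices of the upper triangle, in column-major order.
  idx <- which(upper.tri(ma_cor), arr.ind = TRUE)
  data.frame(
    row = rownames(ma_cor)[idx[, "row"]],
    column = colnames(ma_cor)[idx[, "col"]],
    raw = ma_cor[idx],
    # Swapping the indices picks the mirrored lower-triangle entry.
    normalized = ma_cor[idx[, c("col", "row"), drop = FALSE]],
    stringsAsFactors = FALSE
  )
}

# Made-up values for three hypothetical conditions c1, c2, c3.
conds_names <- c("c1", "c2", "c3")
toy <- matrix(c(1.00,  0.40, 0.30,
                0.01,  1.00, 0.20,
                -0.02, 0.03, 1.00),
              nrow = 3, byrow = TRUE,
              dimnames = list(conds_names, conds_names))

res <- split_ma_cor(toy)
print(res)
```

With a real result, the same helper could be applied to attr(conds, "MA.cor"), assuming the matrix carries row and column names.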

References

Tu, S., et al., MAnorm2 for quantitatively comparing groups of ChIP-seq samples. Genome Res, 2021. 31(1): p. 131-145.

See Also

normalize for performing an MA normalization on ChIP-seq samples; bioCond for creating a bioCond object; normBioCondBySizeFactors for normalizing bioCond objects based on their size factors; cmbBioCond for combining a set of bioCond objects into a single one; MAplot.bioCond for creating an MA plot on two normalized bioCond objects; fitMeanVarCurve for modeling the mean-variance dependence across intervals in bioCond objects.

Examples

data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")

## Apply MA normalization first within each cell line, and then normalize
## across cell lines.

# Normalize samples separately for each cell line.
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)

# Construct separately a bioCond object for each cell line, and perform MA
# normalization on the resulting bioConds. Genomic intervals in sex
# chromosomes are not allowed to be common ones, since the cell lines are
# from individuals of different sexes.
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
              GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
              GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)

# Inspect the normalization effects.
attributes(conds)
plot(attr(conds, "MA.cor"), symbreaks = TRUE, margins = c(8, 8))
MAplot(conds[[1]], conds[[2]], main = "GM12890 vs. GM12891")
abline(h = 0, lwd = 2, lty = 5)


[Package MAnorm2 version 1.2.2 Index]