R: Perform MA Normalization on a Set of 'bioCond' Objects

normBioCond {MAnorm2}

R Documentation

Perform MA Normalization on a Set of `bioCond` Objects

Description

Given a list of bioCond objects, normBioCond performs an MA normalization on the signal intensities stored in them so that these objects are comparable to each other.

Usage

normBioCond(conds, baseline = NULL, subset = NULL, common.peak.regions = NULL)

Arguments

`conds`	A list of `bioCond` objects to be normalized.
`baseline`	A positive integer or character name indexing the baseline `bioCond` in `conds`. By default, the baseline is automatically selected by estimating the size factor of each `bioCond` (see `normalize` and `estimateSizeFactors` for details). Note that `normBioCond` treats the signal intensities contained in the supplied `bioCond`s as being on the scale of `log2` read counts, which is consistent with the default behavior of `normalize`. Note also that `baseline` can be set to `"pseudo-reference"`, as in `normalize`. We recommend this setting when the number of `bioCond`s to be normalized is large (e.g., >5).
`subset`	An optional vector specifying the subset of intervals to be used for estimating size factors and selecting the baseline. Defaults to the intervals occupied by all the `bioCond` objects. Ignored if `baseline` is specified.
`common.peak.regions`	An optional logical vector specifying the intervals that could possibly be considered as common peak regions for each pair of `bioCond` objects. See also `normalize`.
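
The default baseline selection controlled by `baseline` and `subset` can be pictured with a small base R sketch. It is illustrative only and does not call `MAnorm2`: it assumes a median-ratio style size factor computed on the `log2` scale and assumes that the condition whose size factor is closest to 1 becomes the baseline. The toy matrix `signal`, the vector `occupied_by_all` and all other names are hypothetical; see `normalize` and `estimateSizeFactors` for the actual procedure.

```r
# Hypothetical log2 signal intensities: rows are intervals, columns are
# the per-condition mean signals.
signal <- cbind(condA = c(4.0, 5.0, 6.0, 7.0),
                condB = c(5.0, 6.0, 7.0, 8.2),
                condC = c(3.5, 4.5, 5.4, 6.5))

# Hypothetical subset: intervals occupied by all conditions.
occupied_by_all <- c(TRUE, TRUE, TRUE, FALSE)
sub <- signal[occupied_by_all, , drop = FALSE]

# Assumed median-ratio strategy on the log2 scale: the per-interval
# reference is the mean log2 signal (a log geometric mean), and the log2
# size factor of a condition is the median deviation from that reference.
log_ref <- rowMeans(sub)
log_sf <- apply(sub - log_ref, 2, median)
size_factor <- 2^log_sf

# Assumed selection rule: the condition with size factor closest to 1.
baseline <- names(which.min(abs(log_sf)))
print(size_factor)
print(baseline)
```

When `baseline` is given explicitly, no such estimation is needed, which is why `subset` is ignored in that case.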

Details

Technically, normBioCond treats each bioCond object as a ChIP-seq sample. It extracts the sample.mean and occupancy variables stored in each bioCond to represent its signal intensities and occupancy indicators, respectively. See bioCond for a description of the structure of a bioCond object.

Next, MA normalization on these bioCond objects is performed exactly as described in normalize. Specifically, we get a linear transformation for each bioCond object, which is subsequently applied to each of the ChIP-seq samples contained in it.
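
The step of deriving one linear transformation per condition and applying it to every sample in that condition can be sketched in base R as follows. This is a sketch under stated assumptions, not the package's implementation: it assumes the transformation is chosen so that, over the common peak regions, M values have zero mean and zero covariance with A values, which is equivalent to matching the mean and standard deviation of the baseline. All data and variable names are hypothetical.

```r
# Hypothetical log2 signal intensities over 6 genomic intervals.
baseline_mean <- c(2.0, 3.5, 5.0, 6.5, 4.0, 1.0)   # mean signal of the baseline
samples <- cbind(rep1 = c(3.1, 4.9, 7.2, 9.0, 5.8, 2.0),
                 rep2 = c(2.9, 5.1, 6.8, 9.2, 6.2, 1.8))
cond_mean <- rowMeans(samples)                      # plays the role of sample.mean

# Hypothetical common peak regions (occupied by both conditions).
common <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)

# Assumed criterion: match the baseline's mean and SD over common peaks.
slope <- sd(baseline_mean[common]) / sd(cond_mean[common])
intercept <- mean(baseline_mean[common]) - slope * mean(cond_mean[common])

# The transformation derived from the condition-level mean is applied to
# every sample of the condition.
normalized <- slope * samples + intercept

# Check the effect on M and A values over the common peak regions.
norm_mean <- rowMeans(normalized)
M <- norm_mean[common] - baseline_mean[common]
A <- (norm_mean[common] + baseline_mean[common]) / 2
print(c(slope = slope, intercept = intercept))
print(c(mean_M = mean(M), cov_MA = cov(M, A)))
```

Because the same slope and intercept are applied to all samples of a condition, differences between replicates within the condition are rescaled together rather than adjusted individually.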

normBioCond is an effort to reduce potential biases introduced by the MA normalization process. The idea comes from the principle that the more similar two samples are to each other, the less bias is expected to be introduced when normalizing them. With this function, instead of performing an overall normalization on all the ChIP-seq samples involved, you may choose to first perform a normalization within each biological condition, and then normalize between the resulting bioCond objects (see "Examples" below).

Value

A list of bioCond objects with normalized signal intensities, corresponding to the argument conds. Note that any information about the mean-variance dependence stored in the original bioCond objects is removed from the returned bioConds. You can re-fit a mean-variance curve for them by, for example, calling fitMeanVarCurve. Note also that the original structure matrices are retained for each bioCond in the returned list (see setWeight for a detailed description of structure matrices).

In addition, the following attributes are added to the returned list to describe the MA normalization performed:

size.factor: Size factors of the provided bioCond objects. Only present when baseline is not explicitly specified by the user.
baseline: Condition name of the bioCond object used as baseline, or "pseudo-reference" if the baseline argument is set to that value.
norm.coef: A data frame recording the MA normalization coefficients for each bioCond.
MA.cor: A real matrix recording the Pearson correlation coefficient between M and A values calculated from the common peak regions of each pair of bioCond objects. The upper and lower triangles of the matrix are derived from raw and normalized signal intensities, respectively. Note that M values are always calculated as the column bioCond minus the row one.
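
The layout of the MA.cor attribute can be illustrated with a small base R sketch. It does not use MAnorm2: the toy signals, the single shared vector of common peak regions, the mean/SD-matching normalization and the NA diagonal are all assumptions made for illustration, and the helper ma_cor is hypothetical.

```r
# Hypothetical raw log2 mean signals of three conditions over 6 intervals.
raw <- cbind(c1 = c(2.0, 3.5, 5.0, 6.5, 4.0, 1.0),
             c2 = c(3.0, 5.0, 7.0, 9.1, 6.0, 1.9),
             c3 = c(1.2, 2.0, 3.1, 3.8, 2.4, 0.7))
common <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)  # assumed common peak regions

# Assumed normalization: match each condition to c1 in mean and SD over
# the common peak regions.
normalized <- raw
for (k in colnames(raw)) {
  slope <- sd(raw[common, "c1"]) / sd(raw[common, k])
  intercept <- mean(raw[common, "c1"]) - slope * mean(raw[common, k])
  normalized[, k] <- slope * raw[, k] + intercept
}

# Pearson correlation between M and A values; M is column minus row.
ma_cor <- function(row_sig, col_sig, keep) {
  M <- col_sig[keep] - row_sig[keep]
  A <- (col_sig[keep] + row_sig[keep]) / 2
  cor(M, A)
}

n <- ncol(raw)
MA.cor <- matrix(NA_real_, n, n,
                 dimnames = list(colnames(raw), colnames(raw)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i < j) MA.cor[i, j] <- ma_cor(raw[, i], raw[, j], common)         # upper: raw
    if (i > j) MA.cor[i, j] <- ma_cor(normalized[, i], normalized[, j], common)  # lower: normalized
  }
}
print(round(MA.cor, 3))
```

In such a matrix, lower-triangle values close to zero next to clearly nonzero upper-triangle values suggest that the normalization removed the trend of M against A.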

References

Tu, S., et al., MAnorm2 for quantitatively comparing groups of ChIP-seq samples. Genome Res, 2021. 31(1): p. 131-145.

Examples

data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")

## Apply MA normalization first within each cell line, and then normalize
## across cell lines.

# Normalize samples separately for each cell line.
norm <- normalize(H3K27Ac, 4, 9)
norm <- normalize(norm, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)

# Construct a separate bioCond object for each cell line, and perform MA
# normalization on the resulting bioConds. Genomic intervals on sex
# chromosomes are excluded from the common peak regions, since the cell
# lines come from individuals of different sexes.
conds <- list(GM12890 = bioCond(norm[4], norm[9], name = "GM12890"),
              GM12891 = bioCond(norm[5:6], norm[10:11], name = "GM12891"),
              GM12892 = bioCond(norm[7:8], norm[12:13], name = "GM12892"))
autosome <- !(H3K27Ac$chrom %in% c("chrX", "chrY"))
conds <- normBioCond(conds, common.peak.regions = autosome)

# Inspect the normalization effects.
attributes(conds)
plot(attr(conds, "MA.cor"), symbreaks = TRUE, margins = c(8, 8))
MAplot(conds[[1]], conds[[2]], main = "GM12890 vs. GM12891")
abline(h = 0, lwd = 2, lty = 5)

[Package MAnorm2 version 1.2.2 Index]
