trans_norm {microeco}    R Documentation

Feature abundance normalization/transformation.

Description

Feature abundance normalization/transformation for a microtable object or data.frame object.

Methods

Public methods


Method new()

Get a transposed abundance table if the input is a microtable object. In the table, rows are samples and columns are features, which keeps subsequent operations consistent with traditional ecological methods.

Usage
trans_norm$new(dataset = NULL)
Arguments
dataset

a microtable object or a data.frame object. If it is a data.frame, make sure that rows are samples and columns are features.

Returns

data_table, stored in the object.

Examples
library(microeco)
data(dataset)
t1 <- trans_norm$new(dataset = dataset)
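
When the input is a data.frame rather than a microtable object, the orientation described for the dataset argument has to be arranged by the user. The following is a minimal base R sketch under the assumption of a small, made-up count table (the names OTU1 to OTU3 and S1 to S2 are hypothetical); it only shows the transposition step, not what trans_norm does internally.

```r
# Hypothetical toy counts: 3 features (rows) x 2 samples (columns),
# i.e. the orientation that is NOT expected for a data.frame input.
feature_by_sample <- data.frame(
  S1 = c(10, 0, 5),
  S2 = c(3, 7, 0),
  row.names = c("OTU1", "OTU2", "OTU3")
)

# Transpose so that rows are samples and columns are features.
sample_by_feature <- as.data.frame(t(feature_by_sample))

dim(sample_by_feature)       # number of samples, number of features
rownames(sample_by_feature)  # sample names
```

A table shaped like sample_by_feature is what the dataset argument expects when a data.frame is supplied.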

Method norm()

Normalization/transformation methods.

Usage
trans_norm$norm(
  method = "rarefy",
  sample.size = NULL,
  rngseed = 123,
  replace = TRUE,
  pseudocount = 1,
  intersect.no = 10,
  ct.min = 1,
  condition = NULL,
  MARGIN = NULL,
  logbase = 2,
  ...
)
Arguments
method

default "rarefy"; See the following available options.

Methods for normalization:

  • "rarefy": classic rarefaction based on the R sample function.

  • "SRS": scaling with ranked subsampling method based on the SRS package provided by Lukas Beule and Petr Karlovsky (2020) <doi:10.7717/peerj.9593>.

  • "clr": Centered log-ratio normalization <ISBN:978-0-412-28060-3> <doi: 10.3389/fmicb.2017.02224>. It is defined:

    clr_{ki} = \log\frac{x_{ki}}{g(x_i)}

    where x_{ki} is the abundance of the kth feature in sample i, and g(x_i) is the geometric mean of abundances for sample i. A pseudocount needs to be added to handle zeros. For more information, please see the 'clr' method in the decostand function of the vegan package.

  • "rclr": Robust centered log-ratio normalization <doi: 10.1128/msystems.00016-19>. It is defined:

    rclr_{ki} = \log\frac{x_{ki}}{g(x_i > 0)}

    where x_{ki} is the abundance of the kth feature in sample i, and g(x_i > 0) is the geometric mean of the abundances (> 0) for sample i. In rclr, zero values are kept as zeros and are not taken into account.

  • "GMPR": Geometric mean of pairwise ratios <doi: 10.7717/peerj.4600>. For a given sample i, the size factor s_i is defined:

    s_i = \biggl( {\displaystyle\prod_{j=1}^{n} Median_{k|c_{ki}c_{kj} \ne 0} \lbrace \dfrac{c_{ki}}{c_{kj}} \rbrace} \biggr) ^{1/n}

    where k indexes the features, and n is the number of samples. For sample i, GMPR = \frac{x_{i}}{s_i}, where x_i is the vector of feature abundances of sample i.

  • "CSS": Cumulative sum scaling normalization based on the metagenomeSeq package <doi:10.1038/nmeth.2658>. For a given sample j, the scaling factor s_{j}^{l} is defined:

    s_{j}^{l} = {\displaystyle\sum_{i|c_{ij} \leqslant q_{j}^{l}} c_{ij}}

    where q_{j}^{l} is the lth quantile of sample j, that is, in sample j there are l features with counts smaller than q_{j}^{l}, and c_{ij} denotes the count (abundance) of feature i in sample j. For l = 0.95m, where m is the feature number, q_{j}^{l} corresponds to the 95th percentile of the count distribution for sample j. The normalized counts are \tilde{c_{ij}} = (\frac{c_{ij}}{s_{j}^{l}})(N), where N is an appropriately chosen normalization constant.

  • "TSS": Total sum scaling. Abundance is divided by the sequencing depth. For a given sample j, the normalized count is defined:

    \tilde{c_{ij}} = \frac{c_{ij}}{\sum_{i=1}^{N_{j}} c_{ij}}

    where c_{ij} is the count of feature i in sample j, and N_{j} is the feature number of sample j.

  • "eBay": Empirical Bayes approach to normalization <doi: 10.1186/s12859-020-03552-z>. The implemented method is not tree-related. In the output, the sum of each sample is 1.

  • "TMM": Trimmed mean of M-values method based on the normLibSizes function of edgeR package <doi: 10.1186/gb-2010-11-3-r25>.

  • "DESeq2": Median ratio of gene counts relative to geometric mean per gene based on the DESeq function of DESeq2 package <doi: 10.1186/s13059-014-0550-8>. This option can invoke the trans_diff class and extract the normalized data from the original result. Note that either group or formula should be provided. The scaling factor is defined:

    s_{j} = Median_{i} \frac{c_{ij}}{\bigl( {\prod_{v=1}^{n} c_{iv}} \bigr) ^{1/n}}

    where c_{ij} is the count of feature i in sample j, v indexes the samples in the product, and n is the total number of samples.

  • "Wrench": Group-wise and sample-wise compositional bias factor <doi: 10.1186/s12864-018-5160-5>. Note that the condition parameter is required and is passed to the condition parameter of the wrench function in the Wrench package. Because the input data must be a microtable object, the condition parameter can be a column name of sample_table. The scaling factor is defined:

    s_{j} = \frac{1}{p} \sum_{ij} W_{ij} \frac{X_{ij}}{\overline{X_{i}}}

    where X_{ij} represents the relative abundance (proportion) of feature i in sample j, \overline{X_{i}} is the average proportion of feature i across the dataset, W_{ij} represents a weight specific to each technique, and p is the number of features in the sample.

  • "RLE": Relative log expression.

Methods based on decostand function of vegan package:

  • "total": divide by margin total (default MARGIN = 1, i.e. rows - samples).

  • "max": divide by margin maximum (default MARGIN = 2, i.e. columns - features).

  • "normalize": make margin sum of squares equal to one (default MARGIN = 1).

  • "range": standardize values into range 0...1 (default MARGIN = 2). If all values are constant, they will be transformed to 0.

  • "standardize": scale x to zero mean and unit variance (default MARGIN = 2).

  • "pa": scale x to presence/absence scale (0/1).

  • "log": logarithmic transformation.

Other methods for transformation:

  • "AST": Arc sine square root transformation.
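
The "TSS", "clr" and "rclr" definitions in the list above can be illustrated with a short base R sketch. This is a minimal illustration on a made-up table, assuming natural logarithms and the samples-in-rows orientation; the function names clr and rclr and the toy matrix are hypothetical, and the code is not the package's own implementation.

```r
# Hypothetical toy table: rows are samples, columns are features.
x <- matrix(c(10, 0, 5,
               3, 7, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("S1", "S2"), c("F1", "F2", "F3")))

# TSS: divide each sample (row) by its total count.
tss <- x / rowSums(x)

# clr: add a pseudocount, take logs, then subtract the per-sample mean of
# the logs, which is the log of the geometric mean g(x_i).
clr <- function(x, pseudocount = 1) {
  lx <- log(x + pseudocount)
  lx - rowMeans(lx)
}

# rclr: the geometric mean uses only positive entries; zeros stay zero.
rclr <- function(x) {
  lx <- log(x)
  lx[x == 0] <- NA                        # exclude zeros from the mean
  out <- lx - rowMeans(lx, na.rm = TRUE)
  out[is.na(out)] <- 0                    # keep zeros as zeros
  out
}

clr_x <- clr(x)
rclr_x <- rclr(x)
```

In this sketch each row of tss sums to 1, and each row of clr_x sums to 0 up to floating point error, which is a quick sanity check for either transformation.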

sample.size

default NULL; library size for rarefaction when method = "rarefy" or "SRS". If not provided, use the minimum number across all samples. For the "SRS" method, this parameter is passed to the Cmin parameter of the SRS function in the SRS package.

rngseed

default 123; random seed. Available when method = "rarefy" or "SRS".

replace

default TRUE; see sample for the random sampling; Available when method = "rarefy".

pseudocount

default 1; add pseudocount for those features with 0 abundance when method = "clr".

intersect.no

default 10; the minimum number of intersecting taxa between a pair of samples for method = "GMPR".

ct.min

default 1; the minimum number of counts required to calculate ratios for method = "GMPR".

condition

default NULL; Only available when method = "Wrench". This parameter is passed to the condition parameter of the wrench function in the Wrench package. It must be a column name of sample_table or a vector whose length equals the number of samples.

MARGIN

default NULL; 1 = samples, and 2 = features of abundance table; only available when method comes from decostand function of vegan package. If MARGIN is NULL, use the default value in decostand function.

logbase

default 2; The logarithm base.

...

parameters passed to vegan::decostand, or metagenomeSeq::cumNorm when method = "CSS", or edgeR::normLibSizes when method = "TMM" or "RLE", or the trans_diff class when method = "DESeq2", or the wrench function of the Wrench package when method = "Wrench".
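
How sample.size, rngseed and replace interact for method = "rarefy" can be pictured with a small base R sketch. It assumes that rarefaction draws individual reads from each sample with the R sampling functions, as the "rarefy" description suggests; the helper rarefy_one and the toy matrix are hypothetical and are not taken from the package source.

```r
# Rarefy one sample: expand counts into individual reads, draw `sample.size`
# reads, and tabulate them back per feature.
rarefy_one <- function(counts, sample.size, replace = TRUE) {
  reads <- rep(seq_along(counts), times = counts)
  idx <- sample.int(length(reads), size = sample.size, replace = replace)
  out <- tabulate(reads[idx], nbins = length(counts))
  names(out) <- names(counts)
  out
}

# Hypothetical toy table: rows are samples, columns are features.
x <- matrix(c(10, 0, 5,
               3, 7, 2),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("S1", "S2"), c("F1", "F2", "F3")))

set.seed(123)                 # plays the role of rngseed
depth <- min(rowSums(x))      # the fallback when sample.size is NULL
rarefied <- t(apply(x, 1, rarefy_one, sample.size = depth, replace = FALSE))
rowSums(rarefied)             # every sample now has `depth` reads
```

With replace = FALSE no feature can exceed its observed count, whereas sampling with replacement can draw the same read more than once; fixing the seed makes either variant reproducible.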

Returns

new microtable object or data.frame object.

Examples
newdataset <- t1$norm(method = "clr")
newdataset <- t1$norm(method = "log")
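
The median-of-ratios scaling factor given for "DESeq2" in the method list can also be worked through by hand. The following base R sketch is only an illustration of that formula on made-up counts; the function name median_ratio_factors is hypothetical, and dropping features that contain a zero count is an assumption of this sketch, not something stated above.

```r
# Hypothetical toy counts: rows are samples, columns are features.
# S2 is exactly twice S1, so its scaling factor should be twice as large.
x <- matrix(c(10, 4, 6,
              20, 8, 12),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("S1", "S2"), c("F1", "F2", "F3")))

median_ratio_factors <- function(x) {
  log_geo_mean <- colMeans(log(x))   # per-feature log geometric mean
  keep <- is.finite(log_geo_mean)    # assumption: skip features with a zero
  apply(x[, keep, drop = FALSE], 1, function(counts) {
    # median over features of count / geometric mean, computed on log scale
    exp(median(log(counts) - log_geo_mean[keep]))
  })
}

s <- median_ratio_factors(x)
normalized <- x / s                  # divide each sample (row) by its factor
```

After division by the factors, the two toy samples become identical, which is the behaviour expected when samples differ only in sequencing depth.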

Method clone()

The objects of this class are cloneable with this method.

Usage
trans_norm$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples


## ------------------------------------------------
## Method `trans_norm$new`
## ------------------------------------------------

library(microeco)
data(dataset)
t1 <- trans_norm$new(dataset = dataset)

## ------------------------------------------------
## Method `trans_norm$norm`
## ------------------------------------------------

newdataset <- t1$norm(method = "clr")
newdataset <- t1$norm(method = "log")

[Package microeco version 1.8.0 Index]