R: Testing hypotheses about the microbiome using a linear...

ldm {LDM}

R Documentation

Testing hypotheses about the microbiome using a linear decomposition model (LDM)

Description

This function allows you to 1. simultaneously test the global association with the overall microbiome composition and individual OTU associations to give coherent results; 2. test hypotheses based on data at both the frequency (i.e., relative abundance) and arcsine-root-transformed frequency scales, and perform an “omnibus" test that combines results from analyses conducted on the two scales; 3. test presence-absence associations based on infinite number of rarefaction replicates; 4. handle complex design features such as confounders, interactions, and clustered data (with between- and within-cluster covariates); 5. test associations with a survival outcome (i.e., censored survival times); 6. perform mediation analysis of the microbiome; 7. perform the omnibus test LDM-omni3 that combines results from analyses conducted on the frequency, arcsine-root-transformed, and presence-absence scales.

Usage

ldm(
  formula,
  other.surv.resid = NULL,
  data = .GlobalEnv,
  tree = NULL,
  dist.method = "bray",
  dist = NULL,
  cluster.id = NULL,
  strata = NULL,
  how = NULL,
  perm.within.type = "free",
  perm.between.type = "none",
  perm.within.ncol = 0,
  perm.within.nrow = 0,
  n.perm.max = NULL,
  n.rej.stop = 100,
  seed = NULL,
  fdr.nominal = 0.1,
  square.dist = TRUE,
  scale.otu.table = TRUE,
  center.otu.table = TRUE,
  freq.scale.only = FALSE,
  binary = FALSE,
  n.rarefy = 0,
  test.mediation = FALSE,
  test.omni3 = FALSE,
  comp.anal = FALSE,
  comp.anal.adjust = "median",
  n.cores = 4,
  verbose = TRUE
)

Arguments

`formula`	a symbolic description of the model to be fitted. The details of model specification are given under "Details".
`other.surv.resid`	a vector of data, usually the Martingale or deviance residuals from fitting the Cox model to the survival outcome (if it is the outcome of interest) and other covariates.
`data`	an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the covariates of interest and confounding covariates. If not found in `data`, the covariates are taken from environment(formula), typically the environment from which `ldm` is called. The default is .GlobalEnv.
`tree`	a phylogenetic tree. Only used for calculating a phylogenetic-tree-based distance matrix. Not needed if the calculation of the requested distance does not involve a phylogenetic tree, or if the distance matrix is directly imported through `dist`.
`dist.method`	method for calculating the distance measure, partial match to all methods supported by `vegdist` in the `vegan` package (i.e., "manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup" , "binomial", "chao", "cao", "mahalanobis") as well as "hellinger" and "wt-unifrac". The Hellinger distance measure (`dist.method="hellinger"`) takes the form `0.5*E`, where E is the Euclidean distance between the square-root-transformed frequency data. The weighted UniFrac distance (`dist.method="wt-unifrac"`) is calculated by interally calling `GUniFrac` in the `GUniFrac` package. Not used when anything other than `dist=NULL` is specified for `dist`. The default is "bray".
`dist`	a distance matrix. Can be an object of class either "dist" or "matrix". The elements of the distance matrix will be squared and then the matrix will be centered if the default choices `square.dist=TRUE` and `center.otu.table=TRUE` are used. If `dist=NULL`, the distance matrix is calculated from the `otu.table`, using the value of `dist.method` (and `tree` if required). The default is NULL.
`cluster.id`	character or factor variable that identifies clusters. The default value cluster.id=NULL if the observations are not clustered (i.e., are independent).
`strata`	a character or factor variable that defines strata (groups), within which to constrain permutations. The default is NULL.
`how`	a permutation control list, for users who want to specify their own call to the `how` function from the `permute` package. The default is NULL.
`perm.within.type`	a character string that takes values "free", "none", "series", or "grid". The default is "free" (for random permutations).
`perm.between.type`	a character string that takes values "free", "none", or "series". The default is "none".
`perm.within.ncol`	a positive integer, only used if perm.within.type="grid". The default is 0. See documentation for permute package for additional details.
`perm.within.nrow`	a positive integer, only used if perm.within.type="grid". The default is 0. See documentation for permute package for additional details.
`n.perm.max`	the maximum number of permutations. The default is NULL, in which case a maximum of 5000 permutations are used for the global test and a maximum of `n.otu` * `n.rej.stop` * (1/`fdr.nominal`) are used for the OTU test, where `n.otu` is the number of OTUs. If a numeric value for `n.perm.max` is specified, this value is used for both global and OTU-level tests.
`n.rej.stop`	the minimum number of rejections (i.e., the permutation statistic exceeds the observed statistic) to obtain before stopping. The default is 100.
`seed`	a user-supplied integer seed for the random number generator in the permutation procedure. The default is NULL; with the default value, an integer seed will be generated internally and randomly. In either case, the integer seed will be stored in the output object in case the user wants to reproduce the permutation replicates.
`fdr.nominal`	the nominal FDR value. The default is 0.1.
`square.dist`	a logical variable indicating whether to square the distance matrix. The default is TRUE.
`scale.otu.table`	a logical variable indicating whether to scale the rows of the OTU table. For count data, this corresponds to dividing by the library size to give frequencies (i.e., relative abundances). Does not affect the tran scale. The default is TRUE.
`center.otu.table`	a logical variable indicating whether to center the columns of the OTU table. The OTU table should be centered if the distance matrix has been centered. Applied to both the frequency and transformed scales. The default is TRUE.
`freq.scale.only`	a logical variable indicating whether to perform analysis of the frequency-scale data only (not the arcsin-root transformed frequency data and the omnibus test). The default is FALSE.
`binary`	a logical value indicating whether to perform presence-absence analysis. The default is FALSE (analyzing relative abundance data).
`n.rarefy`	an integer-valued number of rarefactions. The value "all" is also allowed, and requests the LDM-A method that essentially aggregate information from all rarefactions. The default is 0 (no rarefaction).
`test.mediation`	a logical value indicating whether to perform the mediation analysis. The default is FALSE. If TRUE, the formula takes the specific form `otu.table ~ exposure + outcome` or most generally `otu.table \| (set of confounders) ~ (set of exposures) + (set of outcomes)`.
`test.omni3`	a logical value indicating whether to perform the new omnibus test (LDM-omni3). The default is FALSE.
`comp.anal`	a logical value indicating whether the centered-log-ratio taxa count data are used (LDM-clr). The default is FALSE.
`comp.anal.adjust`	a character string that takes value "median" or "mode" to choose the estimator for the beta mean (Hu and Satten, 2023). The default is "median".
`n.cores`	The number of cores to use in parallel computing, i.e., at most how many child processes will be run simultaneously. The default is 4.
`verbose`	a logical value indicating whether to generate verbose output during the permutation process. Default is TRUE.

Details

The formula has the form

otu.table ~ (first set of covariates) + (second set of covariates) ... + (last set of covariates)

otu.table | confounders ~ (first set of covariates) + (second set of covariates) ... + (last set of covariates)

where otu.table is the OTU table with rows for samples and columns for OTUs and each set of covariates are enclosed in parentheses. The covariates in each submodel (set of covariates) are tested jointly, after projecting off terms in submodels that appear earlier in the model.

For example, given OTU table y and a data frame metadata that contains 4 covariates, a, b, c and d, some valid formulas would be:

y ~ a + b + c + d ### no confounders, 4 submodels (i.e., sets of covariates)

y ~ (a+b) + (c+d) ### no confounders, 2 submodels each having 2 covariates

y | b ~ (a+c) + d ### b is a confounder, submodel 1 is (a+c), and submodel 2 is d

y | b+c ~ a*d ### there are 2 confounders b and c; there is 1 submodel consisting of the three terms a, d, and a:d (interaction). This example is equivalent to y | b+c ~ (a+d+a:d)

y | as.factor(b) ~ (a+d) + a:d ### the confounder b will be treated as a factor variable, submodel 1 will have the main effects a and d, and submodel 2 will have only the interaction between a and d

y | as.factor(b) ~ (a) + (d) + (a:d) ### there are 3 submodels a, d, and a:d. Putting paratheses around a single variable is allowed but not necessary.

Submodels that combine character and numeric values are allowed; character-valued variables are coerced into factor variables. Confounders are distinguished from other covariates as test statistics are not calculated for confounders (which are included for scientific reasons, not by virtue of significance test results); consequently they also do not contribute to stopping criteria. If tests of confounders are desired, confounders should put on the right hand side of the formula as the first submodel.

For testing mediation effects of the microbiome that mediate the effect of the exposure(s) on the outcome(s), the formula takes the specific form:

otu.table ~ exposure + outcome

or most generally

otu.table | (set of confounders) ~ (set of exposures) + (set of outcomes)

in which there should be exactly two terms on the right hand side of the regression, corresponding to the exposure(s) and the outcome(s), the outcome(s) must appear after the exposure(s), and the covariates or confounders must appear after |.

LDM uses two sequential stopping criteria. For the global test, LDM uses the stopping rule of Besag and Clifford (1991), which stops permutation when a pre-specified minimum number (default=100) of rejections (i.e., the permutation statistic exceeded the observed test statistic) has been reached. For the OTU-specific tests, LDM uses the stopping rule of Sandve et al. (2011), which stops permutation when every OTU test has either reached the pre-specified number (default=100) of rejections or yielded a q-value that is below the nominal FDR level (default=0.1). As a convention, we call a test "stopped" if the corresponding stopping criterion has been satisfied. Although all tests are always terminated if a pre-specified maximum number (see description of n.perm.max in Arguments list) of permutations have been generated, some tests may not have "stopped". This typically occurs when the relevant p-value is small or near the cutoff for inclusion in a list of significant findings; for global tests meeting the stopping criterion is not critical, but caution is advised when interpreting OTU-level tests that have not stopped as additional OTUs may be found with a larger number of permutations.

Value

a list consisting of

`x`	the (orthonormal) design matrix X as defined in Hu and Satten (2020)
`dist`	the (squared/centered) distance matrix
`mean.freq`	the mean relative abundance of OTUs (the column means of the frequency-scale OTU table)
`y.freq`	the frequency-scale OTU table, scaled and centered if so specified
`d.freq`	a vector of the non-negative diagonal elements of `D` that satisfies `x^T y.freq = D v^T`
`v.freq`	the v matrix with unit columns that satisfies `x^T y.freq = D v^T`
`y.tran`	the (column-centered) arcsin-root-transformed OTU table
`d.tran`	a vector of the non-negative diagonal elements of `D` that satisfies `x^T y.tran = D v^T`
`v.tran`	the v matrix with unit columns that satisfies `x^T y.tran = D v^T`
`low`	a vector of lower indices for confounders (if there is any) and submodels
`up`	a vector of upper indices for confounders (if there is any) and submodels
`beta`	a matrix of effect sizes of every trait on every OTU
`phi`	a matrix of probabilities that the rarefied count of an OTU in a sample is non-zero
`VE.global.freq.confounders`	Variance explained (VE) by confounders, based on the frequency-scale data
`VE.global.freq.submodels`	VE by each submodel, based on the frequency-scale data
`VE.global.freq.residuals`	VE by each component in the residual distance, based on the frequency-scale data
`VE.otu.freq.confounders`	Contribution of each OTU to VE by confounders, based on the frequency-scale data
`VE.otu.freq.submodel`	Contribution of each OTU to VE by each submodel, based on the frequency-scale data
`VE.global.tran.confounders`	VE by confounders, based on the arcsin-root-transformed frequency data
`VE.global.tran.submodels`	VE by each submodel, based on the arcsin-root-transformed frequency data
`VE.global.tran.residuals`	VE by each component in the residual distance, based on the arcsin-root-transformed frequency data
`VE.otu.tran.confounders`	Contribution of each OTU to VE by confounders, based on the arcsin-root-transformed frequency data
`VE.otu.tran.submodels`	Contribution of each OTU to VE by each submodel, based on the arcsin-root-transformed frequency data
`VE.df.confounders`	Degree of freedom (i.e., number of components) associated with the VE for confounders
`VE.df.submodels`	Degree of freedom (i.e., number of components) associated with the VE for each submodel
`F.global.freq`	F statistics for testing each submodel, based on the frequency-scale data
`F.global.tran`	F statistics for testing each submodel, based on the arcsin-root-transformed frequency data
`F.otu.freq`	F statistics for testing each OTU for each submodel, based on the frequency-scale data
`F.otu.tran`	F statistics for testing each OTU for each submodel, based on the arcsin-root-transformed data
`p.global.freq`	p-values for the global test of each set of covariates based on the frequency-scale data
`p.global.tran`	p-values for the global test of each set of covariates based on the arcsin-root-transformed frequency data
`p.global.pa`	p-values for the global test of each set of covariates based on the presence-absence data
`p.global.omni`	p-values for the global test of each set of covariates based on the omnibus statistics in LDM-omni, which are the minima of the p-values obtained from the frequency scale and the arcsin-root-transformed frequency data as the final test statistics, and use the corresponding minima from the permuted data to simulate the null distributions
`p.global.harmonic`	p-values for the global test of each set of covariates based on the Harmonic-mean p-value combination method applied to the OTU-level omnibus p-values
`p.global.fisher`	p-values for the global test of each set of covariates based on the Fisher p-value combination method applied to the OTU-level omnibus p-values
`p.global.omni3`	p-values for the global test of each set of covariates based on the omnibus test LDM-omni3
`p.global.freq.OR`, `p.global.tran.OR`, `p.global.pa.OR`, `p.global.omni.OR`, `p.global.harmonic.OR`, `p.global.fisher.OR`, `p.global.omni3.OR`	global p-values for testing `other.surv.resid`
`p.global.freq.com`, `p.global.tran.com`, `p.global.pa.com`, `p.global.omni.com`, `p.global.harmonic.com`, `p.global.fisher.com`, `p.global.omni3.com`	global p-values from the combination test that combines the results from analyzing both the Martingale and deviance residuals from a Cox model (one of them is supplied by `other.surv.resid`)
`p.otu.freq`	p-values for the OTU-specific tests based on the frequency scale data
`p.otu.tran`	p-values for the OTU-specific tests based on the arcsin-root-transformed frequency data
`p.otu.pa`	p-values for the OTU-specific tests based on the presence-absence data
`p.otu.omni`	p-values for the OTU-specific tests based on the omnibus test LDM-omni
`p.otu.omni3`	p-values for the OTU-specific tests based on the omnibus test LDM-omni3
`q.otu.freq`	q-values (i.e., FDR-adjusted p-values) for the OTU-specific tests based on the frequency scale data
`q.otu.tran`	q-values for the OTU-specific tests based on the arcsin-root-transformed frequency data
`q.otu.pa`	q-values (i.e., FDR-adjusted p-values) for the OTU-specific tests based on the presence-absence data
`q.otu.omni`	q-values for the OTU-specific tests based on the omnibus test LDM-omni
`q.otu.omni3`	q-values for the OTU-specific tests based on the omnibus test LDM-omni3
`p.otu.freq.OR`, `p.otu.tran.OR`, `p.otu.pa.OR`, `p.otu.omni.OR`, `p.otu.omni3.OR`, `q.otu.freq.OR`, `q.otu.tran.OR`, `q.otu.pa.OR`, `q.otu.omni.OR`, `q.otu.omni3.OR`	OTU-level p-values and q-values for testing `other.surv.resid`
`p.otu.freq.com`, `p.otu.tran.com`, `p.otu.pa.com`, `p.otu.omni.com`, `p.otu.omni3.com`, `q.otu.freq.com`, `q.otu.tran.com`, `q.otu.pa.com`, `q.otu.omni.com`, `q.otu.omni3.com`	OTU-level p-values and q-values from the combination tests that combine the results from analyzing both the Martingale and deviance residuals from a Cox model (one of them is supplied by `other.surv.resid`)
`detected.otu.freq`	detected OTUs (whose names are found in the column names of the OTU table) at the nominal FDR, based on the frequency scale data
`detected.otu.tran`	detected OTUs based on the arcsin-root-transformed frequency data
`detected.otu.pa`	detected OTUs based on the presence-absence data
`detected.otu.omni`	detected OTU based on the omnibus test LDM-omni
`detected.otu.omni3`	detected OTU based on the omnibus test LDM-omni3
`detected.otu.freq.OR`, `detected.otu.tran.OR`, `detected.otu.pa.OR`, `detected.otu.omni.OR`, `detected.otu.omni3.OR`	detected OTUs for `other.surv.resid`
`detected.otu.freq.com`, `detected.otu.tran.com`, `detected.otu.pa.com`, `detected.otu.omni.com`, `detected.otu.omni3.com`	detected OTUs by the combination tests that combines the Martingale and deviance residuals from a Cox model (one of them is supplied by `other.surv.resid`)
`med.p.global.freq`, `med.p.global.tran`, `med.p.global.omni`, `med.p.global.pa`, `med.p.global.harmonic`, `med.p.global.fisher`, `med.p.global.omni3`	p-values for the global tests of the overall mediation effect by the microbiome
`med.p.global.freq.OR`, `med.p.global.tran.OR`, `med.p.global.omni.OR`, `med.p.global.pa.OR`, `med.p.global.harmonic.OR`, `med.p.global.fisher.OR`, `med.p.global.omni3.OR`	p-values for the global tests of the overall mediation effect by the microbiome, when the outcome is `other.surv.resid`
`med.p.global.freq.com`, `med.p.global.tran.com`, `med.p.global.omni.com`, `med.p.global.pa.com`, `med.p.global.harmonic.com`, `med.p.global.fisher.com`, `med.p.global.omni3.com`	p-values for the global tests of the overall mediation effect by the microbiome, combining the results from analyzing both the Martingale and deviance residuals as outcomes
`med.detected.otu.freq`, `med.detected.otu.tran`, `med.detected.otu.omni`, `med.detected.otu.pa`, `med.detected.otu.omni3`	detected mediating OTUs
`med.detected.otu.freq.OR`, `med.detected.otu.tran.OR`, `med.detected.otu.omni.OR`, `med.detected.otu.pa.OR`, `med.detected.otu.omni3.OR`	detected mediating OTUs for the outcome `other.surv.resid`
`med.detected.otu.freq.com`, `med.detected.otu.tran.com`, `med.detected.otu.omni.com`, `med.detected.otu.pa.com`, `med.detected.otu.omni3.com`	detected mediating OTUs, combining the results from analyzing both the Martingale and deviance residuals as outcomes
`n.perm.completed`	number of permutations completed
`global.tests.stopped`	a logical value indicating whether the stopping criterion has been met by all global tests
`otu.tests.stopped`	a logical value indicating whether the stopping criterion has been met by all OTU-specific tests
`seed`	the seed that is user supplied or internally generated, stored in case the user wants to reproduce the permutation replicates

Author(s)

Yi-Juan Hu <yijuan.hu@emory.edu>, Glen A. Satten <gsatten@emory.edu>

References

Hu YJ, Satten GA (2020). Testing hypotheses about the microbiome using the linear decomposition model (LDM) Bioinformatics, 36(14), 4106-4115.

Hu YJ, Lane A, and Satten GA (2021). A rarefaction-based extension of the LDM for testing presence-absence associations in the microbiome. Bioinformatics, 37(12):1652-1657.

Zhu Z, Satten GA, Caroline M, and Hu YJ (2020). Analyzing matched sets of microbiome data using the LDM and PERMANOVA. Microbiome, 9(133), https://doi.org/10.1186/s40168-021-01034-9.

Zhu Z, Satten GA, and Hu YJ (2022). Integrative analysis of relative abundance data and presence-absence data of the microbiome using the LDM. Bioinformatics, doi.org/10.1093/bioinformatics/btac181.

Yue Y and Hu YJ (2021) A new approach to testing mediation of the microbiome using the LDM. bioRxiv, https://doi.org/10.1101/2021.11.12.468449.

Hu Y, Li Y, Satten GA, and Hu YJ (2022) Testing microbiome associations with censored survival outcomes at both the community and individual taxon levels. bioRxiv, doi.org/10.1101/2022.03.11.483858.

Examples

res.ldm <- ldm(formula=throat.otu.tab5 | (Sex+AntibioticUse) ~ SmokingStatus+PackYears, 
              data=throat.meta, seed=67817, fdr.nominal=0.1, n.perm.max=1000, n.cores=1, 
              verbose=FALSE)

[Package LDM version 6.0.1 Index]