icamp.cm2 {iCAMP}

R Documentation

Phylogenetic-bin-based null model analysis under different metacommunity settings for phylogenetic and taxonomic null models

Description

Perform phylogenetic-bin-based null model analysis and quantify the relative importance of different community assembly processes. This function can handle local communities that belong to different metacommunities (regional pools), and it allows the phylogenetic and taxonomic null models to use different metacommunity settings.

Usage

icamp.cm2(comm, tree, meta.group.phy = NULL, meta.com.phy = NULL,
          meta.frequency.phy = NULL, meta.ab.phy = NULL,
          meta.group.tax = NULL, meta.com.tax = NULL,
          meta.frequency.tax = NULL, meta.ab.tax = NULL,
          pd.desc = NULL, pd.spname = NULL, pd.wd = getwd(),
          rand = 1000, prefix = "iCAMP", ds = 0.2, pd.cut = NA,
          phylo.rand.scale = c("within.bin", "across.all", "both"),
          taxa.rand.scale = c("across.all", "within.bin", "both"),
          phylo.metric = c("bMPD", "bMNTD", "both", "bNRI", "bNTI"),
          sig.index = c("Confidence", "SES.RC", "SES", "RC"),
          bin.size.limit = 24, nworker = 4, memory.G = 50,
          rtree.save = FALSE, detail.save = TRUE, qp.save = TRUE,
          detail.null = FALSE, ignore.zero = TRUE, output.wd = getwd(),
          correct.special = TRUE, unit.sum = rowSums(comm),
          special.method = c("depend", "MPD", "MNTD", "both"),
          ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975,
          omit.option = c("no", "test", "omit"),
          treepath.file = "path.rda", pd.spname.file = "pd.taxon.name.csv",
          pd.backingfile = "pd.bin", pd.desc.file = "pd.desc",
          taxo.metric = "bray", transform.method = NULL, logbase = 2,
          dirichlet = FALSE, d.cut.method = c("maxpd", "maxdroot"))

Arguments

`comm`	matrix or data.frame of community data. Each row is a sample (site) and each column is a taxon (species, OTU, or ASV); row names should be sample IDs and column names should be taxon IDs.
`tree`	phylogenetic tree, an object of class "phylo".
`meta.group.phy`	matrix or data.frame with one column (n x 1) indicating which metacommunity each sample belongs to in the null model for phylogenetic beta diversity. Row names are sample IDs and the first column holds metacommunity names, so different samples can belong to different metacommunities. If an n x m matrix is supplied, only the first column is used. Default is NULL, meaning all samples belong to the same metacommunity.
`meta.com.phy`	a list in which each element is a matrix or data.frame defining the abundance (or relative abundance) of taxa in one metacommunity (regional pool) in the null model for phylogenetic beta diversity. The element names are metacommunity names, which should be consistent with those defined in meta.group.phy. If there is only one metacommunity, meta.com.phy can be a single matrix or data.frame defining taxon abundances (or relative abundances) in that metacommunity. Default is NULL, meaning the metacommunity structure is calculated from comm according to the metacommunities defined in meta.group.phy.
`meta.frequency.phy`	matrix or data.frame in which each column is a taxon and each row is a metacommunity (regional pool), defining the occurrence frequency of each taxon in each metacommunity in the null model for phylogenetic beta diversity. Row names are metacommunity names and should match those in meta.group.phy. Default is NULL, meaning meta.frequency.phy is calculated as the occurrence frequency of each taxon in comm across the samples within each metacommunity defined by meta.group.phy.
`meta.ab.phy`	matrix or data.frame in which each column is a taxon and each row is a metacommunity (regional pool), defining the abundance (or relative abundance) of each taxon in each metacommunity in the null model for phylogenetic beta diversity. Row names are metacommunity names and should match those in meta.group.phy. Default is NULL, meaning meta.ab.phy is calculated as the average relative abundance of each taxon in comm across the samples within each metacommunity defined by meta.group.phy.
`meta.group.tax`	the same format as meta.group.phy, but for taxonomic null model.
`meta.com.tax`	the same format as meta.com.phy, but for taxonomic null model.
`meta.frequency.tax`	the same format as meta.frequency.phy, but for taxonomic null model.
`meta.ab.tax`	the same format as meta.ab.phy, but for taxonomic null model.
`pd.desc`	the name of the file holding the backingfile description of the phylogenetic distance matrix; it is usually "pd.desc" if the default settings of the function pdist.big were used. If NULL, pdist.big will be used to calculate the phylogenetic distance matrix from tree and save it in pd.wd as a bigmemory file.
`pd.spname`	character vector of taxon IDs, in the same order as the rows (and columns) of the big phylogenetic distance matrix.
`pd.wd`	folder path where the bigmemory files of the phylogenetic distance matrix are saved.
`rand`	integer, number of randomizations. Default is 1000.
`prefix`	character string, prefix for the names of output files.
`ds`	numeric, the general phylogenetic distance threshold within which the phylogenetic signal is still significant. Default is 0.2.
`pd.cut`	numeric, the distance from the tree root at which the phylogenetic tree is truncated to get strict phylogenetic bins. If pd.cut is set, the distance threshold (ds) is disabled. Default is NA.
`phylo.rand.scale`	character, the scale to randomize the taxa for phylogenetic null model. "within.bin" means randomization within each bin; "across.all" means randomization across all bins; "both" means to test both methods. Default setting is within.bin.
`taxa.rand.scale`	character, the scale to randomize the taxa for taxonomic null model. "within.bin" means randomization within each bin; "across.all" means randomization across all bins; "both" means to test both methods. Default setting is across.all.
`phylo.metric`	character, the metric for phylogenetic null model analysis. bMPD (or bNRI): null model analysis based on beta mean pairwise distance (betaMPD); when SES is used for phylogenetic metrics (sig.index is SES or SES.RC), it is the beta net relatedness index (betaNRI). bMNTD (or bNTI): null model analysis based on beta mean nearest taxon distance (betaMNTD); when SES is used, it is the beta nearest taxon index (betaNTI). both: use null model tests based on both bMPD and bMNTD. Default is bMPD.
`sig.index`	character, the index for the null model significance test. Confidence: the percentage of null values less extreme than the observed value, i.e. a non-parametric one-sided confidence level; if sig.index is Confidence, it is applied to both phylogenetic and taxonomic metrics. SES.RC: use the standard effect size (SES) for phylogenetic metrics (i.e. betaNTI or betaNRI) and the modified Raup-Crick index (RC) for taxonomic metrics (RCbray). SES: use SES for both phylogenetic and taxonomic metrics. RC: use RC for both phylogenetic and taxonomic metrics. Default is Confidence. If a vector is supplied, only the first element is used.
`bin.size.limit`	integer, the minimum bin size (number of taxa in a bin). Default is 24.
`nworker`	for parallel computing: either a character vector of host names on which to run the worker copies of R, or a positive integer (in which case that number of copies is run on localhost). Default is 4, meaning 4 threads will be run.
`memory.G`	numeric, the memory size (in GB) to set as needed, so that calculations on a large tree are not limited by physical memory. Default is 50 GB.
`rtree.save`	logical, whether to save the rooted tree as a .nwk file if the input tree is not rooted. Default is FALSE.
`detail.save`	logical, whether to save the details (i.e. key objects of the iCAMP analysis) as an .rda file. Default is TRUE.
`qp.save`	logical, whether to save the relative importance of processes as a .csv file. Default is TRUE.
`detail.null`	logical; if TRUE, the output will include all the null values. Default is FALSE, but it needs to be TRUE if you want to change the significance testing index later using 'change.sigindex'.
`ignore.zero`	logical, whether to remove the row(s)/column(s) whose sum is zero from the community data matrix (comm). Default is TRUE.
`output.wd`	folder path where files are saved when rtree.save, detail.save, or qp.save is TRUE.
`correct.special`	logical, whether to correct special cases when calculating bNRI or bNTI. Default is TRUE.
`unit.sum`	NULL, a number, or a numeric vector. When a beta diversity index is calculated for a bin, taxon abundances are divided by unit.sum to obtain relative abundances, and the Bray-Curtis index in each bin becomes the Manhattan distance divided by 2. Default is the row sums of the community matrix, which are usually the sequencing depths of the samples. If NULL, this special transformation is not applied.
`special.method`	when correct.special is TRUE, the method used to check for underestimation of deterministic pattern(s) in special cases. MPD: use a null model test based on mean pairwise distance; MNTD: use a null model test based on mean nearest taxon distance; depend: use MPD when phylo.metric is bMPD or bNRI, and MNTD when phylo.metric is bMNTD or bNTI; both: use both MPD and MNTD. Default is depend.
`ses.cut`	numeric, the cutoff of significant standard effect size, default is 1.96.
`rc.cut`	numeric, the cutoff of significant modified Raup-Crick index value, default is 0.95.
`conf.cut`	numeric, the cutoff of significant one-side confidence level, default is 0.975.
`omit.option`	one of three options for handling small bins. "no": merge small bins into their nearest relatives to meet the bin size requirement, rather than omitting them; "test": output information on strict bins smaller than the requirement, without performing iCAMP; "omit": perform iCAMP only with strict bins that have enough taxa (larger than the bin size requirement).
`treepath.file`	character, name of the file saving tree.path, a list of all nodes and edge lengths from the root to every tip and/or node. It should be an .rda filename.
`pd.spname.file`	character, name of the file saving the taxon IDs, in exactly the same order as the row names (and column names) of the big phylogenetic distance matrix. It should be a .csv filename.
`pd.backingfile`	character, root name of the file caching the big phylogenetic distance matrix. It should be a .bin filename.
`pd.desc.file`	character, name of the file holding the backingfile description of the big phylogenetic distance matrix. It should be a .desc filename.
`taxo.metric`	taxonomic beta diversity index, the same as 'method' in the function 'vegdist' in package 'vegan', including "manhattan", "euclidean", "canberra", "clark", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao", "cao" or "mahalanobis". If taxo.metric='bray' and transform.method=NULL, RC will be calculated based on Bray-Curtis dissimilarity as recommended in the original iCAMP; otherwise, the unit.sum setting will be ignored.
`transform.method`	character or a user-defined function specifying how to transform the community matrix before calculating dissimilarity. If a character string, it should be a method name from the function 'decostand' in package 'vegan', including 'total','max','freq','normalize','range','standardize','pa','chi.square','cmdscale','hellinger','log'.
`logbase`	numeric, the logarithm base used when transform.method='log'.
`dirichlet`	logical. If TRUE, the taxonomic null model uses a Dirichlet distribution to generate relative abundances in the randomized community matrix. If all row sums of the input community matrix are no more than 1, the function automatically sets dirichlet=TRUE. Default is FALSE.
`d.cut.method`	character, to specify the method to calculate pd.cut from ds. 'maxpd' means based on maximum phylogenetic distance, pd.cut = (maxpd - ds)/2. 'maxdroot' means based on maximum distance to root, pd.cut = maxdroot - (ds/2), which is preferred if the tree only has one edge from the root.
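
The two d.cut.method formulas can be checked with plain arithmetic. The sketch below uses a hypothetical helper, pd_cut_from_ds (not an iCAMP function), that applies the documented formulas to illustrative distances; in a real analysis maxpd and maxdroot would be measured on the tree.

```r
# Hypothetical helper (not part of iCAMP): converts the phylogenetic
# signal threshold ds into pd.cut using the documented d.cut.method formulas.
pd_cut_from_ds <- function(ds, maxpd = NULL, maxdroot = NULL,
                           method = c("maxpd", "maxdroot")) {
  method <- match.arg(method)
  if (method == "maxpd") {
    stopifnot(!is.null(maxpd))
    (maxpd - ds) / 2        # 'maxpd': based on the maximum pairwise distance
  } else {
    stopifnot(!is.null(maxdroot))
    maxdroot - ds / 2       # 'maxdroot': based on the maximum distance to root
  }
}

# Illustrative values for an ultrametric tree of height 1 whose
# deepest split is at the root, so that maxpd = 2 * maxdroot.
pd_cut_from_ds(0.2, maxpd = 2, method = "maxpd")        # 0.9
pd_cut_from_ds(0.2, maxdroot = 1, method = "maxdroot")  # 0.9
```

The two methods agree when maxpd = 2 * maxdroot. If the root has only one descending edge, the longest tip-to-tip path does not pass through the root, so maxpd is smaller than 2 * maxdroot and the 'maxpd' formula gives a smaller pd.cut; this is the situation in which 'maxdroot' is described as preferable.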

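To make the unit.sum behaviour concrete, the base-R sketch below divides the abundances of one bin's taxa by the whole-sample totals (the default unit.sum) and takes the Manhattan distance divided by 2, as described for unit.sum. The counts and the bin membership are made up, and this is an illustration of the documented arithmetic rather than iCAMP's internal code.

```r
# Made-up counts: 2 samples x 5 taxa, each sample with 20 reads in total
comm <- matrix(c(10, 5, 0, 3, 2,
                  4, 6, 8, 0, 2),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), paste0("OTU", 1:5)))
unit.sum <- rowSums(comm)              # default: sequencing depth per sample
bin <- c("OTU1", "OTU2", "OTU3")       # hypothetical membership of one bin

# Divide by the whole-sample total; the length-2 unit.sum recycles down
# columns, so each row is divided by its own sample total
rel.bin <- comm[, bin] / unit.sum

# Within-bin index: Manhattan distance / 2 on these scaled abundances
bc.bin <- sum(abs(rel.bin["S1", ] - rel.bin["S2", ])) / 2
bc.bin                                  # 0.375

# Contrast: ordinary Bray-Curtis on the bin's own counts only (15/33)
x <- comm["S1", bin]; y <- comm["S2", bin]
bc.bin.own <- sum(abs(x - y)) / sum(x + y)

# Over ALL taxa, Manhattan / 2 on relative abundances equals Bray-Curtis
rel.all <- comm / unit.sum
bc.full <- sum(abs(rel.all["S1", ] - rel.all["S2", ])) / 2
bc.full                                 # 0.45
```

Because the bin's abundances are not rescaled to the bin's own total, the within-bin values stay on the scale of whole-sample relative abundance, which differs from a Bray-Curtis computed on the bin alone.
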
Details

This function is designed for samples from different metacommunities, and it allows the phylogenetic and taxonomic null models to have different metacommunity settings. The null model randomizes the community matrix under each metacommunity separately (and independently). All other details are the same as in the function icamp.big.
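
When meta.frequency.phy and meta.ab.phy (or their .tax counterparts) are NULL, they are computed per metacommunity from comm. The base-R sketch below reproduces that description on a toy matrix, assuming that occurrence frequency means the proportion of samples in a metacommunity where a taxon is present; it is an illustration, not iCAMP's internal code. The resulting layout (rows are metacommunities, columns are taxa) matches what the meta.frequency.* and meta.ab.* arguments expect.

```r
# Toy data: 6 samples x 4 taxa, 10 reads per sample (made-up counts)
comm <- matrix(c(5, 0, 3, 2,
                 4, 1, 0, 5,
                 6, 0, 2, 2,
                 0, 7, 1, 2,
                 0, 5, 0, 5,
                 1, 6, 3, 0),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("S", 1:6), paste0("OTU", 1:4)))
# Same layout as meta.group.phy / meta.group.tax: one column, rownames = sample IDs
meta.group <- data.frame(meta.com = rep(c("meta1", "meta2"), each = 3),
                         row.names = rownames(comm))

grp <- meta.group[rownames(comm), 1]             # metacommunity of each sample
idx.by.meta <- split(seq_len(nrow(comm)), grp)   # sample indices per metacommunity
rel <- comm / rowSums(comm)                      # per-sample relative abundance

# Rows = metacommunities, columns = taxa
meta.frequency <- t(sapply(idx.by.meta, function(i)
  colMeans(comm[i, , drop = FALSE] > 0)))        # share of samples where present
meta.ab <- t(sapply(idx.by.meta, function(i)
  colMeans(rel[i, , drop = FALSE])))             # mean relative abundance

meta.frequency
meta.ab
```

Each metacommunity gets its own species pool, which is what lets the null model randomize samples under different metacommunities independently.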

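The null models compare observed phylogenetic beta diversity (betaMPD or betaMNTD, selected by phylo.metric) with randomized values. As a reference point, the sketch below computes abundance-weighted betaMPD (mean cross-community pairwise distance) and betaMNTD (mean of the two directional nearest-taxon distances) between two communities using the standard formulas. The distance matrix and abundances are made up, abundances are rescaled to sum to 1 in each community, and the helper names beta_mpd and beta_mntd are hypothetical; iCAMP computes these within each bin, and its exact weighting (for example via unit.sum) may differ.

```r
# Made-up phylogenetic distances among 4 taxa (symmetric, zero diagonal)
taxa <- paste0("OTU", 1:4)
pd <- matrix(c(0, 2, 6, 6,
               2, 0, 6, 6,
               6, 6, 0, 4,
               6, 6, 4, 0), nrow = 4, dimnames = list(taxa, taxa))
pA <- c(OTU1 = 0.5, OTU2 = 0.5, OTU3 = 0,   OTU4 = 0)   # community A
pB <- c(OTU1 = 0,   OTU2 = 0.5, OTU3 = 0.5, OTU4 = 0)   # community B

# Hypothetical helper: abundance-weighted mean pairwise distance between A and B
beta_mpd <- function(pA, pB, pd) {
  a <- pA[pA > 0] / sum(pA); b <- pB[pB > 0] / sum(pB)
  sum(outer(a, b) * pd[names(a), names(b), drop = FALSE])
}

# Hypothetical helper: abundance-weighted mean nearest taxon distance
beta_mntd <- function(pA, pB, pd) {
  a <- pA[pA > 0] / sum(pA); b <- pB[pB > 0] / sum(pB)
  d <- pd[names(a), names(b), drop = FALSE]
  nnA <- apply(d, 1, min)   # nearest relative in B for each taxon in A
  nnB <- apply(d, 2, min)   # nearest relative in A for each taxon in B
  (sum(a * nnA) + sum(b * nnB)) / 2
}

beta_mpd(pA, pB, pd)    # 3.5
beta_mntd(pA, pB, pd)   # 2
```

betaMPD weights all cross-community pairs and so reflects deeper branches, while betaMNTD only looks at the closest relatives and is more sensitive to turnover near the tips.
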
Value

If omit.option is test, the output will be a table summarizing the information of small bins.

Otherwise, the output is a list object, including one or more elements as below:

The first one or several elements (several if 'both' is set for metrics and/or randomization scales) are matrices of process importance at the community level. In each matrix, the first two columns are the sample IDs of each turnover, and the third to last columns show the estimated relative importance of each process in shaping each turnover between communities (samples). Element names indicate the metrics and randomization scales; e.g. bNRIiRCa means phylogenetic null model analysis using betaNRI (i.e. SES based on betaMPD) with randomization within each bin, and taxonomic null model analysis using RC based on Bray-Curtis with randomization across bins. Other possible phylogenetic null-model-based metrics: bNTI, betaNTI (i.e. SES based on betaMNTD); RCbMPD, RC based on betaMPD; RCbMNTD, RC based on betaMNTD; CbMPD, confidence level based on betaMPD; CbMNTD, confidence level based on betaMNTD. Other possible taxonomic null-model-based metrics: SESbray, SES based on Bray-Curtis; CBray, confidence level based on Bray-Curtis. i, within-bin randomization; a, across-bin randomization.
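
The significance indices named above (SES, RC, and confidence level) can be sketched for a single observed value and its null distribution. The helper sig_indices below is hypothetical, not iCAMP's internal code. RC follows the modified Raup-Crick scheme of Stegen et al. (2013), counting ties as one half and rescaling to [-1, 1]; the confidence level is reported separately for each direction, which is an assumption about how the one-sided test is applied.

```r
# Hypothetical helper: significance indices for one observed value
sig_indices <- function(obs, null) {
  n <- length(null)
  ses <- (obs - mean(null)) / sd(null)          # standard effect size
  rc <- ((sum(null < obs) + 0.5 * sum(null == obs)) / n - 0.5) * 2
  c(SES = ses,
    RC = rc,                                     # modified Raup-Crick, in [-1, 1]
    conf.upper = mean(null < obs),               # share of null values below obs
    conf.lower = mean(null > obs))               # share of null values above obs
}

null <- 1:100          # stand-in for 100 null values from randomization
out <- sig_indices(98.5, null)
round(out, 3)

# Compare with the default cutoffs ses.cut = 1.96, rc.cut = 0.95, conf.cut = 0.975
abs(out[["SES"]]) > 1.96                                 # FALSE
abs(out[["RC"]]) > 0.95                                  # TRUE
max(out[["conf.upper"]], out[["conf.lower"]]) > 0.975    # TRUE
```

In this toy case the same observed value is significant by RC and by confidence level but not by SES, which illustrates why the choice of sig.index can change the inferred processes.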

`detail`	an element in output only if detail.save is TRUE. A list with elements as below.
`taxabin`	an element in 'detail'. A list showing the phylogenetic binning results. The first element, named sp.bin, is a matrix where each row is a taxon (OTU or ASV); the first column is the original strict bin ID, the second column is the original bin ID after small bins are merged into their nearest relative(s), and the third column is the final renewed bin ID. The second element, named bin.united.sp, is a list where each element shows the taxon IDs within each bin, with bins in the order of the final renewed bin IDs. The third element, named bin.strict.sp, is a list where each element shows the taxon IDs within each strict bin, with bins in the order of the original strict bin IDs. The fourth element, named state.strict, is a matrix where the 1st column is the original strict bin IDs, the 2nd column is the number of taxa in each strict bin, and the 3rd to 5th columns show the maximum, mean, and standard deviation of phylogenetic distances within each strict bin. The fifth element, named state.united, is a matrix where the row number is the final bin ID, the 1st column is the original bin IDs, the 2nd column is the number of taxa in each final bin, and the 3rd to 5th columns show the maximum, mean, and standard deviation of phylogenetic distances within each final bin.
`SigbMPDi`, `SigbMPDa`, `SigbMNTDi`, `SigbMNTDa`, `SigBCi`, `SigBCa`	elements in 'detail': matrices showing the null model significance testing index for each turnover of each bin. In the element names, SigbMPD, SigbMNTD, or SigBC means the significance test is based on betaMPD, betaMNTD, or taxonomic dissimilarity (Bray-Curtis by default); i, within-bin randomization; a, across-bin randomization. In each matrix, the first two columns are the sample IDs of each turnover, and the 3rd to last columns represent different bins, with column names containing the significance testing index name, which can be bNRI, bNTI, RCbMPD, RCbMNTD, CbMPD, CbMNTD, SESbray, RCbray, or CBray as described above.
`bin.weight`	an element in 'detail', a matrix showing relative abundance of each bin in each pair of samples.
`processes`	an element in 'detail', a list of process importance results at community level.
`setting`	an element in 'detail', a data.frame showing all basic settings of this function.
`comm`	an element in 'detail', the input community matrix.
`rand`	an element in output only if detail.null is TRUE. A list in which each element shows the observed or null values of a beta diversity index (e.g. betaMPD, betaMNTD, Bray-Curtis). Each index is shown as a list where each element represents a bin.
`special.crct`	an element in output only if detail.null is TRUE. It shows the corrected values for special cases, where zero means no correction is needed.
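
Community-level process importance is built from bin-level significance results weighted by bin.weight. The sketch below illustrates this for one pair of samples under the SES.RC setting, following the classification described by Ning et al. (2020) and Stegen et al. (2013): bNRI below -ses.cut indicates homogeneous selection and above ses.cut heterogeneous selection; otherwise RC below -rc.cut indicates homogenizing dispersal, above rc.cut dispersal limitation, and the remainder is drift (and others). All numbers are made up, classify_bins is a hypothetical helper, the process labels are illustrative, and the correction of special cases (correct.special) is not modelled.

```r
# Made-up bin-level results for ONE pair of samples
bNRI   <- c(bin1 = -2.5, bin2 = 0.4,  bin3 = 0.1,  bin4 = 2.2)
RCbray <- c(bin1 = 0.1,  bin2 = 0.97, bin3 = -0.3, bin4 = -0.99)
w      <- c(bin1 = 0.4,  bin2 = 0.3,  bin3 = 0.2,  bin4 = 0.1)  # bin weights, sum to 1

processes <- c("Heterogeneous.Selection", "Homogeneous.Selection",
               "Dispersal.Limitation", "Homogenizing.Dispersal",
               "Drift.and.Others")

# Hypothetical helper: the phylogenetic test is applied first, and the
# taxonomic (RC) test only decides bins whose bNRI is not significant
classify_bins <- function(bNRI, RC, ses.cut = 1.96, rc.cut = 0.95) {
  ifelse(bNRI < -ses.cut, "Homogeneous.Selection",
  ifelse(bNRI >  ses.cut, "Heterogeneous.Selection",
  ifelse(RC   < -rc.cut,  "Homogenizing.Dispersal",
  ifelse(RC   >  rc.cut,  "Dispersal.Limitation", "Drift.and.Others"))))
}

proc <- classify_bins(bNRI, RCbray)
importance <- tapply(w, factor(proc, levels = processes), sum)
importance[is.na(importance)] <- 0   # processes with no bins get zero weight
importance
```

Note that bin4 has RC below -0.95, but because its bNRI is already significant it counts toward heterogeneous selection rather than homogenizing dispersal.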

Note

Version 1: 2022.2.10

Author(s)

Daliang Ning

References

Ning, D., Yuan, M., Wu, L., Zhang, Y., Guo, X., Zhou, X. et al. (2020). A quantitative framework reveals ecological drivers of grassland microbial community assembly in response to warming. Nature Communications, 11, 4717.

Stegen, J.C., Lin, X., Fredrickson, J.K. & Konopka, A.E. (2015). Estimating and mapping ecological processes influencing microbial community assembly. Front Microbiol, 6, 370.

Stegen, J.C., Lin, X., Fredrickson, J.K., Chen, X., Kennedy, D.W., Murray, C.J. et al. (2013). Quantifying community assembly processes and identifying features that impose them. ISME J, 7, 2069.

Zhou, J. & Ning, D. (2017). Stochastic community assembly: Does it matter in microbial ecology? Microbiology and Molecular Biology Reviews, 81.

Veech, J.A. (2012). Significance testing in ecological null models. Theor Ecol, 5, 611-616.

Kane, M.J., Emerson, J., Weston, S. (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.

Examples

data("example.data")
comm=example.data$comm
tree=example.data$tree

# in this example, 10 samples from one metacommunity,
# the other 10 samples from another metacommunity.
meta.group=data.frame(meta.com=c(rep("meta1",10),rep("meta2",10)))
rownames(meta.group)=rownames(comm)

# Because some output must be saved to a folder,
# the following code is marked as 'not test',
# but you can run it on your computer
# after changing the folder path in 'save.wd'.

  wd0=getwd() # record the current working directory so it can be restored
  save.wd=paste0(tempdir(),"/pdbig.icampcm2") # change to the folder where you want to save the pd.big output
  if(!dir.exists(save.wd)){dir.create(save.wd)}
  nworker=2 # number of parallel computing threads
  rand.time=20 # usually 1000 for real data.

  bin.size.limit=5 # for real data, choose a proper number
  # according to a phylogenetic signal test, or try several settings
  # and then choose a reasonable stochasticity level.
  # In our experience, 12, 24, or 48 is typical.
  # This example dataset is too small, so 5 is used here.
  
  icamp.out=icamp.cm2(comm=comm, tree=tree, meta.group.phy=meta.group,
                     meta.group.tax=NULL, pd.wd=save.wd, rand=rand.time,
                     nworker=nworker, bin.size.limit=bin.size.limit)
  setwd(wd0)

[Package iCAMP version 1.5.12 Index]