| trans_diff {microeco} | R Documentation |
Create trans_diff object for the differential analysis on the taxonomic abundance
Description
This class is a wrapper for a series of differential abundance test and indicator analysis methods, including
LEfSe based on the Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>,
random forest <doi:10.1016/j.geoderma.2018.09.035>, metastat based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>,
non-parametric Kruskal-Wallis Rank Sum Test,
Dunn's Kruskal-Wallis Multiple Comparisons based on the FSA package, Wilcoxon Rank Sum and Signed Rank Tests, t-test, anova,
Scheirer Ray Hare test,
R package metagenomeSeq Paulson et al. (2013) <doi:10.1038/nmeth.2658>,
R package ANCOMBC <doi:10.1038/s41467-020-17041-7>, R package ALDEx2 <doi:10.1371/journal.pone.0067019; 10.1186/2049-2618-2-15>,
R package MicrobiomeStat <doi:10.1186/s13059-022-02655-5>, beta regression <doi:10.18637/jss.v034.i02>, R package maaslin2,
linear mixed-effects model and generalized linear mixed model.
Methods
Public methods
Method new()
Usage
trans_diff$new(
dataset = NULL,
method = c("lefse", "rf", "metastat", "metagenomeSeq", "KW", "KW_dunn", "wilcox",
"t.test", "anova", "scheirerRayHare", "lm", "ancombc2", "ALDEx2_t", "ALDEx2_kw",
"DESeq2", "edgeR", "linda", "maaslin2", "betareg", "lme", "glmm", "glmm_beta")[1],
group = NULL,
taxa_level = "all",
filter_thres = 0,
alpha = 0.05,
p_adjust_method = "fdr",
transformation = NULL,
remove_unknown = TRUE,
lefse_subgroup = NULL,
lefse_min_subsam = 10,
lefse_norm = 1e+06,
nresam = 0.6667,
boots = 30,
rf_ntree = 1000,
group_choose_paired = NULL,
metagenomeSeq_count = 1,
ALDEx2_sig = c("wi.eBH", "kw.eBH"),
by_group = NULL,
by_ID = NULL,
beta_pseudo = .Machine$double.eps,
...
)Arguments
datasetdefault NULL;
microtableobject.methoddefault "lefse"; see the following available options:
- 'lefse'
LEfSe method based on Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>
- 'rf'
random forest and non-parametric test method based on An et al. (2019) <doi:10.1016/j.geoderma.2018.09.035>
- 'metastat'
Metastat method for all paired groups based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>
- 'metagenomeSeq'
zero-inflated log-normal model-based differential test method from
metagenomeSeqpackage.- 'KW'
KW: Kruskal-Wallis Rank Sum Test for all groups (>= 2)
- 'KW_dunn'
Dunn's Kruskal-Wallis Multiple Comparisons when group number > 2; see dunnTest function in
FSApackage- 'wilcox'
Wilcoxon Rank Sum and Signed Rank Tests for all paired groups
- 't.test'
Student's t-Test for all paired groups
- 'anova'
ANOVA for one-way or multi-factor analysis; see
cal_difffunction oftrans_alphaclass- 'scheirerRayHare'
Scheirer Ray Hare test for nonparametric test used for a two-way factorial experiment; see
scheirerRayHarefunction ofrcompanionpackage- 'lm'
Linear Model based on the
lmfunction- 'ALDEx2_t'
runs Welch's t and Wilcoxon tests with
ALDEx2package; see also the test parameter inALDEx2::aldexfunction; ALDEx2 uses the centred log-ratio (clr) transformation and estimates per-feature technical variation within each sample using Monte-Carlo instances drawn from the Dirichlet distribution; Reference: <doi:10.1371/journal.pone.0067019> and <doi:10.1186/2049-2618-2-15>; requireALDEx2package to be installed (https://bioconductor.org/packages/release/bioc/html/ALDEx2.html)- 'ALDEx2_kw'
runs Kruskal-Wallace and generalized linear model (glm) test with
ALDEx2package; see also thetestparameter inALDEx2::aldexfunction.- 'DESeq2'
Differential expression analysis based on the Negative Binomial (a.k.a. Gamma-Poisson) distribution based on the
DESeq2package.- 'edgeR'
The
exactTestmethod ofedgeRpackage is implemented.- 'ancombc2'
Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC) based on the
ancombc2function fromANCOMBCpackage. If thefix_formulaparameter is not provided, the function can automatically assign it by using group parameter. For this method, thegroupparameter is directly passed to the group parameter ofancombc2function. Reference: <doi:10.1038/s41467-020-17041-7><10.1038/s41592-023-02092-7>; RequireANCOMBCpackage to be installed (https://bioconductor.org/packages/release/bioc/html/ANCOMBC.html)- 'linda'
Linear Model for Differential Abundance Analysis of High-dimensional Compositional Data based on the
lindafunction ofMicrobiomeStatpackage. For linda method, please provide either the group parameter or the formula parameter. When the formula parameter is provided, it should start with '~' as it is directly used by the linda function. If the group parameter is used, the prefix '~' is not necessary as the function can automatically add it. The parameterfeature.dat.type = 'count'has been fixed. Other parameters can be passed to thelindafunction. Reference: <doi:10.1186/s13059-022-02655-5>- 'maaslin2'
finding associations between metadata and potentially high-dimensional microbial multi-omics data based on the Maaslin2 package. Using this option can invoke the
trans_env$cal_corfunction withcor_method = "maaslin2".- 'betareg'
Beta Regression based on the
betaregpackage. Please see thebeta_pseudoparameter for the use of pseudo value when there is 0 or 1 in the data- 'lme'
Linear Mixed Effect Model based on the
lmerTestpackage. In the return table, the significance of fixed factors are tested by functionanova. The significance of 'Estimate' in each term of fixed factors comes from the model.- 'glmm'
Generalized linear mixed model (GLMM) based on the
glmmTMBpackage. Theformulaandfamilyparameters are needed. Please refer to glmmTMB package to select the family function, e.g.family = glmmTMB::lognormal(link = "log"). The usage of formula is similar with that in 'lme' method. For more available parameters, please seeglmmTMB::glmmTMBfunction and use parameter passing. In the return table, Conditional_R2 and Marginal_R2 represent total variance (explained by both fixed and random effects) and the variance explained by fixed effects, respectively. The significance of fixed factors are tested by Chi-square test from functioncar::Anova. The significance of 'Estimate' in each term of fixed factors comes from the model.- 'glmm_beta'
Generalized linear mixed model with a family function of beta distribution, developed for the relative abundance (ranging from 0 to 1) of taxa specifically. This is an extension of the GLMM model in
'glmm'option. The only difference is inglmm_betathe family function is fixed with the beta distribution function, i.e.family = glmmTMB::beta_family(link = "logit"). Please see thebeta_pseudoparameter for the use of pseudo value when there is 0 or 1 in the data
groupdefault NULL; sample group used for the comparision; a colname of input
microtable$sample_table; It is necessary when method is not "anova" or method is "anova" but formula is not provided. Once group is provided, the return res_abund will have mean and sd values for group.taxa_leveldefault "all"; 'all' represents using abundance data at all taxonomic ranks; For testing at a specific rank, provide taxonomic rank name, such as "Genus". If the provided taxonomic name is neither 'all' nor a colname in tax_table of input dataset, the function will use the features in input
microtable$otu_tableautomatically.filter_thresdefault 0; the abundance threshold, such as 0.0005 when the input is relative abundance; only available when method != "metastat". The features with abundances lower than filter_thres will be filtered.
alphadefault 0.05; significance threshold to select taxa when method is "lefse" or "rf"; or used to generate significance letters when method is 'anova' or 'KW_dunn' like the alpha parameter in
cal_diffoftrans_alphaclass.p_adjust_methoddefault "fdr"; p.adjust method; see method parameter of
p.adjustfunction for other available options; "none" means disable p value adjustment; So whenp_adjust_method = "none", P.adj is same with P.unadj.transformationdefault NULL; feature abundance transformation method in the class
trans_norm, such as 'AST' for the arc sine square root transformation. Only available whenmethodis one of "KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare", "betareg" and "lme".remove_unknowndefault TRUE; whether remove unknown features that donot have clear classification information.
lefse_subgroupdefault NULL; sample sub group used for sub-comparision in lefse; Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>.
lefse_min_subsamdefault 10; sample numbers required in the subgroup test.
lefse_normdefault 1000000; scale value in lefse.
nresamdefault 0.6667; sample number ratio used in each bootstrap for method = "lefse" or "rf".
bootsdefault 30; bootstrap test number for method = "lefse" or "rf".
rf_ntreedefault 1000; see ntree in randomForest function of randomForest package when method = "rf".
group_choose_paireddefault NULL; a vector used for selecting the required groups for paired testing, only used for method = "metastat" or "metagenomeSeq".
metagenomeSeq_countdefault 1; Filter features to have at least 'counts' counts.; see the count parameter in MRcoefs function of
metagenomeSeqpackage.ALDEx2_sigdefault c("wi.eBH", "kw.eBH"); which column of the final result is used as the significance asterisk assignment; applied to method = "ALDEx2_t" or "ALDEx2_kw"; the first element is provided to "ALDEx2_t"; the second is provided to "ALDEx2_kw"; for "ALDEx2_t", the available choice is "wi.eBH" (Expected Benjamini-Hochberg corrected P value of Wilcoxon test) and "we.eBH" (Expected BH corrected P value of Welch's t test); for "ALDEx2_kw"; for "ALDEx2_t", the available choice is "kw.eBH" (Expected BH corrected P value of Kruskal-Wallace test) and "glm.eBH" (Expected BH corrected P value of glm test).
by_groupdefault NULL; a column of sample_table used to perform the differential test among groups (
groupparameter) for each group (by_groupparameter). Soby_grouphas a higher level thangroupparameter. Same with theby_groupparameter intrans_alphaclass. Only available when method is one ofc("KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare").by_IDdefault NULL; a column of sample_table used to perform paired t test or paired wilcox test for the paired data, such as the data of plant compartments for different plant species (ID). So
by_IDin sample_table should be the smallest unit of sample collection without any repetition in it. Same with theby_IDparameter in trans_alpha class.beta_pseudodefault .Machine$double.eps; the pseudo value used when the parameter
methodis'betareg'or'glmm_beta'. As the beta distribution function limits 0 < response value < 1, a pseudo value will be added for the data that equal to 0. The data that equal to 1 will be replaced by1/(1 + beta_pseudo)....parameters passed to
cal_difffunction oftrans_alphaclass when method is one of "KW", "KW_dunn", "wilcox", "t.test", "anova", "betareg", "lme", "glmm" or "glmm_beta"; passed toANCOMBC::ancombc2function when method is "ancombc2" (except tax_level, global and fix_formula parameters); passed toALDEx2::aldexfunction when method = "ALDEx2_t" or "ALDEx2_kw"; passed toDESeq2::DESeqfunction when method = "DESeq2"; passed toMicrobiomeStat::lindafunction when method = "linda"; passed totrans_env$cal_corfunction when method = "maaslin2".
Returns
res_diff and res_abund.
res_abund includes mean abundance of each taxa (Mean), standard deviation (SD), standard error (SE) and sample number (N) in the group (Group).
res_diff is the detailed differential test result depending on the method choice, may containing:
"Comparison": The groups for the comparision, maybe all groups or paired groups. If this column is not found, all groups are used;
"Group": Which group has the maximum median or mean value across the test groups;
For non-parametric methods, median value; For t.test, mean value;
"Taxa": which taxa is used in this comparision;
"Method": Test method used in the analysis depending on the method input;
"LDA" or "MeanDecreaseGini": LDA: linear discriminant score in LEfSe; MeanDecreaseGini: mean decreasing gini index in random forest;
"P.unadj": original p value;
"P.adj": adjusted p value;
"Estimate" and "Std.Error": When method is "betareg", "lm", "lme" or "glmm",
"Estimate" and "Std.Error" represent fitted coefficient and its standard error, respectively;
Others: qvalue: qvalue in metastat analysis.
Examples
\donttest{
data(dataset)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus")
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
}
Method plot_diff_abund()
Plot the abundance of differential taxa
Usage
trans_diff$plot_diff_abund( use_number = 1:20, color_values = RColorBrewer::brewer.pal(8, "Dark2"), select_group = NULL, select_taxa = NULL, simplify_names = TRUE, keep_prefix = TRUE, group_order = NULL, barwidth = 0.9, use_se = TRUE, add_sig = FALSE, add_sig_label = "Significance", add_sig_label_color = "black", add_sig_tip_length = 0.01, y_start = 1.01, y_increase = 0.05, text_y_size = 10, coord_flip = TRUE, xtext_angle = 45, ... )
Arguments
use_numberdefault 1:20; numeric vector; the taxa numbers (1:n) selected in the plot; If the n is larger than the number of total significant taxa, automatically use all the taxa.
color_valuesdefault
RColorBrewer::brewer.pal(8, "Dark2"); colors palette.select_groupdefault NULL; this is used to select the paired groups. This parameter is especially useful when the comparision methods is applied to paired groups; The input select_group must be one of
object$res_diff$Comparison.select_taxadefault NULL; character vector to provide taxa names. The taxa names should be same with the names shown in the plot, not the 'Taxa' column names in
object$res_diff$Taxa.simplify_namesdefault TRUE; whether use the simplified taxonomic name.
keep_prefixdefault TRUE; whether retain the taxonomic prefix.
group_orderdefault NULL; a vector to order groups, i.e. reorder the legend and colors in plot; If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it. If provided, overlook the levels in the group of sample_table.
barwidthdefault 0.9; the bar width in plot.
use_sedefault TRUE; whether use SE in plot, if FALSE, use SD.
add_sigdefault FALSE; whether add the significance label to the plot.
add_sig_labeldefault "Significance"; select a colname of object$res_diff for the label text, such as 'P.adj' or 'Significance'.
add_sig_label_colordefault "black"; the color for the label text when add_sig = TRUE.
add_sig_tip_lengthdefault 0.01; the tip length for the added line when add_sig = TRUE.
y_startdefault 1.01; the y axis position from which to add the label; the default 1.01 means 1.01 * Value; For method != "anova", all the start positions are same, i.e. Value = max(Mean+SD or Mean+SE); For method = "anova"; the stat position is calculated for each point, i.e. Value = Mean+SD or Mean+SE.
y_increasedefault 0.05; the increasing y axia space to add label for paired groups; the default 0.05 means 0.05 * y_start * Value; In addition, this parameter is also used to label the letters of anova result with the fixed (1 + y_increase) * y_start * Value.
text_y_sizedefault 10; the size for the y axis text, i.e. feature text.
coord_flipdefault TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.
xtext_angledefault 45; number ranging from 0 to 90; used to make x axis text generate angle to reduce text overlap; only available when coord_flip = FALSE.
...parameters passed to
ggsignif::stat_signifwhen add_sig = TRUE.
Returns
ggplot.
Examples
\donttest{
t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus")
t1$plot_diff_abund(use_number = 1:10)
t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
}
Method plot_diff_bar()
Bar plot for indicator index, such as LDA score and P value.
Usage
trans_diff$plot_diff_bar( color_values = RColorBrewer::brewer.pal(8, "Dark2"), color_group_map = FALSE, use_number = 1:10, threshold = NULL, select_group = NULL, keep_full_name = FALSE, keep_prefix = TRUE, group_order = NULL, group_aggre = TRUE, group_two_sep = TRUE, coord_flip = TRUE, add_sig = FALSE, add_sig_increase = 0.1, add_sig_text_size = 5, xtext_angle = 45, xtext_size = 10, axis_text_y = 12, heatmap_cell = "P.unadj", heatmap_sig = "Significance", heatmap_x = "Factors", heatmap_y = "Taxa", heatmap_lab_fill = "P value", ... )
Arguments
color_valuesdefault
RColorBrewer::brewer.pal(8, "Dark2"); colors palette for different groups.color_group_mapdefault FALSE; whether match the colors to groups in order to fix the color in each group when part of groups are not shown in the plot. When
color_group_map = TRUE, the group_order inside the object will be used as full groups set to guide the color extraction.use_numberdefault 1:10; numeric vector; the taxa numbers used in the plot, i.e. 1:n.
thresholddefault NULL; threshold value of indicators for selecting taxa, such as 3 for LDA score of LEfSe.
select_groupdefault NULL; this is used to select the paired group when multiple comparisions are generated; The input select_group must be one of
object$res_diff$Comparison.keep_full_namedefault FALSE; whether keep the taxonomic full lineage names.
keep_prefixdefault TRUE; whether retain the taxonomic prefix, such as "g__".
group_orderdefault NULL; a vector to order the legend and colors in plot; If NULL, the function can first determine whether the group column of
microtable$sample_tableis factor. If yes, use the levels in it. If provided, this parameter can overwrite the levels in the group ofmicrotable$sample_table.group_aggredefault TRUE; whether aggregate the features for each group.
group_two_sepdefault TRUE; whether display the features of two groups on opposite sides of the coordinate axes when there are only two groups in total.
coord_flipdefault TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.
add_sigdefault FALSE; whether add significance label (asterisk) above the bar.
add_sig_increasedefault 0.1; the axis position (
Value + add_sig_increase * max(Value)) from which to add the significance label; only available whenadd_sig = TRUE.add_sig_text_sizedefault 5; the size of added significance label; only available when
add_sig = TRUE.xtext_angledefault 45; number ranging from 0 to 90; used to make x axis text generate angle to reduce text overlap; only available when coord_flip = FALSE.
xtext_sizedefault 10; the text size of x axis.
axis_text_ydefault 12; the size for the y axis text.
heatmap_celldefault "P.unadj"; the column of data for the cell of heatmap when formula with multiple factors is found in the method.
heatmap_sigdefault "Significance"; the column of data for the significance label of heatmap.
heatmap_xdefault "Factors"; the column of data for the x axis of heatmap.
heatmap_ydefault "Taxa"; the column of data for the y axis of heatmap.
heatmap_lab_filldefault "P value"; legend title of heatmap.
...parameters passing to
geom_barfor the bar plot orplot_corfunction intrans_envclass for the heatmap of multiple factors when formula is found in the method.
Returns
ggplot.
Examples
\donttest{
t1$plot_diff_bar(use_number = 1:20)
}
Method plot_diff_cladogram()
Plot the cladogram using taxa with significant difference.
Usage
trans_diff$plot_diff_cladogram( color = RColorBrewer::brewer.pal(8, "Dark2"), group_order = NULL, use_taxa_num = 200, filter_taxa = NULL, use_feature_num = NULL, clade_label_level = 4, select_show_labels = NULL, only_select_show = FALSE, sep = "|", branch_size = 0.2, alpha = 0.2, clade_label_size = 2, clade_label_size_add = 5, clade_label_size_log = exp(1), node_size_scale = 1, node_size_offset = 1, annotation_shape = 22, annotation_shape_size = 5 )
Arguments
colordefault
RColorBrewer::brewer.pal(8, "Dark2"); color palette used in the plot.group_orderdefault NULL; a vector to order the legend in plot; If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it. If provided, this parameter can overwrite the levels in the group of sample_table. If the number of provided group_order is less than the number of groups in
res_diff$Group, the function will select the groups of group_order automatically.use_taxa_numdefault 200; integer; The taxa number used in the background tree plot; select the taxa according to the mean abundance .
filter_taxadefault NULL; The mean relative abundance used to filter the taxa with low abundance.
use_feature_numdefault NULL; integer; The feature number used in the plot; select the features according to the LDA score (method = "lefse") or MeanDecreaseGini (method = "rf") from high to low.
clade_label_leveldefault 4; the taxonomic level for marking the label with letters, root is the largest.
select_show_labelsdefault NULL; character vector; The features to show in the plot with full label names, not the letters.
only_select_showdefault FALSE; whether only use the the select features in the parameter
select_show_labels.sepdefault "|"; the seperate character in the taxonomic information.
branch_sizedefault 0.2; numberic, size of branch.
alphadefault 0.2; shading of the color.
clade_label_sizedefault 2; basic size for the clade label; please also see
clade_label_size_addandclade_label_size_log.clade_label_size_adddefault 5; added basic size for the clade label; see the formula in
clade_label_size_logparameter.clade_label_size_logdefault
exp(1); the base oflogfunction for added size of the clade label; the size formula:clade_label_size + log(clade_label_level + clade_label_size_add, base = clade_label_size_log); so useclade_label_size_log,clade_label_size_addandclade_label_sizecan totally control the label size for different taxonomic levels.node_size_scaledefault 1; scale for the node size.
node_size_offsetdefault 1; offset for the node size.
annotation_shapedefault 22; shape used in the annotation legend.
annotation_shape_sizedefault 5; size used in the annotation legend.
Returns
ggplot.
Examples
\donttest{
t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL)
}
Method print()
Print the trans_alpha object.
Usage
trans_diff$print()
Method clone()
The objects of this class are cloneable with this method.
Usage
trans_diff$clone(deep = FALSE)
Arguments
deepWhether to make a deep clone.
Examples
## ------------------------------------------------
## Method `trans_diff$new`
## ------------------------------------------------
data(dataset)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus")
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
## ------------------------------------------------
## Method `trans_diff$plot_diff_abund`
## ------------------------------------------------
t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus")
t1$plot_diff_abund(use_number = 1:10)
t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
## ------------------------------------------------
## Method `trans_diff$plot_diff_bar`
## ------------------------------------------------
t1$plot_diff_bar(use_number = 1:20)
## ------------------------------------------------
## Method `trans_diff$plot_diff_cladogram`
## ------------------------------------------------
t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL)