trans_diff {microeco} | R Documentation |
Create trans_diff
object for the differential analysis on the taxonomic abundance
Description
This class is a wrapper for a series of differential abundance test and indicator analysis methods, including
LEfSe based on the Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>,
random forest <doi:10.1016/j.geoderma.2018.09.035>, metastat based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>,
non-parametric Kruskal-Wallis Rank Sum Test,
Dunn's Kruskal-Wallis Multiple Comparisons based on the FSA
package, Wilcoxon Rank Sum and Signed Rank Tests, t-test, anova,
Scheirer Ray Hare test,
R package metagenomeSeq
Paulson et al. (2013) <doi:10.1038/nmeth.2658>,
R package ANCOMBC
<doi:10.1038/s41467-020-17041-7>, R package ALDEx2
<doi:10.1371/journal.pone.0067019; 10.1186/2049-2618-2-15>,
R package MicrobiomeStat
<doi:10.1186/s13059-022-02655-5>, beta regression <doi:10.18637/jss.v034.i02>, R package maaslin2
,
linear mixed-effects model and generalized linear mixed model.
Methods
Public methods
Method new()
Usage
trans_diff$new( dataset = NULL, method = c("lefse", "rf", "metastat", "metagenomeSeq", "KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare", "lm", "ancombc2", "ALDEx2_t", "ALDEx2_kw", "DESeq2", "edgeR", "linda", "maaslin2", "betareg", "lme", "glmm", "glmm_beta")[1], group = NULL, taxa_level = "all", filter_thres = 0, alpha = 0.05, p_adjust_method = "fdr", transformation = NULL, remove_unknown = TRUE, lefse_subgroup = NULL, lefse_min_subsam = 10, lefse_norm = 1e+06, nresam = 0.6667, boots = 30, rf_ntree = 1000, group_choose_paired = NULL, metagenomeSeq_count = 1, ALDEx2_sig = c("wi.eBH", "kw.eBH"), by_group = NULL, by_ID = NULL, beta_pseudo = .Machine$double.eps, ... )
Arguments
dataset
default NULL;
microtable
object.method
default "lefse"; see the following available options:
- 'lefse'
LEfSe method based on Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>
- 'rf'
random forest and non-parametric test method based on An et al. (2019) <doi:10.1016/j.geoderma.2018.09.035>
- 'metastat'
Metastat method for all paired groups based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>
- 'metagenomeSeq'
zero-inflated log-normal model-based differential test method from
metagenomeSeq
package.- 'KW'
KW: Kruskal-Wallis Rank Sum Test for all groups (>= 2)
- 'KW_dunn'
Dunn's Kruskal-Wallis Multiple Comparisons when group number > 2; see dunnTest function in
FSA
package- 'wilcox'
Wilcoxon Rank Sum and Signed Rank Tests for all paired groups
- 't.test'
Student's t-Test for all paired groups
- 'anova'
ANOVA for one-way or multi-factor analysis; see
cal_diff
function oftrans_alpha
class- 'scheirerRayHare'
Scheirer Ray Hare test for nonparametric test used for a two-way factorial experiment; see
scheirerRayHare
function ofrcompanion
package- 'lm'
Linear Model based on the
lm
function- 'ALDEx2_t'
runs Welch's t and Wilcoxon tests with
ALDEx2
package; see also the test parameter inALDEx2::aldex
function; ALDEx2 uses the centred log-ratio (clr) transformation and estimates per-feature technical variation within each sample using Monte-Carlo instances drawn from the Dirichlet distribution; Reference: <doi:10.1371/journal.pone.0067019> and <doi:10.1186/2049-2618-2-15>; requireALDEx2
package to be installed (https://bioconductor.org/packages/release/bioc/html/ALDEx2.html)- 'ALDEx2_kw'
runs Kruskal-Wallace and generalized linear model (glm) test with
ALDEx2
package; see also thetest
parameter inALDEx2::aldex
function.- 'DESeq2'
Differential expression analysis based on the Negative Binomial (a.k.a. Gamma-Poisson) distribution based on the
DESeq2
package.- 'edgeR'
The
exactTest
method ofedgeR
package is implemented.- 'ancombc2'
Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC) based on the
ancombc2
function fromANCOMBC
package. If thefix_formula
parameter is not provided, the function can automatically assign it by using group parameter. For this method, thegroup
parameter is directly passed to the group parameter ofancombc2
function. Reference: <doi:10.1038/s41467-020-17041-7><10.1038/s41592-023-02092-7>; RequireANCOMBC
package to be installed (https://bioconductor.org/packages/release/bioc/html/ANCOMBC.html)- 'linda'
Linear Model for Differential Abundance Analysis of High-dimensional Compositional Data based on the
linda
function ofMicrobiomeStat
package. For linda method, please provide either the group parameter or the formula parameter. When the formula parameter is provided, it should start with '~' as it is directly used by the linda function. If the group parameter is used, the prefix '~' is not necessary as the function can automatically add it. The parameterfeature.dat.type = 'count'
has been fixed. Other parameters can be passed to thelinda
function. Reference: <doi:10.1186/s13059-022-02655-5>- 'maaslin2'
finding associations between metadata and potentially high-dimensional microbial multi-omics data based on the Maaslin2 package. Using this option can invoke the
trans_env$cal_cor
function withcor_method = "maaslin2"
.- 'betareg'
Beta Regression based on the
betareg
package. Please see thebeta_pseudo
parameter for the use of pseudo value when there is 0 or 1 in the data- 'lme'
Linear Mixed Effect Model based on the
lmerTest
package. In the return table, the significance of fixed factors are tested by functionanova
. The significance of 'Estimate' in each term of fixed factors comes from the model.- 'glmm'
Generalized linear mixed model (GLMM) based on the
glmmTMB
package. Theformula
andfamily
parameters are needed. Please refer to glmmTMB package to select the family function, e.g.family = glmmTMB::lognormal(link = "log")
. The usage of formula is similar with that in 'lme' method. For more available parameters, please seeglmmTMB::glmmTMB
function and use parameter passing. In the return table, Conditional_R2 and Marginal_R2 represent total variance (explained by both fixed and random effects) and the variance explained by fixed effects, respectively. The significance of fixed factors are tested by Chi-square test from functioncar::Anova
. The significance of 'Estimate' in each term of fixed factors comes from the model.- 'glmm_beta'
Generalized linear mixed model with a family function of beta distribution, developed for the relative abundance (ranging from 0 to 1) of taxa specifically. This is an extension of the GLMM model in
'glmm'
option. The only difference is inglmm_beta
the family function is fixed with the beta distribution function, i.e.family = glmmTMB::beta_family(link = "logit")
. Please see thebeta_pseudo
parameter for the use of pseudo value when there is 0 or 1 in the data
group
default NULL; sample group used for the comparision; a colname of input
microtable$sample_table
; It is necessary when method is not "anova" or method is "anova" but formula is not provided. Once group is provided, the return res_abund will have mean and sd values for group.taxa_level
default "all"; 'all' represents using abundance data at all taxonomic ranks; For testing at a specific rank, provide taxonomic rank name, such as "Genus". If the provided taxonomic name is neither 'all' nor a colname in tax_table of input dataset, the function will use the features in input
microtable$otu_table
automatically.filter_thres
default 0; the abundance threshold, such as 0.0005 when the input is relative abundance; only available when method != "metastat". The features with abundances lower than filter_thres will be filtered.
alpha
default 0.05; significance threshold to select taxa when method is "lefse" or "rf"; or used to generate significance letters when method is 'anova' or 'KW_dunn' like the alpha parameter in
cal_diff
oftrans_alpha
class.p_adjust_method
default "fdr"; p.adjust method; see method parameter of
p.adjust
function for other available options; "none" means disable p value adjustment; So whenp_adjust_method = "none"
, P.adj is same with P.unadj.transformation
default NULL; feature abundance transformation method in the class
trans_norm
, such as 'AST' for the arc sine square root transformation. Only available whenmethod
is one of "KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare", "betareg" and "lme".remove_unknown
default TRUE; whether remove unknown features that donot have clear classification information.
lefse_subgroup
default NULL; sample sub group used for sub-comparision in lefse; Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>.
lefse_min_subsam
default 10; sample numbers required in the subgroup test.
lefse_norm
default 1000000; scale value in lefse.
nresam
default 0.6667; sample number ratio used in each bootstrap for method = "lefse" or "rf".
boots
default 30; bootstrap test number for method = "lefse" or "rf".
rf_ntree
default 1000; see ntree in randomForest function of randomForest package when method = "rf".
group_choose_paired
default NULL; a vector used for selecting the required groups for paired testing, only used for method = "metastat" or "metagenomeSeq".
metagenomeSeq_count
default 1; Filter features to have at least 'counts' counts.; see the count parameter in MRcoefs function of
metagenomeSeq
package.ALDEx2_sig
default c("wi.eBH", "kw.eBH"); which column of the final result is used as the significance asterisk assignment; applied to method = "ALDEx2_t" or "ALDEx2_kw"; the first element is provided to "ALDEx2_t"; the second is provided to "ALDEx2_kw"; for "ALDEx2_t", the available choice is "wi.eBH" (Expected Benjamini-Hochberg corrected P value of Wilcoxon test) and "we.eBH" (Expected BH corrected P value of Welch's t test); for "ALDEx2_kw"; for "ALDEx2_t", the available choice is "kw.eBH" (Expected BH corrected P value of Kruskal-Wallace test) and "glm.eBH" (Expected BH corrected P value of glm test).
by_group
default NULL; a column of sample_table used to perform the differential test among groups (
group
parameter) for each group (by_group
parameter). Soby_group
has a higher level thangroup
parameter. Same with theby_group
parameter intrans_alpha
class. Only available when method is one ofc("KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare")
.by_ID
default NULL; a column of sample_table used to perform paired t test or paired wilcox test for the paired data, such as the data of plant compartments for different plant species (ID). So
by_ID
in sample_table should be the smallest unit of sample collection without any repetition in it. Same with theby_ID
parameter in trans_alpha class.beta_pseudo
default .Machine$double.eps; the pseudo value used when the parameter
method
is'betareg'
or'glmm_beta'
. As the beta distribution function limits 0 < response value < 1, a pseudo value will be added for the data that equal to 0. The data that equal to 1 will be replaced by1/(1 + beta_pseudo)
....
parameters passed to
cal_diff
function oftrans_alpha
class when method is one of "KW", "KW_dunn", "wilcox", "t.test", "anova", "betareg", "lme", "glmm" or "glmm_beta"; passed toANCOMBC::ancombc2
function when method is "ancombc2" (except tax_level, global and fix_formula parameters); passed toALDEx2::aldex
function when method = "ALDEx2_t" or "ALDEx2_kw"; passed toDESeq2::DESeq
function when method = "DESeq2"; passed toMicrobiomeStat::linda
function when method = "linda"; passed totrans_env$cal_cor
function when method = "maaslin2".
Returns
res_diff and res_abund.
res_abund includes mean abundance of each taxa (Mean), standard deviation (SD), standard error (SE) and sample number (N) in the group (Group).
res_diff is the detailed differential test result depending on the method choice, may containing:
"Comparison": The groups for the comparision, maybe all groups or paired groups. If this column is not found, all groups are used;
"Group": Which group has the maximum median or mean value across the test groups;
For non-parametric methods, median value; For t.test, mean value;
"Taxa": which taxa is used in this comparision;
"Method": Test method used in the analysis depending on the method input;
"LDA" or "MeanDecreaseGini": LDA: linear discriminant score in LEfSe; MeanDecreaseGini: mean decreasing gini index in random forest;
"P.unadj": original p value;
"P.adj": adjusted p value;
"Estimate" and "Std.Error": When method is "betareg", "lm", "lme" or "glmm",
"Estimate" and "Std.Error" represent fitted coefficient and its standard error, respectively;
Others: qvalue: qvalue in metastat analysis.
Examples
\donttest{ data(dataset) t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group") t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group") t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus") t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group") }
Method plot_diff_abund()
Plot the abundance of differential taxa
Usage
trans_diff$plot_diff_abund( use_number = 1:20, color_values = RColorBrewer::brewer.pal(8, "Dark2"), select_group = NULL, select_taxa = NULL, simplify_names = TRUE, keep_prefix = TRUE, group_order = NULL, barwidth = 0.9, use_se = TRUE, add_sig = FALSE, add_sig_label = "Significance", add_sig_label_color = "black", add_sig_tip_length = 0.01, y_start = 1.01, y_increase = 0.05, text_y_size = 10, coord_flip = TRUE, xtext_angle = 45, ... )
Arguments
use_number
default 1:20; numeric vector; the taxa numbers (1:n) selected in the plot; If the n is larger than the number of total significant taxa, automatically use all the taxa.
color_values
default
RColorBrewer::brewer.pal
(8, "Dark2"); colors palette.select_group
default NULL; this is used to select the paired groups. This parameter is especially useful when the comparision methods is applied to paired groups; The input select_group must be one of
object$res_diff$Comparison
.select_taxa
default NULL; character vector to provide taxa names. The taxa names should be same with the names shown in the plot, not the 'Taxa' column names in
object$res_diff$Taxa
.simplify_names
default TRUE; whether use the simplified taxonomic name.
keep_prefix
default TRUE; whether retain the taxonomic prefix.
group_order
default NULL; a vector to order groups, i.e. reorder the legend and colors in plot; If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it. If provided, overlook the levels in the group of sample_table.
barwidth
default 0.9; the bar width in plot.
use_se
default TRUE; whether use SE in plot, if FALSE, use SD.
add_sig
default FALSE; whether add the significance label to the plot.
add_sig_label
default "Significance"; select a colname of object$res_diff for the label text, such as 'P.adj' or 'Significance'.
add_sig_label_color
default "black"; the color for the label text when add_sig = TRUE.
add_sig_tip_length
default 0.01; the tip length for the added line when add_sig = TRUE.
y_start
default 1.01; the y axis position from which to add the label; the default 1.01 means 1.01 * Value; For method != "anova", all the start positions are same, i.e. Value = max(Mean+SD or Mean+SE); For method = "anova"; the stat position is calculated for each point, i.e. Value = Mean+SD or Mean+SE.
y_increase
default 0.05; the increasing y axia space to add label for paired groups; the default 0.05 means 0.05 * y_start * Value; In addition, this parameter is also used to label the letters of anova result with the fixed (1 + y_increase) * y_start * Value.
text_y_size
default 10; the size for the y axis text, i.e. feature text.
coord_flip
default TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.
xtext_angle
default 45; number ranging from 0 to 90; used to make x axis text generate angle to reduce text overlap; only available when coord_flip = FALSE.
...
parameters passed to
ggsignif::stat_signif
when add_sig = TRUE.
Returns
ggplot.
Examples
\donttest{ t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus") t1$plot_diff_abund(use_number = 1:10) t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE) t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group") t1$plot_diff_abund(use_number = 1:20) t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE) t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group") t1$plot_diff_abund(use_number = 1:20) t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE) }
Method plot_diff_bar()
Bar plot for indicator index, such as LDA score and P value.
Usage
trans_diff$plot_diff_bar( color_values = RColorBrewer::brewer.pal(8, "Dark2"), color_group_map = FALSE, use_number = 1:10, threshold = NULL, select_group = NULL, keep_full_name = FALSE, keep_prefix = TRUE, group_order = NULL, group_aggre = TRUE, group_two_sep = TRUE, coord_flip = TRUE, add_sig = FALSE, add_sig_increase = 0.1, add_sig_text_size = 5, xtext_angle = 45, xtext_size = 10, axis_text_y = 12, heatmap_cell = "P.unadj", heatmap_sig = "Significance", heatmap_x = "Factors", heatmap_y = "Taxa", heatmap_lab_fill = "P value", ... )
Arguments
color_values
default
RColorBrewer::brewer.pal
(8, "Dark2"); colors palette for different groups.color_group_map
default FALSE; whether match the colors to groups in order to fix the color in each group when part of groups are not shown in the plot. When
color_group_map = TRUE
, the group_order inside the object will be used as full groups set to guide the color extraction.use_number
default 1:10; numeric vector; the taxa numbers used in the plot, i.e. 1:n.
threshold
default NULL; threshold value of indicators for selecting taxa, such as 3 for LDA score of LEfSe.
select_group
default NULL; this is used to select the paired group when multiple comparisions are generated; The input select_group must be one of
object$res_diff$Comparison
.keep_full_name
default FALSE; whether keep the taxonomic full lineage names.
keep_prefix
default TRUE; whether retain the taxonomic prefix, such as "g__".
group_order
default NULL; a vector to order the legend and colors in plot; If NULL, the function can first determine whether the group column of
microtable$sample_table
is factor. If yes, use the levels in it. If provided, this parameter can overwrite the levels in the group ofmicrotable$sample_table
.group_aggre
default TRUE; whether aggregate the features for each group.
group_two_sep
default TRUE; whether display the features of two groups on opposite sides of the coordinate axes when there are only two groups in total.
coord_flip
default TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.
add_sig
default FALSE; whether add significance label (asterisk) above the bar.
add_sig_increase
default 0.1; the axis position (
Value + add_sig_increase * max(Value)
) from which to add the significance label; only available whenadd_sig = TRUE
.add_sig_text_size
default 5; the size of added significance label; only available when
add_sig = TRUE
.xtext_angle
default 45; number ranging from 0 to 90; used to make x axis text generate angle to reduce text overlap; only available when coord_flip = FALSE.
xtext_size
default 10; the text size of x axis.
axis_text_y
default 12; the size for the y axis text.
heatmap_cell
default "P.unadj"; the column of data for the cell of heatmap when formula with multiple factors is found in the method.
heatmap_sig
default "Significance"; the column of data for the significance label of heatmap.
heatmap_x
default "Factors"; the column of data for the x axis of heatmap.
heatmap_y
default "Taxa"; the column of data for the y axis of heatmap.
heatmap_lab_fill
default "P value"; legend title of heatmap.
...
parameters passing to
geom_bar
for the bar plot orplot_cor
function intrans_env
class for the heatmap of multiple factors when formula is found in the method.
Returns
ggplot.
Examples
\donttest{ t1$plot_diff_bar(use_number = 1:20) }
Method plot_diff_cladogram()
Plot the cladogram using taxa with significant difference.
Usage
trans_diff$plot_diff_cladogram( color = RColorBrewer::brewer.pal(8, "Dark2"), group_order = NULL, use_taxa_num = 200, filter_taxa = NULL, use_feature_num = NULL, clade_label_level = 4, select_show_labels = NULL, only_select_show = FALSE, sep = "|", branch_size = 0.2, alpha = 0.2, clade_label_size = 2, clade_label_size_add = 5, clade_label_size_log = exp(1), node_size_scale = 1, node_size_offset = 1, annotation_shape = 22, annotation_shape_size = 5 )
Arguments
color
default
RColorBrewer::brewer.pal
(8, "Dark2"); color palette used in the plot.group_order
default NULL; a vector to order the legend in plot; If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it. If provided, this parameter can overwrite the levels in the group of sample_table. If the number of provided group_order is less than the number of groups in
res_diff$Group
, the function will select the groups of group_order automatically.use_taxa_num
default 200; integer; The taxa number used in the background tree plot; select the taxa according to the mean abundance .
filter_taxa
default NULL; The mean relative abundance used to filter the taxa with low abundance.
use_feature_num
default NULL; integer; The feature number used in the plot; select the features according to the LDA score (method = "lefse") or MeanDecreaseGini (method = "rf") from high to low.
clade_label_level
default 4; the taxonomic level for marking the label with letters, root is the largest.
select_show_labels
default NULL; character vector; The features to show in the plot with full label names, not the letters.
only_select_show
default FALSE; whether only use the the select features in the parameter
select_show_labels
.sep
default "|"; the seperate character in the taxonomic information.
branch_size
default 0.2; numberic, size of branch.
alpha
default 0.2; shading of the color.
clade_label_size
default 2; basic size for the clade label; please also see
clade_label_size_add
andclade_label_size_log
.clade_label_size_add
default 5; added basic size for the clade label; see the formula in
clade_label_size_log
parameter.clade_label_size_log
default
exp(1)
; the base oflog
function for added size of the clade label; the size formula:clade_label_size + log(clade_label_level + clade_label_size_add, base = clade_label_size_log)
; so useclade_label_size_log
,clade_label_size_add
andclade_label_size
can totally control the label size for different taxonomic levels.node_size_scale
default 1; scale for the node size.
node_size_offset
default 1; offset for the node size.
annotation_shape
default 22; shape used in the annotation legend.
annotation_shape_size
default 5; size used in the annotation legend.
Returns
ggplot.
Examples
\donttest{ t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL) }
Method print()
Print the trans_alpha object.
Usage
trans_diff$print()
Method clone()
The objects of this class are cloneable with this method.
Usage
trans_diff$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Examples
## ------------------------------------------------
## Method `trans_diff$new`
## ------------------------------------------------
data(dataset)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus")
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
## ------------------------------------------------
## Method `trans_diff$plot_diff_abund`
## ------------------------------------------------
t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus")
t1$plot_diff_abund(use_number = 1:10)
t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
## ------------------------------------------------
## Method `trans_diff$plot_diff_bar`
## ------------------------------------------------
t1$plot_diff_bar(use_number = 1:20)
## ------------------------------------------------
## Method `trans_diff$plot_diff_cladogram`
## ------------------------------------------------
t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL)