trans_diff {microeco}R Documentation

Create trans_diff object for the differential analysis on the taxonomic abundance

Description

This class is a wrapper for a series of differential abundance test and indicator analysis methods, including LEfSe based on the Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>, random forest <doi:10.1016/j.geoderma.2018.09.035>, metastat based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>, non-parametric Kruskal-Wallis Rank Sum Test, Dunn's Kruskal-Wallis Multiple Comparisons based on the FSA package, Wilcoxon Rank Sum and Signed Rank Tests, t-test, anova, Scheirer Ray Hare test, R package metagenomeSeq Paulson et al. (2013) <doi:10.1038/nmeth.2658>, R package ANCOMBC <doi:10.1038/s41467-020-17041-7>, R package ALDEx2 <doi:10.1371/journal.pone.0067019; 10.1186/2049-2618-2-15>, R package MicrobiomeStat <doi:10.1186/s13059-022-02655-5>, beta regression <doi:10.18637/jss.v034.i02>, R package maaslin2, linear mixed-effects model and generalized linear mixed model.

Methods

Public methods


Method new()

Usage
trans_diff$new(
  dataset = NULL,
  method = c("lefse", "rf", "metastat", "metagenomeSeq", "KW", "KW_dunn", "wilcox",
    "t.test", "anova", "scheirerRayHare", "lm", "ancombc2", "ALDEx2_t", "ALDEx2_kw",
    "DESeq2", "edgeR", "linda", "maaslin2", "betareg", "lme", "glmm", "glmm_beta")[1],
  group = NULL,
  taxa_level = "all",
  filter_thres = 0,
  alpha = 0.05,
  p_adjust_method = "fdr",
  transformation = NULL,
  remove_unknown = TRUE,
  lefse_subgroup = NULL,
  lefse_min_subsam = 10,
  lefse_norm = 1e+06,
  nresam = 0.6667,
  boots = 30,
  rf_ntree = 1000,
  group_choose_paired = NULL,
  metagenomeSeq_count = 1,
  ALDEx2_sig = c("wi.eBH", "kw.eBH"),
  by_group = NULL,
  by_ID = NULL,
  beta_pseudo = .Machine$double.eps,
  ...
)
Arguments
dataset

default NULL; microtable object.

method

default "lefse"; see the following available options:

'lefse'

LEfSe method based on Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>

'rf'

random forest and non-parametric test method based on An et al. (2019) <doi:10.1016/j.geoderma.2018.09.035>

'metastat'

Metastat method for all paired groups based on White et al. (2009) <doi:10.1371/journal.pcbi.1000352>

'metagenomeSeq'

zero-inflated log-normal model-based differential test method from metagenomeSeq package.

'KW'

KW: Kruskal-Wallis Rank Sum Test for all groups (>= 2)

'KW_dunn'

Dunn's Kruskal-Wallis Multiple Comparisons when group number > 2; see dunnTest function in FSA package

'wilcox'

Wilcoxon Rank Sum and Signed Rank Tests for all paired groups

't.test'

Student's t-Test for all paired groups

'anova'

ANOVA for one-way or multi-factor analysis; see cal_diff function of trans_alpha class

'scheirerRayHare'

Scheirer Ray Hare test for nonparametric test used for a two-way factorial experiment; see scheirerRayHare function of rcompanion package

'lm'

Linear Model based on the lm function

'ALDEx2_t'

runs Welch's t and Wilcoxon tests with ALDEx2 package; see also the test parameter in ALDEx2::aldex function; ALDEx2 uses the centred log-ratio (clr) transformation and estimates per-feature technical variation within each sample using Monte-Carlo instances drawn from the Dirichlet distribution; Reference: <doi:10.1371/journal.pone.0067019> and <doi:10.1186/2049-2618-2-15>; require ALDEx2 package to be installed (https://bioconductor.org/packages/release/bioc/html/ALDEx2.html)

'ALDEx2_kw'

runs Kruskal-Wallace and generalized linear model (glm) test with ALDEx2 package; see also the test parameter in ALDEx2::aldex function.

'DESeq2'

Differential expression analysis based on the Negative Binomial (a.k.a. Gamma-Poisson) distribution based on the DESeq2 package.

'edgeR'

The exactTest method of edgeR package is implemented.

'ancombc2'

Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC) based on the ancombc2 function from ANCOMBC package. If the fix_formula parameter is not provided, the function can automatically assign it by using group parameter. For this method, the group parameter is directly passed to the group parameter of ancombc2 function. Reference: <doi:10.1038/s41467-020-17041-7><10.1038/s41592-023-02092-7>; Require ANCOMBC package to be installed (https://bioconductor.org/packages/release/bioc/html/ANCOMBC.html)

'linda'

Linear Model for Differential Abundance Analysis of High-dimensional Compositional Data based on the linda function of MicrobiomeStat package. For linda method, please provide either the group parameter or the formula parameter. When the formula parameter is provided, it should start with '~' as it is directly used by the linda function. If the group parameter is used, the prefix '~' is not necessary as the function can automatically add it. The parameter feature.dat.type = 'count' has been fixed. Other parameters can be passed to the linda function. Reference: <doi:10.1186/s13059-022-02655-5>

'maaslin2'

finding associations between metadata and potentially high-dimensional microbial multi-omics data based on the Maaslin2 package. Using this option can invoke the trans_env$cal_cor function with cor_method = "maaslin2".

'betareg'

Beta Regression based on the betareg package. Please see the beta_pseudo parameter for the use of pseudo value when there is 0 or 1 in the data

'lme'

Linear Mixed Effect Model based on the lmerTest package. In the return table, the significance of fixed factors are tested by function anova. The significance of 'Estimate' in each term of fixed factors comes from the model.

'glmm'

Generalized linear mixed model (GLMM) based on the glmmTMB package. The formula and family parameters are needed. Please refer to glmmTMB package to select the family function, e.g. family = glmmTMB::lognormal(link = "log"). The usage of formula is similar with that in 'lme' method. For more available parameters, please see glmmTMB::glmmTMB function and use parameter passing. In the return table, Conditional_R2 and Marginal_R2 represent total variance (explained by both fixed and random effects) and the variance explained by fixed effects, respectively. The significance of fixed factors are tested by Chi-square test from function car::Anova. The significance of 'Estimate' in each term of fixed factors comes from the model.

'glmm_beta'

Generalized linear mixed model with a family function of beta distribution, developed for the relative abundance (ranging from 0 to 1) of taxa specifically. This is an extension of the GLMM model in 'glmm' option. The only difference is in glmm_beta the family function is fixed with the beta distribution function, i.e. family = glmmTMB::beta_family(link = "logit"). Please see the beta_pseudo parameter for the use of pseudo value when there is 0 or 1 in the data

group

default NULL; sample group used for the comparision; a colname of input microtable$sample_table; It is necessary when method is not "anova" or method is "anova" but formula is not provided. Once group is provided, the return res_abund will have mean and sd values for group.

taxa_level

default "all"; 'all' represents using abundance data at all taxonomic ranks; For testing at a specific rank, provide taxonomic rank name, such as "Genus". If the provided taxonomic name is neither 'all' nor a colname in tax_table of input dataset, the function will use the features in input microtable$otu_table automatically.

filter_thres

default 0; the abundance threshold, such as 0.0005 when the input is relative abundance; only available when method != "metastat". The features with abundances lower than filter_thres will be filtered.

alpha

default 0.05; significance threshold to select taxa when method is "lefse" or "rf"; or used to generate significance letters when method is 'anova' or 'KW_dunn' like the alpha parameter in cal_diff of trans_alpha class.

p_adjust_method

default "fdr"; p.adjust method; see method parameter of p.adjust function for other available options; "none" means disable p value adjustment; So when p_adjust_method = "none", P.adj is same with P.unadj.

transformation

default NULL; feature abundance transformation method in the class trans_norm, such as 'AST' for the arc sine square root transformation. Only available when method is one of "KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare", "betareg" and "lme".

remove_unknown

default TRUE; whether remove unknown features that donot have clear classification information.

lefse_subgroup

default NULL; sample sub group used for sub-comparision in lefse; Segata et al. (2011) <doi:10.1186/gb-2011-12-6-r60>.

lefse_min_subsam

default 10; sample numbers required in the subgroup test.

lefse_norm

default 1000000; scale value in lefse.

nresam

default 0.6667; sample number ratio used in each bootstrap for method = "lefse" or "rf".

boots

default 30; bootstrap test number for method = "lefse" or "rf".

rf_ntree

default 1000; see ntree in randomForest function of randomForest package when method = "rf".

group_choose_paired

default NULL; a vector used for selecting the required groups for paired testing, only used for method = "metastat" or "metagenomeSeq".

metagenomeSeq_count

default 1; Filter features to have at least 'counts' counts.; see the count parameter in MRcoefs function of metagenomeSeq package.

ALDEx2_sig

default c("wi.eBH", "kw.eBH"); which column of the final result is used as the significance asterisk assignment; applied to method = "ALDEx2_t" or "ALDEx2_kw"; the first element is provided to "ALDEx2_t"; the second is provided to "ALDEx2_kw"; for "ALDEx2_t", the available choice is "wi.eBH" (Expected Benjamini-Hochberg corrected P value of Wilcoxon test) and "we.eBH" (Expected BH corrected P value of Welch's t test); for "ALDEx2_kw"; for "ALDEx2_t", the available choice is "kw.eBH" (Expected BH corrected P value of Kruskal-Wallace test) and "glm.eBH" (Expected BH corrected P value of glm test).

by_group

default NULL; a column of sample_table used to perform the differential test among groups (group parameter) for each group (by_group parameter). So by_group has a higher level than group parameter. Same with the by_group parameter in trans_alpha class. Only available when method is one of c("KW", "KW_dunn", "wilcox", "t.test", "anova", "scheirerRayHare").

by_ID

default NULL; a column of sample_table used to perform paired t test or paired wilcox test for the paired data, such as the data of plant compartments for different plant species (ID). So by_ID in sample_table should be the smallest unit of sample collection without any repetition in it. Same with the by_ID parameter in trans_alpha class.

beta_pseudo

default .Machine$double.eps; the pseudo value used when the parameter method is 'betareg' or 'glmm_beta'. As the beta distribution function limits 0 < response value < 1, a pseudo value will be added for the data that equal to 0. The data that equal to 1 will be replaced by 1/(1 + beta_pseudo).

...

parameters passed to cal_diff function of trans_alpha class when method is one of "KW", "KW_dunn", "wilcox", "t.test", "anova", "betareg", "lme", "glmm" or "glmm_beta"; passed to ANCOMBC::ancombc2 function when method is "ancombc2" (except tax_level, global and fix_formula parameters); passed to ALDEx2::aldex function when method = "ALDEx2_t" or "ALDEx2_kw"; passed to DESeq2::DESeq function when method = "DESeq2"; passed to MicrobiomeStat::linda function when method = "linda"; passed to trans_env$cal_cor function when method = "maaslin2".

Returns

res_diff and res_abund.
res_abund includes mean abundance of each taxa (Mean), standard deviation (SD), standard error (SE) and sample number (N) in the group (Group).
res_diff is the detailed differential test result depending on the method choice, may containing:
"Comparison": The groups for the comparision, maybe all groups or paired groups. If this column is not found, all groups are used;
"Group": Which group has the maximum median or mean value across the test groups; For non-parametric methods, median value; For t.test, mean value;
"Taxa": which taxa is used in this comparision;
"Method": Test method used in the analysis depending on the method input;
"LDA" or "MeanDecreaseGini": LDA: linear discriminant score in LEfSe; MeanDecreaseGini: mean decreasing gini index in random forest;
"P.unadj": original p value;
"P.adj": adjusted p value;
"Estimate" and "Std.Error": When method is "betareg", "lm", "lme" or "glmm", "Estimate" and "Std.Error" represent fitted coefficient and its standard error, respectively;
Others: qvalue: qvalue in metastat analysis.

Examples
\donttest{
data(dataset)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus")
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
}

Method plot_diff_abund()

Plot the abundance of differential taxa

Usage
trans_diff$plot_diff_abund(
  use_number = 1:20,
  color_values = RColorBrewer::brewer.pal(8, "Dark2"),
  select_group = NULL,
  select_taxa = NULL,
  simplify_names = TRUE,
  keep_prefix = TRUE,
  group_order = NULL,
  barwidth = 0.9,
  use_se = TRUE,
  add_sig = FALSE,
  add_sig_label = "Significance",
  add_sig_label_color = "black",
  add_sig_tip_length = 0.01,
  y_start = 1.01,
  y_increase = 0.05,
  text_y_size = 10,
  coord_flip = TRUE,
  xtext_angle = 45,
  ...
)
Arguments
use_number

default 1:20; numeric vector; the taxa numbers (1:n) selected in the plot; If the n is larger than the number of total significant taxa, automatically use all the taxa.

color_values

default RColorBrewer::brewer.pal(8, "Dark2"); colors palette.

select_group

default NULL; this is used to select the paired groups. This parameter is especially useful when the comparision methods is applied to paired groups; The input select_group must be one of object$res_diff$Comparison.

select_taxa

default NULL; character vector to provide taxa names. The taxa names should be same with the names shown in the plot, not the 'Taxa' column names in object$res_diff$Taxa.

simplify_names

default TRUE; whether use the simplified taxonomic name.

keep_prefix

default TRUE; whether retain the taxonomic prefix.

group_order

default NULL; a vector to order groups, i.e. reorder the legend and colors in plot; If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it. If provided, overlook the levels in the group of sample_table.

barwidth

default 0.9; the bar width in plot.

use_se

default TRUE; whether use SE in plot, if FALSE, use SD.

add_sig

default FALSE; whether add the significance label to the plot.

add_sig_label

default "Significance"; select a colname of object$res_diff for the label text, such as 'P.adj' or 'Significance'.

add_sig_label_color

default "black"; the color for the label text when add_sig = TRUE.

add_sig_tip_length

default 0.01; the tip length for the added line when add_sig = TRUE.

y_start

default 1.01; the y axis position from which to add the label; the default 1.01 means 1.01 * Value; For method != "anova", all the start positions are same, i.e. Value = max(Mean+SD or Mean+SE); For method = "anova"; the stat position is calculated for each point, i.e. Value = Mean+SD or Mean+SE.

y_increase

default 0.05; the increasing y axia space to add label for paired groups; the default 0.05 means 0.05 * y_start * Value; In addition, this parameter is also used to label the letters of anova result with the fixed (1 + y_increase) * y_start * Value.

text_y_size

default 10; the size for the y axis text, i.e. feature text.

coord_flip

default TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.

xtext_angle

default 45; number ranging from 0 to 90; used to make x axis text generate angle to reduce text overlap; only available when coord_flip = FALSE.

...

parameters passed to ggsignif::stat_signif when add_sig = TRUE.

Returns

ggplot.

Examples
\donttest{
t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus")
t1$plot_diff_abund(use_number = 1:10)
t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
}

Method plot_diff_bar()

Bar plot for indicator index, such as LDA score and P value.

Usage
trans_diff$plot_diff_bar(
  color_values = RColorBrewer::brewer.pal(8, "Dark2"),
  color_group_map = FALSE,
  use_number = 1:10,
  threshold = NULL,
  select_group = NULL,
  keep_full_name = FALSE,
  keep_prefix = TRUE,
  group_order = NULL,
  group_aggre = TRUE,
  group_two_sep = TRUE,
  coord_flip = TRUE,
  add_sig = FALSE,
  add_sig_increase = 0.1,
  add_sig_text_size = 5,
  xtext_angle = 45,
  xtext_size = 10,
  axis_text_y = 12,
  heatmap_cell = "P.unadj",
  heatmap_sig = "Significance",
  heatmap_x = "Factors",
  heatmap_y = "Taxa",
  heatmap_lab_fill = "P value",
  ...
)
Arguments
color_values

default RColorBrewer::brewer.pal(8, "Dark2"); colors palette for different groups.

color_group_map

default FALSE; whether match the colors to groups in order to fix the color in each group when part of groups are not shown in the plot. When color_group_map = TRUE, the group_order inside the object will be used as full groups set to guide the color extraction.

use_number

default 1:10; numeric vector; the taxa numbers used in the plot, i.e. 1:n.

threshold

default NULL; threshold value of indicators for selecting taxa, such as 3 for LDA score of LEfSe.

select_group

default NULL; this is used to select the paired group when multiple comparisions are generated; The input select_group must be one of object$res_diff$Comparison.

keep_full_name

default FALSE; whether keep the taxonomic full lineage names.

keep_prefix

default TRUE; whether retain the taxonomic prefix, such as "g__".

group_order

default NULL; a vector to order the legend and colors in plot; If NULL, the function can first determine whether the group column of microtable$sample_table is factor. If yes, use the levels in it. If provided, this parameter can overwrite the levels in the group of microtable$sample_table.

group_aggre

default TRUE; whether aggregate the features for each group.

group_two_sep

default TRUE; whether display the features of two groups on opposite sides of the coordinate axes when there are only two groups in total.

coord_flip

default TRUE; whether flip cartesian coordinates so that horizontal becomes vertical, and vertical becomes horizontal.

add_sig

default FALSE; whether add significance label (asterisk) above the bar.

add_sig_increase

default 0.1; the axis position (Value + add_sig_increase * max(Value)) from which to add the significance label; only available when add_sig = TRUE.

add_sig_text_size

default 5; the size of added significance label; only available when add_sig = TRUE.

xtext_angle

default 45; number ranging from 0 to 90; used to make x axis text generate angle to reduce text overlap; only available when coord_flip = FALSE.

xtext_size

default 10; the text size of x axis.

axis_text_y

default 12; the size for the y axis text.

heatmap_cell

default "P.unadj"; the column of data for the cell of heatmap when formula with multiple factors is found in the method.

heatmap_sig

default "Significance"; the column of data for the significance label of heatmap.

heatmap_x

default "Factors"; the column of data for the x axis of heatmap.

heatmap_y

default "Taxa"; the column of data for the y axis of heatmap.

heatmap_lab_fill

default "P value"; legend title of heatmap.

...

parameters passing to geom_bar for the bar plot or plot_cor function in trans_env class for the heatmap of multiple factors when formula is found in the method.

Returns

ggplot.

Examples
\donttest{
t1$plot_diff_bar(use_number = 1:20)
}

Method plot_diff_cladogram()

Plot the cladogram using taxa with significant difference.

Usage
trans_diff$plot_diff_cladogram(
  color = RColorBrewer::brewer.pal(8, "Dark2"),
  group_order = NULL,
  use_taxa_num = 200,
  filter_taxa = NULL,
  use_feature_num = NULL,
  clade_label_level = 4,
  select_show_labels = NULL,
  only_select_show = FALSE,
  sep = "|",
  branch_size = 0.2,
  alpha = 0.2,
  clade_label_size = 2,
  clade_label_size_add = 5,
  clade_label_size_log = exp(1),
  node_size_scale = 1,
  node_size_offset = 1,
  annotation_shape = 22,
  annotation_shape_size = 5
)
Arguments
color

default RColorBrewer::brewer.pal(8, "Dark2"); color palette used in the plot.

group_order

default NULL; a vector to order the legend in plot; If NULL, the function can first check whether the group column of sample_table is factor. If yes, use the levels in it. If provided, this parameter can overwrite the levels in the group of sample_table. If the number of provided group_order is less than the number of groups in res_diff$Group, the function will select the groups of group_order automatically.

use_taxa_num

default 200; integer; The taxa number used in the background tree plot; select the taxa according to the mean abundance .

filter_taxa

default NULL; The mean relative abundance used to filter the taxa with low abundance.

use_feature_num

default NULL; integer; The feature number used in the plot; select the features according to the LDA score (method = "lefse") or MeanDecreaseGini (method = "rf") from high to low.

clade_label_level

default 4; the taxonomic level for marking the label with letters, root is the largest.

select_show_labels

default NULL; character vector; The features to show in the plot with full label names, not the letters.

only_select_show

default FALSE; whether only use the the select features in the parameter select_show_labels.

sep

default "|"; the seperate character in the taxonomic information.

branch_size

default 0.2; numberic, size of branch.

alpha

default 0.2; shading of the color.

clade_label_size

default 2; basic size for the clade label; please also see clade_label_size_add and clade_label_size_log.

clade_label_size_add

default 5; added basic size for the clade label; see the formula in clade_label_size_log parameter.

clade_label_size_log

default exp(1); the base of log function for added size of the clade label; the size formula: clade_label_size + log(clade_label_level + clade_label_size_add, base = clade_label_size_log); so use clade_label_size_log, clade_label_size_add and clade_label_size can totally control the label size for different taxonomic levels.

node_size_scale

default 1; scale for the node size.

node_size_offset

default 1; offset for the node size.

annotation_shape

default 22; shape used in the annotation legend.

annotation_shape_size

default 5; size used in the annotation legend.

Returns

ggplot.

Examples
\donttest{
t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL)
}

Method print()

Print the trans_alpha object.

Usage
trans_diff$print()

Method clone()

The objects of this class are cloneable with this method.

Usage
trans_diff$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples


## ------------------------------------------------
## Method `trans_diff$new`
## ------------------------------------------------


data(dataset)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "rf", group = "Group")
t1 <- trans_diff$new(dataset = dataset, method = "metastat", group = "Group", taxa_level = "Genus")
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")


## ------------------------------------------------
## Method `trans_diff$plot_diff_abund`
## ------------------------------------------------


t1 <- trans_diff$new(dataset = dataset, method = "anova", group = "Group", taxa_level = "Genus")
t1$plot_diff_abund(use_number = 1:10)
t1$plot_diff_abund(use_number = 1:10, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "wilcox", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)
t1 <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group")
t1$plot_diff_abund(use_number = 1:20)
t1$plot_diff_abund(use_number = 1:20, add_sig = TRUE)


## ------------------------------------------------
## Method `trans_diff$plot_diff_bar`
## ------------------------------------------------


t1$plot_diff_bar(use_number = 1:20)


## ------------------------------------------------
## Method `trans_diff$plot_diff_cladogram`
## ------------------------------------------------


t1$plot_diff_cladogram(use_taxa_num = 100, use_feature_num = 30, select_show_labels = NULL)


[Package microeco version 1.8.0 Index]