linda {MicrobiomeStat} | R Documentation |
Linear (Lin) Model for Differential Abundance (DA) Analysis of High-dimensional Compositional Data
The function implements a simple, robust and highly scalable approach to tackle the compositional effects in differential abundance analysis of high-dimensional compositional data. It fits linear regression models on the centered log2-ratio transformed data, identifies a bias term due to the transformation and compositional effect, and corrects the bias using the mode of the regression coefficients. It can also fit mixed-effect models for the analysis of correlated data.
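As a rough illustration of the idea described above, the following base R sketch simulates count data, applies a centered log2-ratio transform, fits one linear model per feature, and subtracts the mode of the coefficients as the bias estimate. It is not the package's implementation: the simulated data, the pseudo-count of 0.5, and the kernel-density mode estimate via density() are all assumptions made for illustration.

```r
set.seed(1)
n <- 100                                  # samples (simulated)
m <- 200                                  # features (simulated)
group <- rep(c(0, 1), each = n / 2)       # binary covariate of interest

# Simulated absolute abundances: features 1-20 increase 4-fold in group 1
base <- matrix(rlnorm(m * n, meanlog = 5, sdlog = 0.3), nrow = m)
base[1:20, group == 1] <- base[1:20, group == 1] * 4
counts <- matrix(rpois(m * n, base), nrow = m)

# Centered log2-ratio transform (pseudo-count 0.5 is an assumption here)
logdat <- log2(counts + 0.5)
clr <- sweep(logdat, 2, colMeans(logdat)) # subtract each sample's mean log2 value

# One linear regression per feature; keep the coefficient of 'group'
raw.coef <- apply(clr, 1, function(y) coef(lm(y ~ group))[["group"]])

# Bias estimate: location of the kernel density peak of the coefficients
# (a simple stand-in for a dedicated mode estimator)
d <- density(raw.coef)
bias <- d$x[which.max(d$y)]

# Bias-corrected log2 fold changes
corrected <- raw.coef - bias
```

In this simulation most features are unchanged, so the bulk of the raw coefficients cluster around a common non-zero value caused by the centering; subtracting the mode moves that cluster back towards zero.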
linda(
  feature.dat,
  meta.dat,
  formula,
  feature.dat.type = c('count', 'proportion'),
  prev.filter = 0,
  mean.abund.filter = 0,
  max.abund.filter = 0,
  is.winsor = TRUE,
  outlier.pct = 0.03,
  adaptive = TRUE,
  zero.handling = c('pseudo-count', 'imputation'),
  pseudo.cnt = 0.5,
  corr.cut = 0.1,
  p.adj.method = "BH",
  alpha = 0.05,
  n.cores = 1,
  verbose = TRUE
)
feature.dat |
a matrix of counts or proportions; rows are features (OTUs, genes, etc.) and columns are samples. |
meta.dat |
a data frame containing the sample metadata. Samples with NAs are removed from the analysis. |
formula |
a character string for the formula. The formula should conform to that used by lm (independent data) or lmer (correlated data). |
feature.dat.type |
the type of the feature data. It could be "count" or "proportion". |
prev.filter |
the prevalence (proportion of non-zero values) cutoff; features with prevalence below it are filtered out. The default is 0. |
mean.abund.filter |
the mean relative abundance cutoff; features with mean relative abundance below it are filtered out. The default is 0. |
max.abund.filter |
the maximum relative abundance cutoff; features with maximum relative abundance below it are filtered out. The default is 0. |
is.winsor |
a logical value indicating whether winsorization should be performed to replace outliers (high values). The default is TRUE. |
outlier.pct |
the expected proportion of outliers. These outliers will be winsorized. The default is 0.03. |
adaptive |
a logical value indicating whether the approach to handle zeros (pseudo-count or imputation) will be determined based on the correlations between the log(sequencing depth) and the explanatory variables in formula. |
zero.handling |
a character string of 'pseudo-count' or 'imputation' indicating the zero handling method used when adaptive is FALSE. |
pseudo.cnt |
a positive numeric value for the pseudo-count to be added if zero.handling is 'pseudo-count'. The default is 0.5. |
corr.cut |
a numerical value between 0 and 1, indicating the significance level used for determining the zero-handling approach when adaptive is TRUE. The default is 0.1. |
p.adj.method |
a character string indicating the p-value adjustment approach for addressing multiple testing. See R function p.adjust. The default is "BH". |
alpha |
a numerical value between 0 and 1 indicating the significance level for declaring differential features. Default is 0.05. |
n.cores |
a positive integer. If |
verbose |
a logical value indicating whether the trace information should be printed out. |
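The adaptive choice between the two zero-handling approaches can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's code: the helper name choose.zero.handling is hypothetical, the correlation is assessed by regressing log(sequencing depth) on the fixed effects with lm, and imputation is selected when any covariate p-value is at or below corr.cut.

```r
# Hypothetical helper: pick a zero-handling method from the association
# between log sequencing depth and the explanatory variables.
choose.zero.handling <- function(feature.dat, meta.dat, fixed.formula,
                                 corr.cut = 0.1) {
  meta.dat$log.depth <- log(colSums(feature.dat))   # per-sample depth
  fit <- lm(as.formula(paste("log.depth", fixed.formula)), data = meta.dat)
  pvals <- coef(summary(fit))[-1, "Pr(>|t|)"]       # drop the intercept row
  if (any(pvals <= corr.cut)) "imputation" else "pseudo-count"
}

set.seed(42)
n <- 40
meta <- data.frame(Group = factor(rep(c("a", "b"), each = n / 2)))

# Draw a 50-feature count matrix whose column sums equal the given depths
make.counts <- function(depth) {
  sapply(depth, function(d) rmultinom(1, d, rep(1 / 50, 50)))
}

# Depth unrelated to Group: both groups share the same set of depths
depth.balanced <- rep(c(8000, 12000), times = n / 2)
# Depth confounded with Group: group b is sequenced four times deeper
depth.confounded <- depth.balanced * ifelse(meta$Group == "b", 4, 1)

method.balanced <- choose.zero.handling(make.counts(depth.balanced), meta, "~Group")
method.confounded <- choose.zero.handling(make.counts(depth.confounded), meta, "~Group")
```

The motivation for such a rule is that adding a fixed pseudo-count affects shallow samples more than deep ones, which matters when depth differs systematically between the groups being compared.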
A list with the elements
variables |
A vector of variable names of all fixed effects in formula. |
bias |
numeric vector; each element corresponds to one variable in variables. |
output |
a list of data frames with columns 'baseMean', 'log2FoldChange', 'lfcSE', 'stat', 'pvalue', 'padj', 'reject',
covariance |
a list of data frames; the data frame records the covariances between a regression coefficient with other coefficients;
| |
the OTU table used in the abundance analysis (the |
meta.use |
the meta data used in the abundance analysis (only variables in |
wald |
a list for use in Wald test. If the fitting model is a linear model, then it includes
If the fitting model is a linear mixed-effect model, then it includes
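To make the relationship between the 'pvalue', 'padj' and 'reject' columns concrete, here is a small base R sketch using made-up p-values. It assumes that 'padj' comes from p.adjust with the chosen p.adj.method and that 'reject' flags features whose adjusted p-value is at or below alpha; the feature names and p-values are invented for illustration.

```r
# Made-up raw p-values for six hypothetical features
pvalue <- c(0.0004, 0.012, 0.03, 0.2, 0.5, 0.9)
alpha <- 0.05

# Benjamini-Hochberg adjustment, as selected by p.adj.method = "BH"
padj <- p.adjust(pvalue, method = "BH")

# Assumed decision rule: reject when the adjusted p-value is <= alpha
reject <- padj <= alpha

res <- data.frame(pvalue, padj, reject,
                  row.names = paste0("feature", seq_along(pvalue)))

# Names of the features declared differential
selected <- rownames(res)[which(res$reject)]
```

The last line follows the same pattern that can be applied to an element of the output list to extract the names of the rejected features.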
Huijuan Zhou, Jun Chen, Xianyang Zhang
Zhou, H., He, K., Chen, J., Zhang, X. (2022). LinDA: linear models for differential abundance analysis of microbiome compositional data. Genome Biology, 23(1), 95.
data(smokers)
# Keep the throat samples only
ind <- smokers$meta$AIRWAYSITE == 'Throat'
otu.tab <- as.data.frame(smokers$otu[, ind])
depth <- colSums(otu.tab)
meta <- cbind.data.frame(Smoke = factor(smokers$meta$SMOKER[ind]),
                         Sex = factor(smokers$meta$SEX[ind]),
                         Site = factor(smokers$meta$SIDEOFBODY[ind]),
                         SubjectID = factor(smokers$meta$HOST_SUBJECT_ID[ind]))
# Differential abundance analysis using the left throat data
ind1 <- meta$Site == 'Left' & depth >= 1000
linda.obj <- linda(otu.tab[, ind1], meta[ind1, ], formula = '~Smoke+Sex',
                   feature.dat.type = 'count',
                   prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
                   p.adj.method = "BH", alpha = 0.1)
linda.plot(linda.obj, c('Smokey', 'Sexmale'),
           titles = c('Smoke: n v.s. y', 'Sex: female v.s. male'),
           alpha = 0.1, lfc.cut = 1, legend = TRUE, directory = NULL,
           width = 11, height = 8)
# Names of the features declared differential for the first variable
rownames(linda.obj$output[[1]])[which(linda.obj$output[[1]]$reject)]
# Differential abundance analysis pooling both the left and right throat data
# Mixed effects model is used
ind <- depth >= 1000
linda.obj <- linda(otu.tab[, ind], meta[ind, ], formula = '~Smoke+Sex+(1|SubjectID)',
                   feature.dat.type = 'count',
                   prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
                   p.adj.method = "BH", alpha = 0.1)
# For proportion data
otu.tab.p <- t(t(otu.tab) / colSums(otu.tab))
ind1 <- meta$Site == 'Left' & depth >= 1000
linda.obj <- linda(otu.tab.p[, ind1], meta[ind1, ], formula = '~Smoke+Sex',
                   feature.dat.type = 'proportion',
                   prev.filter = 0.1, is.winsor = TRUE, outlier.pct = 0.03,
                   p.adj.method = "BH", alpha = 0.1)