GLMMMiRKAT {MiRKAT} | R Documentation |
The Microbiome Regression-based Kernel Association Test Based on the Generalized Linear Mixed Model
Description
GLMMMiRKAT utilizes a generalized linear mixed model to allow dependence among samples.
Usage
GLMMMiRKAT(
y,
X = NULL,
Ks,
id = NULL,
time.pt = NULL,
model,
method = "perm",
slope = FALSE,
omnibus = "perm",
nperm = 5000
)
Arguments
y |
A numeric vector of Gaussian (e.g., body mass index), Binomial (e.g., disease status, treatment/placebo) or Poisson (e.g., number of tumors/treatments) traits. |
X |
A vector or matrix of numeric covariates, if applicable (default = NULL). |
Ks |
A list of n-by-n OTU kernel matrices or one singular n-by-n OTU kernel matrix, where n is sample size. |
id |
A vector of cluster (e.g., family or subject including repeated measurements) IDs. Defaults to NULL since it is unnecessary for the CSKAT call. |
time.pt |
A vector of time points for the longitudinal studies. 'time.pt' is not required (i.e., 'time.pt = NULL') for the random intercept model. Default is time.pt = NULL. |
model |
A string declaring which model ("gaussian", "binomial" or "poisson") is to be used; should align with whether a Gaussian, Binomial, or Poisson trait is being inputted for the y argument. |
method |
A string declaring which method ("perm" or "davies) will be used to calculate the p-value. Davies is only available for Gaussian traits. Defaults to "perm". |
slope |
An indicator to include random slopes in the model (slope = TRUE) or not (slope = FALSE). 'slope = FALSE' is for the random intercept model. 'slope = TRUE' is for the random slope model. For the random slope model (slope = TRUE), 'time.pt' is required. |
omnibus |
A string equal to either "Cauchy" or "permutation" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or residual permutation to generate the omnibus p-value. |
nperm |
The number of permutations used to calculate the p-values and omnibus p-value. Defaults to 5000. |
Details
Missing data is not permitted. Please remove all individuals with missing y, X, and Ks prior to input for analysis.
y and X (if not NULL) should be numerical matrices or vectors with the same number of rows.
Ks should either be a list of n by n kernel matrices (where n is sample size) or a single kernel matrix. If you have distance matrices from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches.
If model="gaussian" and method="davies", CSKAT is called. CSKAT utilizes the same omnibus test as GLMMMiRKAT. See ?CSKAT for more details.
The "method" argument only determines kernel-specific p-values are generated. When Ks is a list of multiple kernels, an omnibus p-value is computed via permutation.
Value
Returns a p-value for each inputted kernel matrix, as well as an overall omnibus p-value if more than one kernel matrix is inputted
p_values |
p-value for each individual kernel matrix |
omnibus_p |
overall omnibus p-value calculated by permutation for the adaptive GLMMMiRKAT analysis |
Author(s)
Hyunwook Koh
References
Koh H, Li Y, Zhan X, Chen J, Zhao N. (2019) A distance-based kernel association test based on the generalized linear mixed model for correlated microbiome studies. Front. Genet. 458(10), 1-14.
Examples
## Example with Gaussian (e.g., body mass index) traits
## For non-Gaussian traits, see vignette.
# Import example microbiome data with Gaussian traits
data(nordata)
otu.tab <- nordata$nor.otu.tab
meta <- nordata$nor.meta
# Create kernel matrices and run analysis
if (requireNamespace("vegan")) {
library(vegan)
D_BC = as.matrix(vegdist(otu.tab, 'bray'))
K_BC = D2K(D_BC)
GLMMMiRKAT(y = meta$y, X = cbind(meta$x1, meta$x2), id = meta$id,
Ks = K_BC, model = "gaussian", nperm = 500)
} else {
# Computation time is longer for phylogenetic kernels
tree <- nordata$nor.tree
unifracs <- GUniFrac::GUniFrac(otu.tab, tree, alpha=c(1))$unifracs
D_W <- unifracs[,,"d_1"]
K_W = D2K(D_W)
GLMMMiRKAT(y = meta$y, X = cbind(meta$x1, meta$x2), id = meta$id,
Ks = K_W, model = "gaussian", nperm = 500)
}