lnmbiclust {lnmCluster} | R Documentation
Logistic Normal Multinomial Biclustering algorithm
Description
Main function that performs logistic normal multinomial (LNM) biclustering and selects the best model based on BIC, AIC, or ICL.
Usage
lnmbiclust(W_count, range_G, range_Q, model, criteria, iter, permutation, X)
Arguments
W_count The microbiome count matrix.
range_G Candidate numbers of components, given as a vector.
range_Q Candidate numbers of biclusters for each component, given as a vector.
model The covariance structure to fit. There are 16 models in this family: UUU, UUG, UUD, UUC, UGU, UGG, UGD, UGC, GUU, GUG, GUD, GUC, GGU, GGG, GGD, GGC. More than one covariance structure can be supplied to perform model selection.
criteria One of AIC, BIC, or ICL. The best model is selected according to the chosen criterion. The default is BIC.
iter Maximum number of iterations. The default is 150.
permutation Only has an effect when model contains UUU, UUG, UUD, or UUC. If TRUE, the number of biclusters is allowed to differ between components. If FALSE, the number of biclusters is assumed to be the same across all components. The default is FALSE.
X The regression covariate matrix, as generated by model.matrix.
Value
z_ig Estimated latent variable z
cluster Component labels
mu_g Estimated component mean
pi_g Estimated component proportion
B_g Estimated bicluster membership
T_g Estimated covariance of latent variable u
D_g Estimated error covariance
COV Estimated sparsity component covariance
beta_g Estimated covariate coefficients
sigma Estimated original component covariance
overall_loglik Complete log likelihood value for each iteration
ICL ICL value
BIC BIC value
AIC AIC value
all_fitted_model Names of all fitted models, returned as a data.frame.
Examples
#generate toy data with n=100 and K=5 (stored as p below)
#set up parameters
n<-100
p<-5
mu1<-c(-2.8,-1.3,-1.6,-3.9,-2.6)
B1<-matrix(c(1,0,1,0,1,0,0,1,0,1),nrow = p, byrow=TRUE)
T1<-diag(c(2.9,0.5))
D1<-diag(c(0.52, 1.53, 0.56, 0.19, 1.32))
cov1<-B1%*%T1%*%t(B1)+D1
mu2<-c(1.5,-2.7,-1.1,-0.4,-1.4)
B2<-matrix(c(1,0,1,0,0,1,0,1,0,1),nrow = p, byrow=TRUE)
T2<-diag(c(0.2,0.003))
D2<-diag(c(0.01, 0.62, 0.45, 0.01, 0.37))
cov2<-B2%*%T2%*%t(B2)+D2
#draw component labels, then generate multivariate normal data for each component
library(mvtnorm)
simp<-rmultinom(n,1,c(0.6,0.4))
lab<-as.factor(apply(t(simp),1,which.max))
df<-matrix(0,nrow=n,ncol=p)
for (i in 1:n) {
  if (lab[i] == 1) {
    df[i, ] <- rmvnorm(1, mu1, sigma = cov1)
  } else if (lab[i] == 2) {
    df[i, ] <- rmvnorm(1, mu2, sigma = cov2)
  }
}
#apply inverse of additive log ratio and transform normal to count data
f_df<-cbind(df,0)
z<-exp(f_df)/rowSums(exp(f_df))
W_count<-matrix(0,nrow=n,ncol=p+1)
for (i in 1:n) {
W_count[i,]<-rmultinom(1,runif(1,10000,20000),z[i,])
}
#to run a single model, let range_Q be an integer
res<-lnmbiclust(W_count,2,2,model="UUU")
#the following runs two combinations of Q with G=2: (2, 2) and (3, 3).
res<-lnmbiclust(W_count,2,range_Q=c(2:3),model="UUU")
#to run model selection, let range_Q and range_G be vectors.
#model selection for all 16 models with G=1 to 3, Q=1 to 3.
res<-lnmbiclust(W_count,c(1:3),c(1:3))
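
The X argument is described above as a covariate matrix generated by model.matrix. A minimal sketch of building such a matrix is shown here. The data frame covars and the covariate names age and group are hypothetical and only serve as an illustration. The commented call at the end is an assumed usage that requires W_count to have one row per sample, as in the examples above.

```r
# Hypothetical sample-level covariates (not part of the package or its data).
n <- 100
set.seed(1)
covars <- data.frame(
  age   = rnorm(n, mean = 50, sd = 10),
  group = factor(rep(c("a", "b"), each = n / 2))
)

# model.matrix() expands the formula into a numeric design matrix:
# an intercept column, the numeric covariate, and a dummy column for group.
X <- model.matrix(~ age + group, data = covars)
dim(X)

# Assumed usage, with W_count built as in the examples above:
# res <- lnmbiclust(W_count, 2, 2, model = "UUU", X = X)
```

The rows of X should be in the same sample order as the rows of W_count.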