test.vc.twosigma {twosigma} | R Documentation |
Convenient wrapper function for performing (joint) likelihood ratio tests of variance components using the TWO-SIGMA model.
Description
Convenient wrapper function for performing (joint) likelihood ratio tests of variance components using the TWO-SIGMA model.
Usage
test.vc.twosigma(
count_matrix,
mean_covar,
zi_covar,
mean_re = TRUE,
zi_re = TRUE,
id,
return_full_fits = TRUE,
adhoc = FALSE,
adhoc_thresh = 0.1,
silent = FALSE,
disp_covar = NULL,
weights = rep(1, ncol(count_matrix)),
control = glmmTMBControl(),
ncores = 1,
cluster_type = "Fork",
chunk_size = 1,
lb = FALSE
)
Arguments
count_matrix |
Matrix of non-negative integer read counts, with rows corresponding to genes and columns corresponding to cells. It is recommended to make the rownames the gene names for better output. |
mean_covar |
Covariates for the (conditional) mean model. Must be a matrix (without an intercept column) or a vector if a single covariate is being tested. |
zi_covar |
Covariates for the zero-inflation model. Must be a matrix (without an intercept column) or a vector if a single covariate is being tested. |
mean_re |
Should random intercepts be tested in the (conditional) mean model? |
zi_re |
Should random intercepts be tested in the zero-inflation model? |
id |
Vector of individual-level ID's. Used for random effect prediction and the adhoc method but required regardless. |
return_full_fits |
If TRUE, fit objects of class glmmTMB are returned. If FALSE, only objects of class summary.glmmTMB are returned. The latter require a much larger amount of memory to store. |
adhoc |
Should the adhoc method be used by default to judge if random effects are needed? |
adhoc_thresh |
Value below which the adhoc p-value is deemed significant (and thus RE are deemed necessary). Only used if adhoc==TRUE. |
silent |
If TRUE, progress is not printed. |
disp_covar |
Covariates for a log-linear model for the dispersion. Either a matrix or = 1 to indicate an intercept only model. |
weights |
weights, as in glm. Defaults to 1 for all observations and no scaling or centering of weights is performed. See |
control |
Control parameters for optimization in glmmTMB. |
ncores |
Number of cores used for parallelization. Defaults to 1, meaning no parallelization of any kind is done. |
cluster_type |
Whether to use a "cluster of type "Fork" or "Sock". On Unix systems, "Fork" will likely improve performance. On Windows, only "Sock" will actually result in parallelized computing. |
chunk_size |
Number of genes to be sent to each parallel environment. Parallelization is more efficient, particularly with a large count matrix, when the count matrix is 'chunked' into some common size (e.g. 10, 50, 200). Defaults to 10. |
lb |
Should load balancing be used for parallelization? Users will likely want to set to FALSE for improved performance. |
Value
A list with the following elements:
fit_null:
Model fits under the null hypothesis. Ifreturn_summary_fits=TRUE
, returns a list of objects of classsummary.glmmTMB
. Ifreturn_summary_fits=FALSE
, returns a list of model fit objects of classglmmTMB
. In either case, the order matches the row order ofcount_matrix
, and the names of the list elements are taken as the rownames ofcount_matrix
.fit_alt:
Model fits under the alt hypothesis of the same format asfit_null
.LR_stat:
Vector of Likelihood Ratio statistics. A value of 'NA' implies a convergence issue or other model fit problem.LR_p.val:
Vector of Likelihood Ratio p-values. A value of 'NA' implies a convergence issue or other model fit problem.
Details
If either model fails to converge, or the LR statistic is negative, both the statistic and p-value are assigned as NA.
Examples
# Set Parameters to Simulate Some Data
nind<-10;ncellsper<-rep(50,nind)
sigma.a<-.5;sigma.b<-.5;phi<-.1
alpha<-c(1,0,-.5,-2);beta<-c(2,0,-.1,.6)
beta2<-c(2,1,-.1,.6)
id.levels<-1:nind;nind<-length(id.levels)
id<-rep(id.levels,times=ncellsper)
sim.seed<-1234
# Simulate individual level covariates
t2d_sim<-rep(rbinom(nind,1,p=.4),times=ncellsper)
cdr_sim<-rbeta(sum(ncellsper),3,6)
age_sim<-rep(sample(c(20:60),size=nind,replace = TRUE),times=ncellsper)
# Construct design matrices
Z<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(Z)<-c("t2d_sim","age_sim","cdr_sim")
X<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(X)<-c("t2d_sim","age_sim","cdr_sim")
# Simulate Data
sim_dat<-matrix(nrow=2,ncol=sum(ncellsper))
for(i in 1:nrow(sim_dat)){
sim_dat[i,]<-simulate_zero_inflated_nb_random_effect_data(ncellsper,X,Z,alpha,beta2
,phi,sigma.a,sigma.b,id.levels=NULL)$Y
}
rownames(sim_dat)<-paste("Gene",1:2)
# Run test.vc.twosigma
test.vc.twosigma(sim_dat[1,,drop=FALSE],mean_covar = X,zi_covar=Z
,mean_re = TRUE,zi_re=FALSE,id = id)