twosigma_custom {twosigma} | R Documentation |
Fit the TWO-SIGMA model with custom user-specified model formulas.
Description
Fit the TWO-SIGMA model with custom user-specified model formulas.
Usage
twosigma_custom(
count_matrix,
mean_form,
zi_form,
id,
return_summary_fits = TRUE,
silent = FALSE,
disp_covar = NULL,
weights = rep(1, ncol(count_matrix)),
control = glmmTMBControl(),
ncores = 1,
cluster_type = "Fork",
chunk_size = 10,
lb = FALSE,
internal_call = FALSE
)
Arguments
count_matrix |
Matrix of non-negative integer read counts, with rows corresponding to genes and columns corresponding to cells. It is recommended to make the rownames the gene names for better output. |
mean_form |
Custom two-sided model formula for the (conditional) mean model. Formula is passed directly into glmmTMB with random effects specified as in the lme4 package. Users should ensure that the LHS of the formula begins with "count." |
zi_form |
Custom one-sided model formula for the zero-inflation model. Formula is passed directly into glmmTMB with random effects specified as in lme4. |
id |
Vector of individual-level (sample-level) ID's. Used for random effect prediction but required regardless of their presence in the model. |
return_summary_fits |
If TRUE, the package returns a |
silent |
If TRUE, progress is not printed. |
disp_covar |
Covariates for a log-linear model for the dispersion. Either a matrix of covariates or = 1 to indicate an intercept only model. Random effect terms are not permitted in the dispersion model. |
weights |
weights, as in glm. Defaults to 1 for all observations and no scaling or centering of weights is performed. |
control |
Control parameters for optimization in |
ncores |
Number of cores used for parallelization. Defaults to 1, meaning no parallelization of any kind is done. |
cluster_type |
Whether to use a "cluster of type "Fork" or "Sock". On Unix systems, "Fork" will likely improve performance. On Windows, only "Sock" will actually result in parallelized computing. |
chunk_size |
Number of genes to be sent to each parallel environment. Parallelization is more efficient, particularly with a large count matrix, when the count matrix is 'chunked' into some common size (e.g. 10, 50, 200). Defaults to 10. |
lb |
Should load balancing be used for parallelization? Users will likely want to set to FALSE for improved performance. |
internal_call |
Not needed by users called |
Value
A list with the following elements:
fit:
Ifreturn_summary_fits=TRUE
, returns a list of model fit objects of classsummary.glmmTMB
. Ifreturn_summary_fits=FALSE
, returns a list of model fit objects of classglmmTMB
. In either case, the order matches the row order ofcount_matrix
, and the names of the list elements are taken as the rownames ofcount_matrix
.gene_error:
Vector indicating whether the particular gene produced an error during model fitting (TRUE) or not (FALSE).
Details
This function is likely only needed if users wish to include random effect terms beyond random intercepts. Users should be confident in their abilities to specify random effects using the syntax of lme4.
Examples
# Set Parameters to Simulate Some Data
nind<-10;ncellsper<-rep(50,nind)
sigma.a<-.5;sigma.b<-.5;phi<-.1
alpha<-c(1,0,-.5,-2);beta<-c(2,0,-.1,.6)
beta2<-c(2,1,-.1,.6)
id.levels<-1:nind;nind<-length(id.levels)
id<-rep(id.levels,times=ncellsper)
sim.seed<-1234
# Simulate individual level covariates
t2d_sim<-rep(rbinom(nind,1,p=.4),times=ncellsper)
cdr_sim<-rbeta(sum(ncellsper),3,6)
age_sim<-rep(sample(c(20:60),size=nind,replace = TRUE),times=ncellsper)
# Construct design matrices
Z<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(Z)<-c("t2d_sim","age_sim","cdr_sim")
X<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(X)<-c("t2d_sim","age_sim","cdr_sim")
# Simulate Data
sim_dat<-matrix(nrow=2,ncol=sum(ncellsper))
for(i in 1:nrow(sim_dat)){
sim_dat[i,]<-simulate_zero_inflated_nb_random_effect_data(ncellsper,X,Z,alpha,beta2
,phi,sigma.a,sigma.b,id.levels=NULL)$Y
}
rownames(sim_dat)<-paste("Gene",1:2)
# Run twosigma_custom
twosigma_custom(sim_dat[1:2,],mean_form = count~X,zi_form = ~0,id=id)