twosigma {twosigma} | R Documentation |
Fit the TWO-SIGMA Model.
Description
Fit the TWO-SIGMA Model.
Usage
twosigma(
count_matrix,
mean_covar,
zi_covar,
mean_re = TRUE,
zi_re = TRUE,
id,
adhoc = TRUE,
adhoc_thresh = 0.1,
return_summary_fits = TRUE,
disp_covar = NULL,
weights = rep(1, ncol(count_matrix)),
control = glmmTMBControl(),
ncores = 1,
cluster_type = "Fork",
chunk_size = 10,
lb = FALSE
)
Arguments
count_matrix |
Matrix of non-negative integer read counts, with rows corresponding to genes and columns corresponding to cells. It is recommended to make the rownames the gene names for better output. |
mean_covar |
Covariates for the (conditional) mean model. Must be a matrix (without an intercept column) or = 1 to indicate an intercept only model. |
zi_covar |
Covariates for the zero-inflation model. Must be a matrix (without an intercept column), = 1 to indicate an intercept only model, or = 0 to indicate no zero-inflation model desired. |
mean_re |
Should random intercepts be included in the (conditional) mean model? Ignored if adhoc=TRUE. |
zi_re |
Should random intercepts be included in the zero-inflation model? Ignored if adhoc=TRUE. |
id |
Vector of individual-level ID's. Used for random effect prediction and the adhoc method but required regardless. |
adhoc |
Should the adhoc method be used by default to judge if random effects are needed? |
adhoc_thresh |
Value below which the adhoc p-value is deemed significant (and thus RE are deemed necessary). Only used if adhoc==TRUE. |
return_summary_fits |
If TRUE, the package returns a |
disp_covar |
Covariates for a log-linear model for the dispersion. Either a matrix of covariates or = 1 to indicate an intercept only model. Random effect terms are not permitted in the dispersion model. Defaults to NULL for constant dispersion. |
weights |
weights, as in glm. Defaults to 1 for all observations and no scaling or centering of weights is performed. |
control |
Control parameters for optimization in |
ncores |
Number of cores used for parallelization. Defaults to 1, meaning no parallelization of any kind is done. |
cluster_type |
Whether to use a "cluster of type "Fork" or "Sock". On Unix systems, "Fork" will likely improve performance. On Windows, only "Sock" will actually result in parallelized computing. |
chunk_size |
Number of genes to be sent to each parallel environment. Parallelization is more efficient, particularly with a large count matrix, when the count matrix is 'chunked' into some common size (e.g. 10, 50, 200). Defaults to 10. |
lb |
Should load balancing be used for parallelization? Users will likely want to set to FALSE for improved performance. |
Value
A list with the following elements: ##'
fit:
Ifreturn_summary_fits=TRUE
, returns a list of model fit objects of classsummary.glmmTMB
. Ifreturn_summary_fits=FALSE
, returns a list of model fit objects of classglmmTMB
. In either case, the order matches the row order ofcount_matrix
, and the names of the list elements are taken as the rownames ofcount_matrix
.adhoc_include_RE:
Logical vector indicator whether the adhoc method determined random effects needed. Ifadhoc=F
, then a vector of NA's.gene_error:
Vector indicating whether the particular gene produced an error during model fitting (TRUE) or not (FALSE).
Details
If adhoc=TRUE, any input in mean_re and zi_re will be ignored.
Examples
# Set Parameters to Simulate Some Data
nind<-10;ncellsper<-rep(50,nind)
sigma.a<-.5;sigma.b<-.5;phi<-.1
alpha<-c(1,0,-.5,-2);beta<-c(2,0,-.1,.6)
beta2<-c(2,1,-.1,.6)
id.levels<-1:nind;nind<-length(id.levels)
id<-rep(id.levels,times=ncellsper)
sim.seed<-1234
# Simulate individual level covariates
t2d_sim<-rep(rbinom(nind,1,p=.4),times=ncellsper)
cdr_sim<-rbeta(sum(ncellsper),3,6)
age_sim<-rep(sample(c(20:60),size=nind,replace = TRUE),times=ncellsper)
# Construct design matrices
Z<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(Z)<-c("t2d_sim","age_sim","cdr_sim")
X<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(X)<-c("t2d_sim","age_sim","cdr_sim")
# Simulate Data
sim_dat<-matrix(nrow=2,ncol=sum(ncellsper))
for(i in 1:nrow(sim_dat)){
sim_dat[i,]<-simulate_zero_inflated_nb_random_effect_data(ncellsper,X,Z,alpha,beta2
,phi,sigma.a,sigma.b,id.levels=NULL)$Y
}
rownames(sim_dat)<-paste("Gene",1:2)
# Run twosigma
twosigma(sim_dat[1:2,],mean_covar = X,zi_covar=1,id = id)