R: Fit the TWO-SIGMA Model.

twosigma {twosigma}

R Documentation

Fit the TWO-SIGMA Model.

Description

Fit the TWO-SIGMA Model.

Usage

twosigma(
  count_matrix,
  mean_covar,
  zi_covar,
  mean_re = TRUE,
  zi_re = TRUE,
  id,
  adhoc = TRUE,
  adhoc_thresh = 0.1,
  return_summary_fits = TRUE,
  disp_covar = NULL,
  weights = rep(1, ncol(count_matrix)),
  control = glmmTMBControl(),
  ncores = 1,
  cluster_type = "Fork",
  chunk_size = 10,
  lb = FALSE
)

Arguments

`count_matrix`	Matrix of non-negative integer read counts, with rows corresponding to genes and columns corresponding to cells. It is recommended to make the rownames the gene names for better output.
`mean_covar`	Covariates for the (conditional) mean model. Must be a matrix (without an intercept column) or = 1 to indicate an intercept only model.
`zi_covar`	Covariates for the zero-inflation model. Must be a matrix (without an intercept column), = 1 to indicate an intercept only model, or = 0 to indicate no zero-inflation model desired.
`mean_re`	Should random intercepts be included in the (conditional) mean model? Ignored if adhoc=TRUE.
`zi_re`	Should random intercepts be included in the zero-inflation model? Ignored if adhoc=TRUE.
`id`	Vector of individual-level ID's. Used for random effect prediction and the adhoc method but required regardless.
`adhoc`	Should the adhoc method be used by default to judge if random effects are needed?
`adhoc_thresh`	Value below which the adhoc p-value is deemed significant (and thus RE are deemed necessary). Only used if adhoc==TRUE.
`return_summary_fits`	If TRUE, the package returns a `summary.glmmTMB` object for each gene. If FALSE, an object of class `glmmTMB` is returned for each gene. The latter requires far more memory to store.
`disp_covar`	Covariates for a log-linear model for the dispersion. Either a matrix of covariates or = 1 to indicate an intercept only model. Random effect terms are not permitted in the dispersion model. Defaults to NULL for constant dispersion.
`weights`	weights, as in glm. Defaults to 1 for all observations and no scaling or centering of weights is performed.
`control`	Control parameters for optimization in `glmmTMB`. See `?glmmTMBControl`.
`ncores`	Number of cores used for parallelization. Defaults to 1, meaning no parallelization of any kind is done.
`cluster_type`	Whether to use a "cluster of type "Fork" or "Sock". On Unix systems, "Fork" will likely improve performance. On Windows, only "Sock" will actually result in parallelized computing.
`chunk_size`	Number of genes to be sent to each parallel environment. Parallelization is more efficient, particularly with a large count matrix, when the count matrix is 'chunked' into some common size (e.g. 10, 50, 200). Defaults to 10.
`lb`	Should load balancing be used for parallelization? Users will likely want to set to FALSE for improved performance.

Value

A list with the following elements: ##'

fit: If return_summary_fits=TRUE, returns a list of model fit objects of class summary.glmmTMB. If return_summary_fits=FALSE, returns a list of model fit objects of class glmmTMB. In either case, the order matches the row order of count_matrix, and the names of the list elements are taken as the rownames of count_matrix.
adhoc_include_RE: Logical vector indicator whether the adhoc method determined random effects needed. If adhoc=F, then a vector of NA's.
gene_error: Vector indicating whether the particular gene produced an error during model fitting (TRUE) or not (FALSE).

Details

If adhoc=TRUE, any input in mean_re and zi_re will be ignored.

Examples

# Set Parameters to Simulate Some Data

nind<-10;ncellsper<-rep(50,nind)
sigma.a<-.5;sigma.b<-.5;phi<-.1
alpha<-c(1,0,-.5,-2);beta<-c(2,0,-.1,.6)
beta2<-c(2,1,-.1,.6)
id.levels<-1:nind;nind<-length(id.levels)
id<-rep(id.levels,times=ncellsper)
sim.seed<-1234

# Simulate individual level covariates

t2d_sim<-rep(rbinom(nind,1,p=.4),times=ncellsper)
cdr_sim<-rbeta(sum(ncellsper),3,6)
age_sim<-rep(sample(c(20:60),size=nind,replace = TRUE),times=ncellsper)

# Construct design matrices

Z<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(Z)<-c("t2d_sim","age_sim","cdr_sim")
X<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(X)<-c("t2d_sim","age_sim","cdr_sim")

# Simulate Data

sim_dat<-matrix(nrow=2,ncol=sum(ncellsper))
for(i in 1:nrow(sim_dat)){
   sim_dat[i,]<-simulate_zero_inflated_nb_random_effect_data(ncellsper,X,Z,alpha,beta2
   ,phi,sigma.a,sigma.b,id.levels=NULL)$Y
}
rownames(sim_dat)<-paste("Gene",1:2)

# Run twosigma

twosigma(sim_dat[1:2,],mean_covar = X,zi_covar=1,id = id)

[Package twosigma version 1.0.2 Index]