nebula {nebula}

R Documentation

Association analysis of a multi-subject single-cell data set using a fast negative binomial mixed model

Description

Association analysis of a multi-subject single-cell data set using a fast negative binomial mixed model

Usage

nebula(
  count,
  id,
  pred = NULL,
  offset = NULL,
  min = c(1e-04, 1e-04),
  max = c(10, 1000),
  model = "NBGMM",
  method = "LN",
  cutoff_cell = 20,
  kappa = 800,
  opt = "lbfgs",
  verbose = TRUE,
  cpc = 0.005,
  mincp = 5,
  covariance = FALSE,
  output_re = FALSE,
  reml = 0,
  ncore = 2,
  fmaxsize = Inf
)

Arguments

`count`	A raw count matrix of the single-cell data. The rows are the genes, and the columns are the cells. The matrix can be a matrix object or a sparse dgCMatrix object.
`id`	A vector of subject IDs. The length should be the same as the number of columns of the count matrix.
`pred`	A design matrix of the predictors. The rows are the cells and the columns are the predictors. If not specified, an intercept column will be generated by default.
`offset`	A vector of the scaling factor. The values must be strictly positive. If not specified, a vector of all ones will be generated by default.
`min`	Minimum values for the overdispersion parameters `\sigma^2` and `\phi`. Must be positive. The default is c(1e-4,1e-4).
`max`	Maximum values for the overdispersion parameters `\sigma^2` and `\phi`. Must be positive. The default is c(10,1000).
`model`	'NBGMM', 'PMM' or 'NBLMM'. 'NBGMM' is for fitting a negative binomial gamma mixed model. 'PMM' is for fitting a Poisson gamma mixed model. 'NBLMM' is for fitting a negative binomial lognormal mixed model (the same model as that in the lme4 package). The default is 'NBGMM'.
`method`	'LN' or 'HL'. 'LN' is to use NEBULA-LN and 'HL' is to use NEBULA-HL. The default is 'LN'.
`cutoff_cell`	The data will be refit using NEBULA-HL to estimate both overdispersions if the product of the number of cells per subject and the estimated cell-level overdispersion parameter `\phi` is smaller than `cutoff_cell`. The default is 20.
`kappa`	Please see the vignettes for more details. The default is 800.
`opt`	'lbfgs' or 'trust'. Specifies the optimization algorithm used in NEBULA-LN. The default is 'lbfgs'. If it is 'trust', a trust region algorithm based on the Hessian matrix will be used for optimization.
`verbose`	An optional logical scalar indicating whether to print additional messages. The default is TRUE.
`cpc`	A non-negative threshold for filtering low-expression genes. Genes with counts per cell smaller than the specified value will not be analyzed.
`mincp`	A positive integer threshold for filtering low-expression genes. A gene will not be analyzed if the number of cells with a non-zero count for that gene is smaller than the specified value.
`covariance`	If TRUE, nebula will output the covariance matrix for the estimated log(FC), which can be used for testing contrasts.
`output_re`	If TRUE, nebula will output the subject-level random effects. Only effective for model='NBGMM' or 'NBLMM'.
`reml`	Either 0 (default) or 1. If it is one, REML will be used to estimate the overdispersions.
`ncore`	The number of cores used for parallel computing.
`fmaxsize`	The maximum allowed total size (in bytes) of global variables (future.globals.maxSize) when using parallel computing.
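
As an illustration of how the main inputs relate to each other, the following base R sketch simulates a small data set and builds objects that could be passed as `count`, `id`, `pred` and `offset`. All object names (`counts`, `subject`, `covars`, `design`, `lib_size`) and the simulated values are made up for illustration. Using the total count per cell as the scaling factor is one possible choice, not something this page prescribes, and the filter preview assumes that "counts per cell" means a gene's total count divided by the number of cells.

```r
set.seed(1)
n_gene <- 50
n_subj <- 10
cells_per_subj <- 20
n_cell <- n_subj * cells_per_subj

# Simulated raw counts: genes in rows, cells in columns (illustrative values only)
counts <- matrix(rpois(n_gene * n_cell, lambda = 0.5), nrow = n_gene,
                 dimnames = list(paste0("gene", seq_len(n_gene)), NULL))

# Subject IDs, one per cell (one per column of the count matrix)
subject <- rep(paste0("s", seq_len(n_subj)), each = cells_per_subj)

# Cell-level covariates and the design matrix (the formula adds the intercept)
covars <- data.frame(
  x1 = rnorm(n_cell),
  group = factor(rep(c("case", "control"), each = n_cell / 2))
)
design <- model.matrix(~ x1 + group, data = covars)

# One possible scaling factor: total counts per cell (assumption, see above)
lib_size <- colSums(counts)

# Consistency checks implied by the argument descriptions
stopifnot(length(subject) == ncol(counts),
          nrow(design) == ncol(counts),
          length(lib_size) == ncol(counts),
          all(lib_size > 0))

# Preview of the low-expression filters under the stated assumption
cpc <- 0.005
mincp <- 5
counts_per_cell <- rowSums(counts) / ncol(counts)
n_expressing <- rowSums(counts > 0)
keep <- counts_per_cell >= cpc & n_expressing >= mincp
table(keep)

# These objects would then be supplied as
# nebula(count = counts, id = subject, pred = design, offset = lib_size)
```

The preview only approximates which genes would pass the `cpc` and `mincp` thresholds; the filtering itself is done inside nebula.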

Value

summary: The estimated coefficient, standard error and p-value for each predictor.

overdispersion: The estimated subject-level and cell-level overdispersions \sigma^2 and \phi^{-1}.

convergence: Convergence information for each gene. A value of -20 or lower indicates a potential convergence failure. A value of 1 indicates that convergence was reached because the improvement of the function value became sufficiently small. A value of -10 indicates that convergence was reached because the gradients are close to zero (i.e., a critical point) and no further improvement of the function value could be found.

algorithm: The algorithm used for analyzing the gene. More information can be found in the vignettes.

covariance: The covariance matrix for the estimated log(FC).

random_effect: The subject-level random effects.
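
The covariance output is described above as usable for testing contrasts. A minimal sketch of such a test for a single gene follows, assuming the estimated log(FC) values are available as a vector and their covariance as a full symmetric matrix. It computes the Wald statistic (c'b)^2 / (c'Vc) and compares it with a chi-squared distribution with one degree of freedom. The helper `wald_contrast` and all numbers are hypothetical; how nebula stores the covariance for each gene is not described on this page, so rebuilding the full matrix from the output is left out.

```r
# Hypothetical helper: Wald test of a linear contrast for one gene
wald_contrast <- function(beta, vcov_mat, contrast) {
  stopifnot(length(beta) == length(contrast),
            all(dim(vcov_mat) == length(beta)))
  est <- sum(contrast * beta)                               # c' beta
  se <- sqrt(drop(t(contrast) %*% vcov_mat %*% contrast))   # sqrt(c' V c)
  stat <- (est / se)^2                                      # Wald statistic
  c(estimate = est, se = se,
    p = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Made-up estimates for illustration: intercept, X1, X2
beta <- c(-2.0, 0.30, 0.10)
vcov_mat <- matrix(c(0.040, 0.000, 0.000,
                     0.000, 0.010, 0.002,
                     0.000, 0.002, 0.010), nrow = 3, byrow = TRUE)

# Test whether the X1 and X2 effects differ
res <- wald_contrast(beta, vcov_mat, contrast = c(0, 1, -1))
res
```

The contrast vector has one entry per column of the design matrix, in the same order as the estimated coefficients.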

Examples

library(nebula)
data(sample_data)
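# Design matrix with an intercept and the predictors X1, X2 and cc
pred = model.matrix(~X1+X2+cc,data=sample_data$pred)
# Fit the default model ('NBGMM') with the default method ('LN')
re = nebula(count=sample_data$count,id=sample_data$sid,pred=pred)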
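
A possible way to inspect the fitted object is sketched below. It repeats the call above so that it runs on its own, and it is skipped when the nebula package is not installed. The element names come from the Value section; the name `flagged` and the use of -20 as a cutoff simply apply the rule stated there.

```r
if (requireNamespace("nebula", quietly = TRUE)) {
  library(nebula)
  data(sample_data)
  pred <- model.matrix(~X1 + X2 + cc, data = sample_data$pred)
  re <- nebula(count = sample_data$count, id = sample_data$sid, pred = pred)

  # Elements described in the Value section
  print(names(re))
  print(head(re$summary))
  print(head(re$overdispersion))

  # Genes whose convergence code suggests a potential failure (-20 or lower)
  flagged <- re$convergence <= -20
  print(sum(flagged))
}
```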

[Package nebula version 1.5.3 Index]