nebula {nebula} | R Documentation |
Association analysis of a multi-subject single-cell data set using a fast negative binomial mixed model
Description
Association analysis of a multi-subject single-cell data set using a fast negative binomial mixed model
Usage
nebula(
count,
id,
pred = NULL,
offset = NULL,
min = c(1e-04, 1e-04),
max = c(10, 1000),
model = "NBGMM",
method = "LN",
cutoff_cell = 20,
kappa = 800,
opt = "lbfgs",
verbose = TRUE,
cpc = 0.005,
mincp = 5,
covariance = FALSE,
output_re = FALSE,
reml = 0,
ncore = 2,
fmaxsize = Inf
)
Arguments
count |
A raw count matrix of the single-cell data. The rows are the genes, and the columns are the cells. The matrix can be a matrix object or a sparse dgCMatrix object. |
id |
A vector of subject IDs. The length should be the same as the number of columns of the count matrix. |
pred |
A design matrix of the predictors. The rows are the cells and the columns are the predictors. If not specified, an intercept column will be generated by default. |
offset |
A vector of scaling factors, one per cell. The values must be strictly positive. If not specified, a vector of all ones will be generated by default. |
min |
Minimum values for the overdispersion parameters. |
max |
Maximum values for the overdispersion parameters. |
model |
'NBGMM', 'PMM' or 'NBLMM'. 'NBGMM' is for fitting a negative binomial gamma mixed model. 'PMM' is for fitting a Poisson gamma mixed model. 'NBLMM' is for fitting a negative binomial lognormal mixed model (the same model as that in the lme4 package). The default is 'NBGMM'. |
method |
'LN' or 'HL'. 'LN' is to use NEBULA-LN and 'HL' is to use NEBULA-HL. The default is 'LN'. |
cutoff_cell |
The data will be refit using NEBULA-HL to estimate both overdispersions if the product of the cells per subject and the estimated cell-level overdispersion parameter is smaller than cutoff_cell. The default is 20. |
kappa |
Please see the vignettes for more details. The default is 800. |
opt |
'lbfgs' or 'trust'. Specifies the optimization algorithm used in NEBULA-LN. The default is 'lbfgs'. If it is 'trust', a trust region algorithm based on the Hessian matrix will be used for optimization. |
verbose |
An optional logical scalar indicating whether to print additional messages. The default is TRUE. |
cpc |
A non-negative threshold for filtering low-expression genes. Genes with counts per cell smaller than the specified value will not be analyzed. |
mincp |
A positive integer threshold for filtering low-expression genes. A gene will not be analyzed if the number of cells in which it has a non-zero count is smaller than the specified value. |
covariance |
If TRUE, nebula will output the covariance matrix for the estimated log(FC), which can be used for testing contrasts. |
output_re |
If TRUE, nebula will output the subject-level random effects. Only effective for model='NBGMM' or 'NBLMM'. |
reml |
Either 0 (default) or 1. If it is 1, REML will be used to estimate the overdispersions. |
ncore |
The number of cores used for parallel computing. |
fmaxsize |
The maximum allowed total size (in bytes) of global variables (future.globals.maxSize) when using parallel computing. |
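The two gene filters described under cpc and mincp can be illustrated with a minimal base-R sketch. It only mirrors the wording of the argument descriptions above and is not the package's internal code. The toy count matrix and the gene names are hypothetical, and the handling of values exactly at a threshold is an assumption.

```r
# Toy count matrix (hypothetical data): 4 genes (rows) x 2000 cells (columns).
n_cell <- 2000
count <- rbind(
  g_high   = rep(c(0, 1, 2, 3), times = 500),    # well expressed
  g_sparse = c(rep(1, 6), rep(0, n_cell - 6)),   # 6 non-zero cells, very low mean
  g_few    = c(rep(20, 3), rep(0, n_cell - 3)),  # high counts in only 3 cells
  g_ok     = c(rep(1, 12), rep(0, n_cell - 12))  # passes both filters
)

cpc   <- 0.005  # default threshold on counts per cell
mincp <- 5      # default threshold on the number of cells with a non-zero count

counts_per_cell <- rowSums(count) / ncol(count)  # average count per cell, per gene
n_nonzero       <- rowSums(count > 0)            # cells with a non-zero count, per gene

# Assumption: a gene is kept when it is not below either threshold.
keep <- counts_per_cell >= cpc & n_nonzero >= mincp
filtered <- count[keep, , drop = FALSE]
print(rownames(filtered))
```

In this sketch g_sparse fails the cpc filter (0.003 counts per cell) although it has 6 non-zero cells, and g_few fails the mincp filter although its counts per cell are high.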
Value
summary: The estimated coefficient, standard error and p-value for each predictor.
overdispersion: The estimated subject-level and cell-level overdispersions, \sigma^2 and \phi^{-1}, respectively.
convergence: More information about the convergence of the algorithm for each gene. A value of -20 or lower indicates a potential convergence failure. A value of 1 indicates that convergence is reached due to a sufficiently small improvement of the function value. A value of -10 indicates that convergence is reached because the gradients are close to zero (i.e., the critical point) and no improvement of the function value can be found.
algorithm: The algorithm used for analyzing the gene. More information can be found in the vignettes.
covariance: The covariance matrix for the estimated log(FC).
random_effect: The subject-level random effects.
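As a hedged illustration of how a covariance matrix for the estimated log(FC) can be used to test a contrast, the following base-R sketch runs a standard one-degree-of-freedom Wald test. It assumes the covariance for one gene is already available as a full symmetric matrix. The coefficient values, the matrix entries and the variable names are made up for illustration and do not come from nebula output.

```r
# Hypothetical estimates for one gene: intercept and two predictors.
beta <- c("(Intercept)" = -2.0, X1 = 0.30, X2 = -0.10)

# Hypothetical covariance matrix of those estimates (symmetric).
V <- matrix(c(0.040, 0.002, 0.001,
              0.002, 0.010, 0.003,
              0.001, 0.003, 0.020),
            nrow = 3, byrow = TRUE,
            dimnames = list(names(beta), names(beta)))

# Contrast of interest: X1 - X2.
contrast <- c(0, 1, -1)

est <- sum(contrast * beta)                      # estimated contrast
se  <- sqrt(drop(t(contrast) %*% V %*% contrast)) # standard error of the contrast

# Wald statistic compared with a chi-squared distribution with 1 df.
wald    <- (est / se)^2
p_value <- pchisq(wald, df = 1, lower.tail = FALSE)
print(c(estimate = est, se = se, p = p_value))
```

The variance of the contrast here is V[X1, X1] + V[X2, X2] - 2 * V[X1, X2], which is why the off-diagonal elements are needed and the standard errors in summary alone are not sufficient.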
Examples
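library(nebula)
data(sample_data)
# Design matrix: one row per cell, with an intercept plus the predictors X1, X2 and cc.
pred <- model.matrix(~ X1 + X2 + cc, data = sample_data$pred)
# Fit the default model ('NBGMM') with the default method ('LN').
re <- nebula(count = sample_data$count, id = sample_data$sid, pred = pred)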
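
The following sketch shows one way to prepare the inputs described under Arguments from scratch and to check their dimensions before fitting. The simulated counts, the subject labels and the `treated` covariate are all hypothetical. Using the total count per cell as the offset is a common choice but an assumption here, not something this page prescribes. The call to nebula only runs if the package is installed.

```r
set.seed(42)
n_gene <- 5
n_cell <- 60

# Simulated counts (hypothetical): genes in rows, cells in columns.
count <- matrix(rpois(n_gene * n_cell, lambda = 3), nrow = n_gene,
                dimnames = list(paste0("gene", 1:n_gene), NULL))

# Subject IDs: 6 subjects with 10 cells each, cells of a subject kept together.
id <- rep(paste0("subj", 1:6), each = 10)

# Design matrix with an intercept and one made-up binary predictor.
cell_info <- data.frame(treated = rep(c(0, 1), each = 30))
pred <- model.matrix(~ treated, data = cell_info)

# Assumption: total count per cell as the scaling factor.
offset <- colSums(count)

# Checks that follow the argument descriptions: one ID, one design row and
# one strictly positive offset per cell.
inputs_ok <- length(id) == ncol(count) &&
  nrow(pred) == ncol(count) &&
  length(offset) == ncol(count) &&
  all(offset > 0)

if (inputs_ok && requireNamespace("nebula", quietly = TRUE)) {
  re <- nebula::nebula(count = count, id = id, pred = pred,
                       offset = offset, ncore = 1)
  print(re$summary)
}
```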