
compNoise {RaceID}

R Documentation

Function for computing local gene expression variability

Description

This function computes the local gene expression variability across the pruned k nearest neighbours at a given link probability cutoff. The estimated variance is corrected for its mean dependence using the baseline model of gene expression variance.
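The idea described above can be illustrated with a minimal base-R sketch. It is not RaceID's implementation. The inputs `nn` (cell followed by its neighbour indices), `pv` (link probabilities per neighbour) and `baseline_var` (a function returning a baseline variance for a given mean) are hypothetical stand-ins for what `pruneKnn` and `noiseBaseFit` provide. The correction step, dividing the observed variance by the baseline variance at the same local mean, is an assumption made for illustration and may differ from the formula RaceID actually uses.

```r
# Illustrative sketch only; not RaceID's implementation.
# x:            genes x cells expression matrix
# nn:           hypothetical integer matrix; column j holds cell j in row 1,
#               followed by the column indices of its k nearest neighbours
# pv:           hypothetical k x cells matrix of link probabilities, row i
#               matching neighbour row i + 1 of nn
# baseline_var: hypothetical function mapping a mean to a baseline variance
local_variability <- function(x, nn, pv, pvalue = 0.01, baseline_var) {
  out <- matrix(NA_real_, nrow(x), ncol(x), dimnames = dimnames(x))
  for (j in seq_len(ncol(x))) {
    # keep the cell itself plus neighbours with link probability >= pvalue
    keep <- c(TRUE, pv[, j] >= pvalue)
    sub <- x[, nn[keep, j], drop = FALSE]
    m <- rowMeans(sub)
    v <- apply(sub, 1, var)  # NA if only the cell itself is kept
    b <- baseline_var(m)
    # assumed correction: observed variance relative to the baseline variance
    out[, j] <- ifelse(b > 0, v / b, NA_real_)
  }
  out
}
```

Dividing by a baseline variance evaluated at the same local mean is one simple way to remove the mean dependence; the baseline model returned by `noiseBaseFit` may enter the correction differently.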

Usage

compNoise(
  x,
  res,
  pvalue = 0.01,
  genes = NULL,
  regNB = FALSE,
  batch = NULL,
  regVar = NULL,
  offsetModel = TRUE,
  thetaML = FALSE,
  theta = 10,
  ngenes = NULL,
  span = 0.75,
  step = 0.01,
  thr = 0.05,
  no_cores = NULL,
  seed = 12345
)

Arguments

`x`	Matrix of gene expression values with genes as rows and cells as columns. Its columns must contain the same cell IDs as the input matrix used to derive the pruned k nearest neighbours with the `pruneKnn` function. However, it may contain a different set of genes.
`res`	List object with k nearest neighbour information returned by `pruneKnn` function.
`pvalue`	Positive real number between 0 and 1. All nearest neighbours with link probability `< pvalue` are discarded. Default is 0.01.
`genes`	Vector of gene names corresponding to a subset of rownames of `x`. Local gene expression variability is computed only for these genes. Default is `NULL`, and values for all genes are returned.
`regNB`	Logical. If `TRUE`, then gene expression variability is derived from the Pearson residuals obtained from a negative binomial regression to eliminate the dependence of the expression variance on the mean. If `FALSE`, then the mean dependence is regressed out from the raw variance using the baseline variance estimate. Default is `FALSE`.
`batch`	Vector of batch variables. Component names need to correspond to valid cell IDs, i.e. column names of `x`. If `regNB` is `TRUE`, then the batch variable will be regressed out simultaneously with the log UMI count per cell. An interaction term is included for the log UMI count with the batch variable. Default value is `NULL`.
`regVar`	data.frame with additional variables to be regressed out simultaneously with the log UMI count and the batch variable (if `batch` is not `NULL`). Column names indicate variable names (the name `beta` is reserved for the coefficient of the log UMI count), and rownames need to correspond to valid cell IDs, i.e. column names of `x`. Interaction terms are included for each variable in `regVar` with the batch variable (if `batch` is not `NULL`). Default value is `NULL`.
`offsetModel`	Logical parameter. Only considered if `regNB` is `TRUE`. If `TRUE`, then the `beta` (log UMI count) coefficient is set to 1 and the intercept is computed analytically as the log ratio of the total UMI count of a gene to the total UMI count across all cells. Batch variables and additional variables in `regVar` are regressed out with an offset term given by the sum of the intercept and the log UMI count. Default is `TRUE`.
`thetaML`	Logical parameter. Only considered if `offsetModel` equals `TRUE`. If `TRUE` then the dispersion parameter is estimated by a maximum likelihood fit. Otherwise, it is set to `theta`. Default is `FALSE`.
`theta`	Positive real number. Fixed value of the dispersion parameter. Only considered if `thetaML` equals `FALSE`. Default is 10.
`ngenes`	Positive integer. Number of randomly sampled genes (from rownames of `x`) used for predicting regression coefficients (if `regNB=TRUE`). Smoothed coefficients are derived for all genes. Default is `NULL`, and all genes are used.
`span`	Positive real number. Parameter for the loess regression (see `regNB`) controlling the degree of smoothing. Default is 0.75.
`step`	Positive real number between 0 and 1. See function `noiseBaseFit`. Default is 0.01.
`thr`	Positive real number between 0 and 1. See function `noiseBaseFit`. Default is 0.05.
`no_cores`	Positive integer. Number of cores for multithreading. If set to `NULL`, then the number of available cores minus two is used. Default is `NULL`.
`seed`	Integer. Seed for initializing the random number generator used by stochastic routines. Default is 12345.
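The `offsetModel` and `batch` descriptions above can be made concrete with a small base-R sketch. It is not RaceID code: the helper `offset_model_mean` is hypothetical, and the natural logarithm is assumed for the log UMI count. With `beta` fixed to 1 and the intercept set to the log of a gene's share of all UMIs, the fitted means reproduce each gene's total count. The second part shows, using base R's `model.matrix`, what a log UMI by batch interaction design could look like; whether RaceID builds its design exactly this way is an assumption.

```r
# Illustrative sketch of the offsetModel idea; not RaceID code.
# counts: genes x cells matrix of UMI counts (natural log assumed below)
offset_model_mean <- function(counts) {
  umi <- colSums(counts)                        # total UMI count per cell
  log_umi <- log(umi)
  # analytic intercept: log of a gene's share of all UMIs
  intercept <- log(rowSums(counts) / sum(umi))
  # beta fixed to 1, so log(mu[g, j]) = intercept[g] + log_umi[j]
  mu <- exp(outer(intercept, log_umi, "+"))
  list(intercept = intercept, log_umi = log_umi, mu = mu)
}

counts <- matrix(c(1, 3, 0, 2, 5, 1), nrow = 2,
                 dimnames = list(c("g1", "g2"), c("c1", "c2", "c3")))
fit <- offset_model_mean(counts)
# with beta = 1 the fitted means reproduce each gene's total count
rowSums(fit$mu)

# hypothetical design with a log UMI x batch interaction term
batch <- factor(c("b1", "b1", "b2"))
log_umi <- fit$log_umi
X <- model.matrix(~ log_umi * batch)
colnames(X)
```

The formula `~ log_umi * batch` expands to main effects plus the `log_umi:batch` interaction, which matches the interaction term mentioned for `batch`.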

Value

List object of three components:

`model`	the baseline noise model as computed by the `noiseBaseFit` function.
`data`	matrix with local gene expression variability estimates, corrected for the mean dependence.
`regData`	If `regNB=TRUE`, this component contains a list of four components: `pearsonRes`, a matrix of the Pearson residuals computed from the negative binomial regression; `nbRegr`, a matrix with the regression coefficients; `nbRegrSmooth`, a matrix with the smoothed regression coefficients; and `log_umi`, a vector with the total log UMI count for each cell. The regression coefficients comprise the dispersion parameter theta, the intercept, the regression coefficient beta for the log UMI count, and the regression coefficients of the batches (if `batch` is not `NULL`).
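The Pearson residuals stored in `pearsonRes` follow, in the standard negative binomial parameterization, from the coefficients listed above: the fitted mean is `mu = exp(intercept + beta * log_umi)`, the variance is `mu + mu^2 / theta`, and the residual is the observed count minus `mu`, divided by the square root of that variance. The minimal sketch below assumes a hypothetical coefficient matrix with columns `theta`, `intercept` and `beta` and a natural-log UMI count; it omits batch coefficients and any clipping RaceID may apply.

```r
# Illustrative sketch; not RaceID code.
# counts:  genes x cells matrix of UMI counts
# log_umi: per-cell log total UMI count (natural log assumed)
# coefs:   hypothetical matrix with columns "theta", "intercept", "beta",
#          one row per gene
nb_pearson_residuals <- function(counts, log_umi, coefs) {
  # fitted NB mean: log(mu) = intercept + beta * log_umi
  mu <- exp(coefs[, "intercept"] + outer(coefs[, "beta"], log_umi))
  # NB variance mu + mu^2 / theta, recycled row-wise per gene
  (counts - mu) / sqrt(mu + mu^2 / coefs[, "theta"])
}

counts <- matrix(c(2, 6), nrow = 1)
coefs <- cbind(theta = 1, intercept = log(2), beta = 1)
r <- nb_pearson_residuals(counts, log(c(1, 2)), coefs)
r  # 0 and 2 / sqrt(20), approximately 0.447
```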

Examples

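# derive the pruned k nearest neighbour graph
res <- pruneKnn(intestinalDataSmall, knn = 10, alpha = 1, no_cores = 1, FSelect = FALSE)
# compute local gene expression variability across the pruned neighbourhoods
noise <- compNoise(intestinalDataSmall, res, pvalue = 0.01, genes = NULL, no_cores = 1)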

[Package RaceID version 0.3.5 Index]