R: Function for computing local gene expression averages

compMean {RaceID}

R Documentation

Function for computing local gene expression averages

Description

This function computes locally averaged gene expression across the pruned k nearest neighbours at a given link probability cutoff.

Usage

compMean(
  x,
  res,
  pvalue = 0.01,
  genes = NULL,
  regNB = FALSE,
  batch = NULL,
  regVar = NULL,
  offsetModel = TRUE,
  thetaML = FALSE,
  theta = 10,
  ngenes = NULL,
  span = 0.75,
  no_cores = NULL,
  seed = 12345
)

Arguments

`x`	Matrix of gene expression values with genes as rows and cells as columns. The matrix needs to contain the same cell IDs as columns as the input matrix used to derive the pruned k nearest neighbours with the `pruneKnn` function. However, it may contain a different set of genes.
`res`	List object with k nearest neighbour information returned by the `pruneKnn` function.
`pvalue`	Positive real number between 0 and 1. All nearest neighbours with link probability `< pvalue` are discarded. Default is 0.01.
`genes`	Vector of gene names corresponding to a subset of rownames of `x`. Local gene expression averages are computed only for these genes. Default is `NULL`, in which case values for all genes are returned.
`regNB`	Logical. If `TRUE` then gene expression averages are computed from the Pearson residuals obtained from a negative binomial regression to eliminate the dependence of the expression variance on the mean. If `FALSE` then averages are computed from raw UMI counts. Default is `FALSE`.
`batch`	Vector of batch variables. Component names need to correspond to valid cell IDs, i.e. column names of `x`. If `regNB` is `TRUE`, then the batch variable will be regressed out simultaneously with the log UMI count per cell. An interaction term is included for the log UMI count with the batch variable. Default value is `NULL`.
`regVar`	Data frame with additional variables to be regressed out simultaneously with the log UMI count and the batch variable (if `batch` is not `NULL`). Column names indicate variable names (the name `beta` is reserved for the coefficient of the log UMI count), and rownames need to correspond to valid cell IDs, i.e. column names of `x`. Interaction terms are included for each variable in `regVar` with the batch variable (if `batch` is not `NULL`). Default value is `NULL`.
`offsetModel`	Logical parameter. Only considered if `regNB` is `TRUE`. If `TRUE` then the `beta` (log UMI count) coefficient is set to 1 and the intercept is computed analytically as the log ratio of the UMI count for a gene and the total UMI count across all cells. Batch variables and additional variables in `regVar` are regressed out with an offset term given by the sum of the intercept and the log UMI count. Default is `TRUE`.
`thetaML`	Logical parameter. Only considered if `offsetModel` equals `TRUE`. If `TRUE` then the dispersion parameter is estimated by a maximum likelihood fit. Otherwise, it is set to `theta`. Default is `FALSE`.
`theta`	Positive real number. Fixed value of the dispersion parameter. Only considered if `thetaML` equals `FALSE`. Default is 10.
`ngenes`	Positive integer number. Number of randomly sampled genes (from rownames of `x`) used for predicting regression coefficients (if `regNB=TRUE`). Smoothed coefficients are derived for all genes. Default is `NULL`, in which case all genes are used.
`span`	Positive real number. Parameter for loess-regression (see `regNB`) controlling the degree of smoothing. Default is 0.75.
`no_cores`	Positive integer number. Number of cores for multithreading. If set to `NULL` then the number of available cores minus two is used. Default is `NULL`.
`seed`	Integer number. Random number to initialize stochastic routines. Default is 12345.
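
The offset model described under `offsetModel` can be illustrated with a minimal base R sketch. It is not the package implementation: the toy count matrix, the use of the natural logarithm, and the variable names (`counts`, `intercept`, `mu`, `pearson`) are assumptions made for illustration. It fixes `beta` to 1, computes the intercept as the log ratio of each gene's UMI count to the total UMI count, and derives Pearson residuals with a fixed dispersion `theta`.

```r
set.seed(12345)

# Toy UMI count matrix (assumption): 5 genes x 8 cells
counts <- matrix(rpois(5 * 8, lambda = 5), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:8)))

# Log of the total UMI count per cell (natural log assumed here)
log_umi <- log(colSums(counts))

# Analytical intercept: log ratio of a gene's UMI count to the total UMI count
intercept <- log(rowSums(counts) / sum(counts))

# Offset term with beta fixed to 1: intercept + log UMI count
offset_term <- outer(intercept, log_umi, "+")

# Expected counts under the offset model
mu <- exp(offset_term)

# Pearson residuals for a negative binomial with fixed dispersion theta
theta <- 10
pearson <- (counts - mu) / sqrt(mu + mu^2 / theta)
```

With this choice of intercept, the expected counts of each gene sum to its observed total, so no regression fit is needed for the intercept or for `beta`.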

Value

List object with the following components:

mean

matrix with local gene expression averages, computed from Pearson residuals (if regNB=TRUE) or normalized UMI counts (if regNB=FALSE). In the latter case, the average UMI count for a local neighbourhood is normalized to one and rescaled by the median UMI count across neighborhoods.
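
The pruning and normalization described for `regNB=FALSE` can be sketched in base R as follows. This is one possible reading, not the package code: the neighbour index matrix `nn`, the link probability matrix `link_prob`, the inclusion of the central cell in its own neighbourhood, and the toy data are all hypothetical.

```r
set.seed(12345)

# Toy UMI counts (assumption): 4 genes x 5 cells, strictly positive
counts <- matrix(rpois(20, lambda = 5) + 1, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5)))

# Hypothetical neighbour indices: column i holds the 2 nearest neighbours of cell i
nn <- matrix(c(2, 3,  1, 3,  2, 4,  3, 5,  4, 3), nrow = 2)

# Hypothetical link probabilities for the same neighbours
link_prob <- matrix(c(0.50, 0.001,  0.20, 0.30,  0.005, 0.40,
                      0.90, 0.02,   0.008, 0.60), nrow = 2)

pvalue <- 0.01

# Average each gene over the cell itself plus its retained neighbours
local_mean <- sapply(seq_len(ncol(counts)), function(i) {
  keep <- nn[link_prob[, i] >= pvalue, i]  # discard links with probability < pvalue
  rowMeans(counts[, c(i, keep), drop = FALSE])
})
colnames(local_mean) <- colnames(counts)

# Normalize each neighbourhood to one, then rescale by the median total
totals <- colSums(local_mean)
normalized <- sweep(local_mean, 2, totals, "/") * median(totals)
```

After this rescaling every neighbourhood has the same total, so differences between columns reflect relative expression rather than sequencing depth.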

regData

If regNB=TRUE this component contains a list of four components: pearsonRes is a matrix of the Pearson residuals computed from the negative binomial regression, nbRegr is a matrix with the regression coefficients, nbRegrSmooth is a matrix with the smoothed regression coefficients, and log_umi is a vector with the total log UMI count for each cell. The regression coefficients comprise the dispersion parameter theta, the intercept, the regression coefficient beta for the log UMI count, and the regression coefficients of the batches (if batch is not NULL).
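
The relation between raw and smoothed coefficients can be illustrated with base R `loess`. The sketch below is an assumption-laden illustration, not the package procedure: it assumes coefficients are smoothed as a function of mean expression, and the simulated values and names (`log_mean`, `coef_raw`, `sampled`) are hypothetical. It fits the smoother on a random subset of genes, as with `ngenes`, and predicts smoothed coefficients for all genes using `span = 0.75`.

```r
set.seed(12345)

n_genes <- 200

# Hypothetical per-gene log mean expression and noisy raw coefficient
d <- data.frame(log_mean = sort(runif(n_genes, min = -2, max = 2)))
d$coef_raw <- 0.5 * d$log_mean + rnorm(n_genes, sd = 0.2)

# Random subset of genes used for fitting; the first and last gene are
# always included so that predictions for all genes stay inside the fitted range
sampled <- sort(unique(c(1, n_genes, sample(n_genes, 50))))

# Loess fit on the sampled genes; span controls the degree of smoothing
fit <- loess(coef_raw ~ log_mean, data = d[sampled, ], span = 0.75)

# Smoothed coefficients for all genes
coef_smooth <- predict(fit, newdata = d)
```

A larger `span` uses a wider window of genes for each local fit and therefore yields a smoother curve.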

Examples

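# Derive pruned k nearest neighbours, then compute local averages for all genes
res <- pruneKnn(intestinalDataSmall, knn = 10, alpha = 1, no_cores = 1, FSelect = FALSE)
mexp <- compMean(intestinalDataSmall, res, pvalue = 0.01, genes = NULL, no_cores = 1)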

[Package RaceID version 0.3.5 Index]