pruneKnn {RaceID} | R Documentation |
Function inferring a pruned knn matrix
Description
This function determines k nearest neighbours for each cell in gene expression space, and tests if the links are supported by a negative binomial joint distribution of gene expression. A probability is assigned to each link which is given by the minimum joint probability across all genes.
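The neighbour search described above can be illustrated with a minimal base R sketch. It is not RaceID's internal implementation: the toy count matrix, the variable names, and the use of `1 - cor()` as a Pearson-based distance are assumptions made only for illustration.

```r
# Toy illustration only; this does not reproduce RaceID's internals.
set.seed(1)

# Hypothetical UMI count matrix: 50 genes (rows) x 20 cells (columns)
counts <- matrix(rpois(50 * 20, lambda = 5), nrow = 50,
                 dimnames = list(paste0("g", 1:50), paste0("c", 1:20)))

k <- 5

# Pearson-based distance between cells (columns of the matrix)
d <- 1 - cor(counts, method = "pearson")

# For every cell, rank all cells by distance and keep the first k + 1.
# The first entry is the cell itself, the remaining k are its neighbours.
nn <- apply(d, 2, function(x) order(x)[seq_len(k + 1)])

# Each column of nn holds the indices for one cell
nn[, 1:3]
```

Each column of `nn` then lists a cell followed by its k closest cells. Testing whether these links are supported by a negative binomial background model is the additional step that `pruneKnn` performs.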
Usage
pruneKnn(
expData,
distM = NULL,
large = TRUE,
regNB = TRUE,
bmethod = NULL,
batch = NULL,
regVar = NULL,
offsetModel = TRUE,
thetaML = FALSE,
theta = 10,
ngenes = 2000,
span = 0.75,
pcaComp = NULL,
tol = 1e-05,
algorithm = "kd_tree",
metric = "pearson",
genes = NULL,
knn = 25,
do.prune = TRUE,
alpha = 1,
nb = 3,
no_cores = NULL,
FSelect = FALSE,
pca.scale = FALSE,
ps = 1,
seed = 12345,
...
)
Arguments
expData |
Matrix of gene expression values with genes as rows and cells as columns. These values have to correspond to unique molecular identifier counts. Alternatively, a Seurat object can be used as input, after normalization, PCA-dimensional reduction, and shared-nearest neighbour inference. |
distM |
Optional distance matrix used for determining k nearest neighbours. Default is |
large |
logical. If |
regNB |
logical. If |
bmethod |
Character string indicating the batch correction method. If "harmony", then batch correction is performed by the harmony package. Default is |
batch |
vector of batch variables. Component names need to correspond to valid cell IDs, i.e. column names of |
regVar |
data.frame with additional variables to be regressed out simultaneously with the log UMI count and the batch variable (if |
offsetModel |
Logical parameter. Only considered if |
thetaML |
Logical parameter. Only considered if |
theta |
Positive real number. Fixed value of the dispersion parameter. Only considered if |
ngenes |
Positive integer number. Randomly sampled number of genes (from rownames of |
span |
Positive real number. Parameter for loess-regression (see |
pcaComp |
Positive integer number. Number of principal components to be included if |
tol |
Numerical value greater than zero. Tolerance for numerical PCA using irlba. Default value is 1e-5. |
algorithm |
Algorithm for fast k nearest neighbour inference, using the |
metric |
Distances are computed from the expression matrix |
genes |
Vector of gene names corresponding to a subset of rownames of |
knn |
Positive integer number. Number of nearest neighbours considered for each cell. Default is 25. |
do.prune |
Logical parameter. If |
alpha |
Positive real number. Relative weight of a cell versus its k nearest neighbours applied for the derivation of joint probabilities. A cell receives a weight of |
nb |
Positive integer number. Number of genes with the lowest outlier probability included for calculating the link probabilities for the knn pruning. The link probability is computed as the geometric mean across these genes. Default is 3. |
no_cores |
Positive integer number. Number of cores for multithreading. If set to |
FSelect |
Logical parameter. If |
pca.scale |
Logical parameter. If |
ps |
Real number greater than or equal to zero. Pseudocount to be added to counts within local neighbourhoods for outlier identification and pruning. Default is 1. |
seed |
Integer number. Seed used to initialize stochastic routines. Default is 12345. |
... |
Additional parameters for |
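The description of `nb` states that the link probability is the geometric mean over the `nb` genes with the lowest outlier probability. A minimal sketch of that calculation for a single link, assuming a hypothetical vector of per-gene probabilities (the gene names and values are invented for illustration, and the exact internal computation may differ):

```r
# Hypothetical per-gene probabilities for one cell-neighbour link
p_gene <- c(geneA = 0.8, geneB = 0.01, geneC = 0.2, geneD = 0.04, geneE = 0.5)

nb <- 3

# Keep the nb genes with the lowest probability
p_low <- sort(p_gene)[seq_len(nb)]

# Geometric mean, computed on the log scale for numerical stability
link_p <- exp(mean(log(p_low)))

link_p
```

With `nb = 1` this reduces to the minimum probability across genes. Larger values of `nb` make the link probability less sensitive to a single outlying gene.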
Value
List object with the following components:
distM |
Distance matrix. |
dimRed |
PCA transformation of |
pvM |
Matrix of link probabilities between a cell and each of its k nearest neighbours (Bonferroni-corrected p-values). Column |
pvM.raw |
Matrix of uncorrected link probabilities between a cell and each of its k nearest neighbours (without multiple-testing correction). Column |
NN |
Matrix of column indices of k nearest neighbours for each cell according to input matrix |
B |
List object with background model of gene expression as obtained by |
regData |
If |
alpha |
Vector of inferred values for the |
pars |
List object storing the run parameters. |
pca |
Principal component analysis of the input data, if |
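The difference between `pvM.raw` and `pvM` is a Bonferroni correction. The general form of that correction can be sketched with base R's `p.adjust`. The raw values below are invented, and the number of tests over which RaceID corrects is not stated here, so correcting over the length of the vector is an assumption.

```r
# Hypothetical uncorrected link probabilities for one cell
raw <- c(0.001, 0.02, 0.2, 0.6)

# Bonferroni correction: multiply by the number of tests and cap at 1.
# Assumption: the number of tests equals length(raw).
adj <- p.adjust(raw, method = "bonferroni")

adj
# 0.004 0.080 0.800 1.000
```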
Examples
res <- pruneKnn(intestinalDataSmall, knn = 10, alpha = 1, no_cores = 1, FSelect = FALSE)