classify {cellpypes}    R Documentation
Classify cells on previously defined rules
Description
Classify cells on previously defined rules
Usage
classify(
obj,
classes = NULL,
knn_refine = 0,
replace_overlap_with = "Unassigned",
return_logical_matrix = FALSE,
overdispersion = 0.01
)
Arguments
obj
A cellpypes object, see section cellpypes Objects below.
classes
Character vector with one or more class names. If NULL (the default), classifies the finest available cell types (all classes that are not the parent of any other class).
knn_refine
Numeric between 0 and 1. If 0, do not refine labels obtained from UMI count pooling. If larger than 0 (recommended: 0.1), cellpypes will try to label unassigned cells by majority vote, see section knn_refine below.
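replace_overlap_with
Character string, by default "Unassigned". Cells for which rules from multiple classes apply receive this label, see section Handling overlap below.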
return_logical_matrix
Logical. If TRUE, a logical matrix with classes in columns and cells in rows is returned instead of resolving overlaps with replace_overlap_with.
overdispersion
Defaults to 0.01; only change it if you know what you are doing. If set to 0, the NB (negative binomial) simplifies to the Poisson distribution, and larger values give more variance. The 0.01 default value follows the recommendation by Lause, Berens and Kobak (Genome Biology 2021).
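To make the role of overdispersion more concrete, here is a minimal base R sketch. It assumes the common negative binomial parametrization in which the variance is mu + overdispersion * mu^2, which corresponds to size = 1 / overdispersion in R's dnbinom(). Whether cellpypes uses exactly this parametrization internally is an assumption, and nb_variance is a hypothetical helper, not a cellpypes function.

```r
# Variance of a negative binomial with mean mu under the assumed
# parametrization: var = mu + overdispersion * mu^2.
nb_variance <- function(mu, overdispersion) mu + overdispersion * mu^2

mu <- 5
var_poisson <- nb_variance(mu, 0)     # overdispersion 0: variance equals the mean
var_default <- nb_variance(mu, 0.01)  # slightly more variance than Poisson

# With the 0.01 default, the NB probabilities stay close to the Poisson ones.
counts <- 0:20
d_pois <- dpois(counts, lambda = mu)
d_nb   <- dnbinom(counts, mu = mu, size = 1 / 0.01)
max_abs_diff <- max(abs(d_pois - d_nb))
```

Under this assumption, raising overdispersion widens the distribution, so a given UMI count becomes weaker evidence for being above or below a threshold.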
Value
A factor with cell type labels.
cellpypes Objects
A cellpypes object is a list with four slots:
raw
(sparse) matrix with genes in rows, cells in columns
totalUMI
the colSums of obj$raw
embed
two-dimensional embedding of the cells, provided as data.frame or tibble with two columns and one row per cell.
neighbors
index matrix with one row per cell and k columns, where k is the number of nearest neighbors (we recommend 15<k<100, e.g. k=50). Here are two ways to get the neighbors index matrix:
Use find_knn(featureMatrix)$idx, where featureMatrix could be principal components, latent variables or normalized genes (features in rows, cells in columns).
Use as(seurat@graphs[["RNA_nn"]], "dgCMatrix") > .1 to extract the kNN graph computed on RNA. The > .1 ensures this also works with RNA_snn, wknn/wsnn or any other available graph; check with names(seurat@graphs).
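The four slots described above can be assembled by hand. The following base R sketch builds such a list from simulated data. It uses a dense matrix in place of a sparse one and a brute-force nearest-neighbor search on the embedding as a stand-in for find_knn; both are simplifications for illustration. Whether the neighbors matrix should contain the cell itself is an assumption here.

```r
set.seed(1)
n_genes <- 20
n_cells <- 30
k <- 5  # kept small for the toy data; real data would use a larger k

# Toy UMI counts: genes in rows, cells in columns.
raw <- matrix(rpois(n_genes * n_cells, lambda = 1), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("cell", seq_len(n_cells))))

# Toy two-dimensional embedding: one row per cell, two columns.
embed <- data.frame(dim1 = rnorm(n_cells), dim2 = rnorm(n_cells))

# Brute-force kNN: for each cell, the indices of its k closest cells.
# The cell itself has distance 0 and therefore comes first.
d <- as.matrix(dist(embed))
neighbors <- t(apply(d, 1, function(x) order(x)[seq_len(k)]))

obj <- list(
  raw       = raw,
  totalUMI  = colSums(raw),
  embed     = embed,
  neighbors = neighbors
)
```

For real data, the two approaches listed above (find_knn or an existing Seurat graph) are the documented ways to obtain the neighbors slot.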
Handling overlap
Overlap denotes all cells for which rules from multiple classes apply, and these cells will be labeled as Unassigned by default. If you are in fact interested in where the overlap is, set return_logical_matrix = TRUE and inspect the result. Note that it matters whether you call classify("Tcell") or classify(c("Tcell","Bcell")): any existing overlap between T and B cells is labeled as Unassigned in the second call, but not in the first.
Replacing overlap happens only between mutually exclusive labels (such as Tcell and Bcell), but not within a lineage. For example, overlap is NOT replaced between child (PD1+Ttox) and parent (Ttox) or any other ancestor (Tcell); instead, the most detailed cell type (PD1+Ttox) is returned.
All of the above is also true for plot_classes, as it wraps classify.
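The overlap behavior described above can be mimicked on a toy logical matrix of the kind returned with return_logical_matrix = TRUE (cells in rows, classes in columns). This is a base R sketch of the described behavior for mutually exclusive classes, not the package's implementation. resolve_overlap is a hypothetical helper, and labeling cells that match no class as "Unassigned" is an assumption.

```r
# Toy rule results: cell2 satisfies the rules of both classes.
rules <- cbind(Tcell = c(TRUE,  TRUE, FALSE, FALSE),
               Bcell = c(FALSE, TRUE, TRUE,  FALSE))
rownames(rules) <- paste0("cell", 1:4)

# Hypothetical helper: turn a logical class matrix into labels.
resolve_overlap <- function(m, replace_overlap_with = "Unassigned") {
  n_hits <- rowSums(m)
  labels <- rep("Unassigned", nrow(m))  # assumption: no match means Unassigned
  one <- n_hits == 1
  # Cells matching exactly one class get that class name.
  labels[one] <- colnames(m)[max.col(m[one, , drop = FALSE] + 0,
                                     ties.method = "first")]
  # Cells matching several classes are overlap.
  labels[n_hits > 1] <- replace_overlap_with
  factor(labels)
}

only_t   <- resolve_overlap(rules[, "Tcell", drop = FALSE])
t_and_b  <- resolve_overlap(rules)
```

When only Tcell is considered, cell2 keeps its Tcell label; when both classes are considered, cell2 is overlap and becomes "Unassigned".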
knn_refine
With knn_refine > 0, cellpypes refines cell type labels with a kNN classifier.
By default, cellpypes only assigns cells to a class if all relevant rules apply. In other words, all marker gene UMI counts in the cell's neighborhood have to be clearly above/below their threshold. Since UMI counts are sparse (even after the neighbor pooling done by cellpypes), this can leave many cells unassigned.
It is reasonable to assume an unassigned cell is of the same cell type as the majority of its nearest neighbors. Therefore, cellpypes implements a kNN classifier to further refine labels obtained by manually thresholding UMI counts. knn_refine = 0.3 means a cell is assigned the class label held by most of its neighbors unless no class gets more than 30 %. If most neighbors are unassigned, the cell will also be set to "Unassigned".
Choosing knn_refine = 0.3 gives results reminiscent of clustering (which assigns all cells), while knn_refine = 0.5 leaves cells 'in between' two similar cell types unassigned.
We recommend looking at knn_refine = 0 first as it's faster and more directly tied to marker gene expression. If assigning all cells is desired, we recommend knn_refine = 0.3 or lower, while knn_refine = 0.5 makes cell types more 'crisp' by setting cells 'in between' related subtypes to "Unassigned".
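The majority vote described in this section can be sketched in a few lines of base R. This is an illustration of the idea, not the cellpypes implementation. knn_refine_sketch is a hypothetical function, and several details are assumptions: only unassigned cells are relabeled, votes use the labels from before refinement, and ties go to the alphabetically first label.

```r
# Hypothetical sketch of kNN label refinement by majority vote.
knn_refine_sketch <- function(labels, neighbors, knn_refine = 0.3) {
  refined <- labels
  for (i in which(labels == "Unassigned")) {
    votes  <- table(labels[neighbors[i, ]])   # votes from unrefined labels
    winner <- names(votes)[which.max(votes)]  # most frequent neighbor label
    frac   <- max(votes) / ncol(neighbors)
    # Keep "Unassigned" if most neighbors are unassigned or the winning
    # class does not exceed the knn_refine fraction.
    if (winner != "Unassigned" && frac > knn_refine) refined[i] <- winner
  }
  refined
}

labels <- c("Tcell", "Tcell", "Tcell", "Unassigned",
            "Bcell", "Bcell", "Unassigned", "Unassigned")
# One row per cell, four neighbor indices each (toy values).
neighbors <- rbind(c(2, 3, 4, 5), c(1, 3, 4, 5), c(1, 2, 4, 5), c(1, 2, 3, 5),
                   c(6, 7, 8, 1), c(5, 7, 8, 1), c(1, 5, 4, 8), c(1, 2, 5, 6))

refined_03 <- knn_refine_sketch(labels, neighbors, knn_refine = 0.3)
refined_05 <- knn_refine_sketch(labels, neighbors, knn_refine = 0.5)
```

In this toy data, cell 4 has three Tcell neighbors out of four and becomes a Tcell under both settings. Cell 7 stays unassigned because most of its neighbors are unassigned. Cell 8 sits between two T and two B cells: it is assigned with knn_refine = 0.3 but stays "Unassigned" with knn_refine = 0.5.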
Examples
classify(rule(simulated_umis, "Tcell", "CD3E", ">", 1))