scGate {scGate}    R Documentation

Filter single-cell data by cell type

Description

Apply scGate to filter specific cell types in a query dataset

Usage

scGate(
  data,
  model,
  pos.thr = 0.2,
  neg.thr = 0.2,
  assay = NULL,
  slot = "data",
  ncores = 1,
  BPPARAM = NULL,
  seed = 123,
  keep.ranks = FALSE,
  reduction = c("calculate", "pca", "umap", "harmony"),
  min.cells = 30,
  nfeatures = 2000,
  pca.dim = 30,
  param_decay = 0.25,
  maxRank = 1500,
  output.col.name = "is.pure",
  k.param = 30,
  smooth.decay = 0.1,
  smooth.up.only = FALSE,
  genes.blacklist = "default",
  return.CellOntology = TRUE,
  multi.asNA = FALSE,
  additional.signatures = NULL,
  save.levels = FALSE,
  verbose = FALSE,
  progressbar = TRUE
)

Arguments

data

Seurat object containing the query dataset; filtering is applied to this object

model

A single scGate model, or a list of scGate models. See Details for the model format

pos.thr

Minimum UCell score value for positive signatures

neg.thr

Maximum UCell score value for negative signatures

assay

Seurat assay to use

slot

Data slot of the Seurat object used to calculate UCell scores

ncores

Number of processors for parallel processing

BPPARAM

A BiocParallel::bpparam() object that tells scGate how to parallelize. If provided, it overrides the ncores parameter.

seed

Integer seed for random number generator

keep.ranks

Store UCell rankings in the Seurat object. This speeds up calculations if the same object is scored again with new signatures.

reduction

Dimensionality reduction to use for kNN smoothing. By default ("calculate"), a new reduction is computed from the given assay; alternatively, specify a precalculated reduction (e.g. for an integrated dataset after batch-effect correction).

min.cells

Minimum number of cells to cluster or define cell types

nfeatures

Number of variable genes for dimensionality reduction

pca.dim

Number of principal components for dimensionality reduction

param_decay

Controls the decrease in parameter complexity at each iteration; must be between 0 and 1. param_decay = 0 gives no decay, and higher values give progressively stronger decay.

maxRank

Maximum number of genes that UCell will rank per cell

output.col.name

Name of the metadata column that will store the pure/impure annotation

k.param

Number of nearest neighbors for kNN smoothing

smooth.decay

Decay parameter for kNN weights: (1-decay)^n

smooth.up.only

If TRUE, only let smoothing increase signature scores

genes.blacklist

Genes blacklisted from variable features. The default loads the list of genes in scGate::genes.blacklist.default; you may deactivate blacklisting by setting genes.blacklist=NULL

return.CellOntology

If TRUE, Cell Ontology names and IDs are returned as additional metadata columns when running multiple models.

multi.asNA

How to label cells that are "Pure" for multiple annotations: "Multi" (FALSE) or NA (TRUE)

additional.signatures

A list of additional signatures, not included in the model, to be evaluated (e.g. a cycling signature). The scores for this list of signatures will be returned but not used for filtering.

save.levels

Whether to save the filtering output for each gating model level in the metadata

verbose

Verbose output

progressbar

Whether to show a progress bar
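
The weighting described for smooth.decay and smooth.up.only can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper smooth_one_cell is hypothetical, and it assumes that the cell itself receives n = 0 and its neighbors n = 1, 2, ... in order of increasing distance.

```r
# Hypothetical helper, for illustration only (not part of scGate or UCell).
# Assumption: the cell itself has n = 0; its k nearest neighbors have n = 1..k.
smooth_one_cell <- function(own_score, neighbor_scores,
                            decay = 0.1, up.only = FALSE) {
  scores <- c(own_score, neighbor_scores)
  n <- seq_along(scores) - 1          # 0 for the cell, 1..k for neighbors
  w <- (1 - decay)^n                  # weights shrink with neighbor rank
  smoothed <- sum(w * scores) / sum(w)  # weighted mean of the scores
  # With up.only = TRUE, smoothing is only allowed to raise the score
  if (up.only) smoothed <- max(smoothed, own_score)
  smoothed
}

# A high-scoring cell surrounded by two zero-scoring neighbors
s_plain <- smooth_one_cell(1, c(0, 0), decay = 0.5)
s_up    <- smooth_one_cell(1, c(0, 0), decay = 0.5, up.only = TRUE)
```

In this toy case the plain smoothed score is pulled below the cell's own score by its neighbors, while the up.only variant keeps the original value.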

Details

Models for scGate are data frames in which each row defines a signature for a given filtering level. A database of models can be downloaded with the function get_scGateDB. You may use the models from the database directly, or edit one of them to create your own custom gating model.
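
As a rough illustration of this layout, the base-R sketch below builds a toy model table by hand. The column names (levels, use_as, name, signature) and the semicolon-separated gene lists are assumptions made for the example; inspect a model returned by get_scGateDB to confirm the actual format.

```r
# Toy model table; column names and separator are assumed, not confirmed.
toy_model <- data.frame(
  levels    = c("level1", "level2", "level2"),
  use_as    = c("positive", "positive", "negative"),
  name      = c("Immune", "Bcell", "Tcell"),
  signature = c("PTPRC", "MS4A1;CD79A", "CD3E;CD3D"),
  stringsAsFactors = FALSE
)

# Split each gene list into a character vector, named by signature
toy_signatures <- setNames(
  strsplit(toy_model$signature, ";", fixed = TRUE),
  toy_model$name
)

# Group signature names by filtering level
by_level <- split(toy_model$name, toy_model$levels)
```

Each row is one signature, and rows sharing the same level are evaluated together at that filtering step.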

Multiple models can also be evaluated at once by running scGate with a list of models. The gating result for each individual model is returned as metadata, and a consensus annotation is stored in the scGate_multi metadata field. This allows scGate to be used as a multi-class classifier: cells that are "Pure" for exactly one model are assigned that model's label, cells that are "Pure" for more than one gating model are labeled "Multi" (or NA if multi.asNA = TRUE), and all other cells are annotated as NA.
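
The consensus rule can be sketched in base R as follows. The function consensus_label is hypothetical and only mirrors the labeling logic described here; it takes a logical matrix of per-model "Pure" calls (cells in rows, models in columns) rather than a Seurat object.

```r
# Hypothetical helper, for illustration only (not part of scGate).
consensus_label <- function(pure, multi.asNA = FALSE) {
  n_pure <- rowSums(pure)                      # number of models each cell passes
  first_hit <- max.col(pure * 1, ties.method = "first")
  label <- rep(NA_character_, nrow(pure))      # default: no label
  # Exactly one model passed: use that model's name
  label[n_pure == 1] <- colnames(pure)[first_hit[n_pure == 1]]
  # More than one model passed: "Multi", unless multi.asNA is TRUE
  if (!multi.asNA) label[n_pure > 1] <- "Multi"
  label
}

pure <- cbind(Bcell = c(TRUE, FALSE, TRUE, FALSE),
              Tcell = c(FALSE, TRUE, TRUE, FALSE))
labels <- consensus_label(pure)
labels_na <- consensus_label(pure, multi.asNA = TRUE)
```

Here the third cell passes both models, so it is labeled "Multi" by default and NA when multi.asNA = TRUE; the fourth cell passes neither model and is NA in both cases.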

Value

The query Seurat object with a new metadata column (is.pure by default; see output.col.name) indicating which cells passed the scGate filter. The active.ident is also set to this variable.

See Also

load_scGate_model, get_scGateDB, plot_tree

Examples


### Test using a small toy set
data(query.seurat)
# Define basic gating model for B cells
my_scGate_model <- gating_model(name = "Bcell", signature = c("MS4A1"))
query.seurat <- scGate(query.seurat, model = my_scGate_model, reduction = "pca")
table(query.seurat$is.pure)

### Test with larger datasets
library(Seurat)
testing.datasets <- get_testing_data(version = 'hsa.latest')
seurat_object <- testing.datasets[["JerbyArnon"]]
# Download pre-defined models
models <- get_scGateDB()
seurat_object <- scGate(seurat_object, model=models$human$generic$PanBcell)
DimPlot(seurat_object)
seurat_object_filtered <- subset(seurat_object, subset=is.pure=="Pure")

### Run multiple models at once
models <- get_scGateDB()
model.list <- list("Bcell" = models$human$generic$Bcell,
                   "Tcell" = models$human$generic$Tcell)
seurat_object <- scGate(seurat_object, model=model.list)
DimPlot(seurat_object, group.by = "scGate_multi")


[Package scGate version 1.6.2 Index]