R: KNN-based imputation

KSoftImpute {SignacX}

R Documentation

KNN-based imputation

Description

KSoftImpute is an ultra-fast method for imputing missing gene expression values in single cell data. KSoftImpute uses k-nearest neighbors to impute the expression of each gene by the weighted average of itself and it's first-degree neighbors. Weights for imputation are determined by the number of detected genes. This method works for large data sets (>100,000 cells) in under a minute.

Usage

KSoftImpute(E, dM = NULL, genes.to.use = NULL, verbose = FALSE)

Arguments

`E`	A gene-by-sample count matrix (sparse matrix or matrix) with genes identified by their HUGO symbols.
`dM`	see ?CID.GetDistMat
`genes.to.use`	a character vector of genes to impute. Default is NULL.
`verbose`	If TRUE, code reports outputs. Default is FALSE.

Value

An expression matrix (sparse matrix) with imputed values.

Examples

## Not run: 
# download single cell data for classification
file.dir = "https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/"
file = "pbmc_1k_v3_filtered_feature_bc_matrix.h5"
download.file(paste0(file.dir, file), "Ex.h5")

# load data, process with Seurat
library(Seurat)
E = Read10X_h5(filename = "Ex.h5")
pbmc <- CreateSeuratObject(counts = E, project = "pbmc")

# run Seurat pipeline
pbmc <- SCTransform(pbmc, verbose = FALSE)
pbmc <- RunPCA(pbmc, verbose = FALSE)
pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = FALSE)
pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = FALSE)

# get edges from default assay from Seurat object
default.assay <- Seurat::DefaultAssay(pbmc)
edges = pbmc@graphs[[which(grepl(paste0(default.assay, "_nn"), names(pbmc@graphs)))]]

# get distance matrix
dM = CID.GetDistMat(edges)

# run imputation
Z = KSoftImpute(E = E, dM = dM, verbose = TRUE)

## End(Not run)