ReferenceBasedDecomposition {BisqueRNA}    R Documentation

Performs reference-based decomposition of bulk expression using single-cell data

Description

Generates a reference profile based on single-cell data. Learns a transformation of bulk expression based on observed single-cell proportions and performs non-negative least squares (NNLS) regression on these transformed values to estimate cell type proportions.
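The two steps described above can be illustrated with a minimal base-R sketch. It assumes the reference profile is the mean expression of each gene within each cell type, and it solves the NNLS step with box-constrained optim(). Both choices, and the helper names make_reference() and nnls_props(), are illustrative assumptions, not BisqueRNA's internal implementation.

```r
# Hypothetical sketch, not BisqueRNA's source code. make_reference() and
# nnls_props() are illustrative names; only base R (stats::optim) is used.

# Reference profile (assumed): mean expression of each gene within each
# cell type. sc.expr is a genes x cells matrix; cell.labels gives each
# cell's type.
make_reference <- function(sc.expr, cell.labels) {
  types <- sort(unique(cell.labels))
  sapply(types, function(ct) {
    rowMeans(sc.expr[, cell.labels == ct, drop = FALSE])
  })
}

# Non-negative least squares: minimise ||ref %*% x - b||^2 subject to x >= 0,
# solved here with box-constrained L-BFGS-B rather than a dedicated solver.
nnls_props <- function(ref, b) {
  k <- ncol(ref)
  objective <- function(x) sum((ref %*% x - b)^2)
  gradient <- function(x) as.vector(2 * crossprod(ref, ref %*% x - b))
  fit <- optim(rep(1 / k, k), objective, gradient,
               method = "L-BFGS-B", lower = 0)
  setNames(fit$par, colnames(ref))
}

# Toy data: two genes, four cells, two cell types.
sc.expr <- matrix(c(10, 0, 12, 0, 0, 8, 0, 12), nrow = 2,
                  dimnames = list(c("g1", "g2"), NULL))
ref <- make_reference(sc.expr, c("A", "A", "B", "B"))
b <- as.vector(ref %*% c(0.6, 0.4))   # bulk sample mixing A and B 60:40
nnls_props(ref, b)                    # approximately A = 0.6, B = 0.4
```

A dedicated solver, such as the one in the CRAN package nnls, would normally be preferred; optim() is used only to keep the sketch free of dependencies.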

Usage

ReferenceBasedDecomposition(
  bulk.eset,
  sc.eset,
  markers = NULL,
  cell.types = "cellType",
  subject.names = "SubjectName",
  use.overlap = TRUE,
  verbose = TRUE,
  old.cpm = TRUE
)

Arguments

bulk.eset

ExpressionSet containing bulk data. No phenoData is required, but if the overlapping option is used, the IDs returned by sampleNames(bulk.eset) should match the individual labels in the phenoData of sc.eset.

sc.eset

ExpressionSet containing single-cell data. The phenoData of this ExpressionSet should contain cell type and individual labels for each cell; the names of these fields are specified by the cell.types and subject.names arguments.

markers

Structure, such as a character vector, containing the marker genes to be used in decomposition. 'base::unique(base::unlist(markers))' should return a simple vector containing each gene name. If no argument or NULL is provided, the method uses all available genes for decomposition.

cell.types

Character string. Name of the phenoData attribute in sc.eset indicating the cell type label for each cell.

subject.names

Character string. Name of the phenoData attribute in sc.eset indicating the individual label for each cell.

use.overlap

Boolean. Whether to use, and expect, individuals present in both bulk.eset and sc.eset when learning the transformation.

verbose

Boolean. Whether to print log info during decomposition. Errors will be printed regardless.

old.cpm

Prior to version 1.0.4 (updated in July 2020), the package converted counts to CPM after subsetting the marker genes. GitHub user randel pointed out that the order of these operations should be switched. Thanks randel! Setting old.cpm = TRUE keeps the old order and is provided for replicating results from older versions of BisqueRNA, but the option should be disabled (old.cpm = FALSE), especially for small marker gene sets. We briefly tested this change on the cortex and adipose datasets. The original and new orders of operations produce estimates with an average within-cell-type correlation of 0.87 for the cortex dataset and 0.84 for the adipose dataset.
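To make the markers and old.cpm arguments concrete, the following base-R sketch flattens a marker list with unique(unlist(markers)) and then applies both orders of operations to toy counts. The helper cpm() and all data are hypothetical; the sketch follows the description above rather than BisqueRNA's code.

```r
# Hypothetical sketch contrasting the two orders of operations that old.cpm
# switches between. cpm() is an illustrative helper, not a BisqueRNA function.

# Counts per million: divide each column (sample) by its total, times 1e6.
cpm <- function(counts) {
  sweep(counts, 2, colSums(counts), "/") * 1e6
}

# Toy counts: three genes (rows) by two samples (columns).
counts <- matrix(c(100, 300, 600,
                    50,  50, 900),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# markers can be any structure that flattens to a vector of gene names.
markers <- list(Neurons = c("geneA", "geneB"), Astrocytes = "geneB")
marker.genes <- unique(unlist(markers))   # "geneA" and "geneB"

# old.cpm = TRUE: subset to marker genes first, then convert to CPM.
old.order <- cpm(counts[marker.genes, , drop = FALSE])
# old.cpm = FALSE: convert all genes to CPM, then subset to marker genes.
new.order <- cpm(counts)[marker.genes, , drop = FALSE]

colSums(old.order)   # both samples forced to 1e6
colSums(new.order)   # 4e5 for s1, 1e5 for s2
```

In the old order both samples' marker genes are rescaled to one million, hiding the fact that 40% of s1's reads but only 10% of s2's fall on the markers. This illustrates why the order of operations can matter, particularly when the marker set is small.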

Details

Expects read counts for both datasets, as they will be converted to counts per million (CPM). Two options are available: use overlapping individuals found in both the single-cell and bulk datasets to learn the transformation, or learn the transformation from the single-cell data alone. The overlapping option is expected to perform better.
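The two options can be sketched as per-gene linear transformations. This base-R sketch assumes that, with overlapping individuals, each gene's pseudo-bulk values (bulk-like values derived from the single-cell data) are regressed on the observed bulk values, and that, without overlap, bulk values are rescaled to match the pseudo-bulk mean and standard deviation. These are assumptions made for illustration, as are the names transform_overlap(), transform_no_overlap(), bulk and pseudo.

```r
# Hypothetical sketch of per-gene linear transformations of bulk expression.
# It illustrates the idea only and is not BisqueRNA's source code.
#
# bulk:   genes x samples matrix of bulk CPM values
# pseudo: genes x samples matrix of bulk-like values derived from the
#         single-cell data (e.g. reference profile %*% single-cell proportions)

# Overlapping individuals: for each gene, regress pseudo-bulk on observed bulk
# over the shared samples, then apply the fitted line to every bulk sample.
transform_overlap <- function(bulk, pseudo, overlap) {
  t(sapply(rownames(bulk), function(g) {
    x <- bulk[g, overlap]
    y <- pseudo[g, overlap]
    fit <- lm(y ~ x)
    predict(fit, newdata = data.frame(x = bulk[g, ]))
  }))
}

# Single-cell data alone: rescale each gene's bulk values so their mean and
# standard deviation match the pseudo-bulk values (sample sets may differ).
transform_no_overlap <- function(bulk, pseudo) {
  t(sapply(rownames(bulk), function(g) {
    x <- bulk[g, ]
    y <- pseudo[g, ]
    (x - mean(x)) / sd(x) * sd(y) + mean(y)
  }))
}

bulk <- matrix(c(1, 4, 2, 5, 3, 7, 4, 6), nrow = 2,
               dimnames = list(c("g1", "g2"), paste0("s", 1:4)))
pseudo <- 2 * bulk + 1
# pseudo is an exact linear function of bulk, so this reproduces pseudo,
# including sample s4, which was not used to fit the regression.
transform_overlap(bulk, pseudo, overlap = c("s1", "s2", "s3"))
```

The regression version can use individual-level agreement between the two data types, while the rescaling version only matches summary statistics, which is one way to see why the overlapping option may perform better.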

Value

A list. Slot bulk.props contains a matrix of cell type proportion estimates with cell types as rows and individuals as columns. Slot sc.props contains a matrix of cell type proportions estimated directly from counting single-cell data. Slot rnorm contains the Euclidean norm of the residuals for each individual's proportion estimates. Slot genes.used contains the vector of genes used in decomposition. Slot transformed.bulk contains the transformed bulk expression used for decomposition; these values are generated by applying a linear transformation to the CPM expression.
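The sc.props and rnorm slots can be illustrated on toy data. The sketch below assumes sc.props is the per-individual fraction of cells carrying each cell type label, and that rnorm is computed from a reference matrix (genes by cell types) and a proportion matrix (cell types by individuals). The function residual_norm() and all inputs are hypothetical.

```r
# Hypothetical sketch of two quantities named in the Value section, computed
# on toy data with base R. residual_norm() is an illustrative name chosen to
# avoid masking stats::rnorm, the random-number generator.

# sc.props: count each individual's cells by cell type, then divide by the
# individual's total, giving cell types as rows and individuals as columns.
cell.type <- c("Neurons", "Neurons", "Astrocytes",
               "Neurons", "Astrocytes", "Astrocytes")
subject <- c("6", "6", "6", "7", "7", "7")
sc.props <- unclass(prop.table(table(cell.type, subject), margin = 2))
sc.props   # individual 6: 1/3 Astrocytes, 2/3 Neurons; individual 7: reversed

# rnorm: Euclidean norm of bulk minus reference %*% proportions, per individual.
# ref is genes x cell types, props is cell types x individuals, and bulk is
# genes x individuals.
residual_norm <- function(ref, props, bulk) {
  sqrt(colSums((bulk - ref %*% props)^2))
}

ref <- matrix(c(1, 0,
                0, 1,
                1, 1), nrow = 3, byrow = TRUE)
props <- cbind(ind1 = c(0.5, 0.5), ind2 = c(1, 0))
bulk <- cbind(ind1 = c(0.5, 0.5, 1), ind2 = c(1, 3, 5))
residual_norm(ref, props, bulk)   # ind1 = 0, ind2 = 5
```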

Examples

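library(Biobase)
library(BisqueRNA)
# Simulate bulk and single-cell data for 10 individuals and three cell types
sim.data <- SimulateData(n.ind=10, n.genes=100, n.cells=100,
                         cell.types=c("Neurons", "Astrocytes", "Microglia"),
                         avg.props=c(.5, .3, .2))
# Keep single-cell data only for individuals 6 to 10, so only they overlap with bulk
sim.data$sc.eset <- sim.data$sc.eset[,sim.data$sc.eset$SubjectName %in% as.character(6:10)]
res <- ReferenceBasedDecomposition(sim.data$bulk.eset, sim.data$sc.eset)
# Matrix of estimates: cell types as rows, individuals as columns
estimated.cell.proportions <- res$bulk.props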


[Package BisqueRNA version 1.0.5 Index]