R: Deconvolute bulk RNA-Seq samples using a pre-trained...

deconvDigitalDLSorter {digitalDLSorteR}

R Documentation

Deconvolute bulk RNA-Seq samples using a pre-trained DigitalDLSorter model

Description

Deconvolute bulk gene expression samples (bulk RNA-Seq) to enumerate and quantify the proportion of cell types present in a bulk sample using Deep Neural Network models. This function is intended for users who want to use pre-trained models integrated in the package. So far, the available models allow to deconvolute the immune infiltration of breast cancer (using data from Chung et al., 2017) and the immune infiltration of colorectal cancer (using data from Li et al., 2017) samples. For the former, two models are available at two different levels of specificity: specific cell types (breast.chung.specific) and generic cell types (breast.chung.generic). See breast.chung.generic, breast.chung.specific, and colorectal.li documentation from the digitalDLSorteRdata package for more details.

Usage

deconvDigitalDLSorter(
  data,
  model = NULL,
  normalize = TRUE,
  scaling = "standardize",
  simplify.set = NULL,
  simplify.majority = NULL,
  use.generator = FALSE,
  batch.size = 64,
  verbose = TRUE
)

Arguments

`data`	Matrix or data frame with bulk RNA-Seq samples with genes as rows in SYMBOL notation and samples as columns.
`model`	Pre-trained DNN model to use to deconvolute `data`. Up to now, the available models are intended to deconvolute samples from breast cancer (`breast.chung.generic` and `breast.chung.specific`) and colorectal cancer (`colorectal.li`). These pre-trained models are stored in the digitalDLSorteRdata package, so it must be installed together with digitalDLSorteR to use this function.
`normalize`	Normalize data before deconvolution (`TRUE` by default).
`scaling`	How to scale data before training. It may be: `"standardize"` (values are centered around the mean with a unit standard deviation) or `"rescale"` (values are shifted and rescaled so that they end up ranging between 0 and 1). If `normalize = FALSE`, data is not scaled.
`simplify.set`	List specifying which cell types should be compressed into a new label whose name will be the list name item. See examples and vignettes for details.
`simplify.majority`	List specifying which cell types should be compressed into the cell type with the highest proportion in each sample. Unlike `simplify.set`, this argument allows to maintain the complexity of the results while compressing the information, as no new labels are created.
`use.generator`	Boolean indicating whether to use generators for prediction (`FALSE` by default).
`batch.size`	Number of samples per batch. Only when `use.generator = TRUE`.
`verbose`	Show informative messages during execution.

Details

This function is intended for users who want to use digitalDLSorteR to deconvolute their bulk RNA-Seq samples using pre-trained models. For users who want to build their own models from other scRNA-Seq datasets, see the createDDLSobject and deconvDDLSObj functions.

Value

A data frame with samples (i) as rows and cell types (j) as columns. Each entry represents the predicted proportion of cell type j in sample i.

References

Chung, W., Eum, H. H., Lee, H. O., Lee, K. M., Lee, H. B., Kim, K. T., et al. (2017). Single-cell RNA-seq enables comprehensive tumour and immune cell profiling in primary breast cancer. Nat. Commun. 8 (1), 15081. doi: doi:10.1038/ncomms15081.

Examples

## Not run: 
set.seed(123)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(30, lambda = 5), nrow = 15, ncol = 20,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(20)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(20)),
    Cell_Type = sample(x = paste0("CellType", seq(6)), size = 20,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
DDLS <- createDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE, 
  sc.log.FC = FALSE
)
probMatrixValid <- data.frame(
  Cell_Type = paste0("CellType", seq(6)),
  from = c(1, 1, 1, 15, 15, 30),
  to = c(15, 15, 30, 50, 50, 70)
)
DDLS <- generateBulkCellMatrix(
  object = DDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  prob.design = probMatrixValid,
  num.bulk.samples = 50,
  verbose = TRUE
)
# training of DDLS model
tensorflow::tf$compat$v1$disable_eager_execution()
DDLS <- trainDDLSModel(
  object = DDLS,
  on.the.fly = TRUE,
  batch.size = 15,
  num.epochs = 5
)
# simulating bulk RNA-Seq data
countsBulk <- matrix(
  stats::rpois(100, lambda = sample(seq(4, 10), size = 100, replace = TRUE)),
  nrow = 40, ncol = 15,
  dimnames = list(paste0("Gene", seq(40)), paste0("Bulk", seq(15)))
)
# this is only an example. See vignettes to see how to use pre-trained models
# from the digitalDLSorteRmodels data package
results1 <- deconvDigitalDLSorter(
  data = countsBulk,
  model = trained.model(DDLS),
  normalize = TRUE
)
# simplify arguments
simplify <- list(CellGroup1 = c("CellType1", "CellType2", "CellType4"),
                 CellGroup2 = c("CellType3", "CellType5"))
# in this case the names of the list will be the new labels
results2 <- deconvDigitalDLSorter(
  countsBulk,
  model = trained.model(DDLS),
  normalize = TRUE,
  simplify.set = simplify
)
# in this case the cell type with the highest proportion will be the new label
results3 <- deconvDigitalDLSorter(
  countsBulk,
  model = trained.model(DDLS),
  normalize = TRUE,
  simplify.majority = simplify
)

## End(Not run)

[Package digitalDLSorteR version 1.0.1 Index]