R: Train deconvolution model for spatial transcriptomics data

trainDeconvModel {SpatialDDLS}

R Documentation

Train deconvolution model for spatial transcriptomics data

Description

Train a deep neural network model using the training data stored in the SpatialDDLS object. This model will be used to deconvolve spatial transcriptomics data from the same biological context as the single-cell RNA-seq data used to train it. In addition, the trained model is evaluated on test data, and the resulting predictions are used to assess its performance (see ?calculateEvalMetrics).

Usage

trainDeconvModel(
  object,
  type.data.train = "mixed",
  type.data.test = "mixed",
  batch.size = 64,
  num.epochs = 60,
  num.hidden.layers = 2,
  num.units = c(200, 200),
  activation.fun = "relu",
  dropout.rate = 0.25,
  loss = "kullback_leibler_divergence",
  metrics = c("accuracy", "mean_absolute_error", "categorical_accuracy"),
  normalize = TRUE,
  scaling = "standardize",
  norm.batch.layers = TRUE,
  custom.model = NULL,
  shuffle = TRUE,
  sc.downsampling = NULL,
  use.generator = FALSE,
  on.the.fly = FALSE,
  agg.function = "AddRawCount",
  threads = 1,
  view.metrics.plot = TRUE,
  verbose = TRUE
)

Arguments

`object`	`SpatialDDLS` object with `single.cell.real`/`single.cell.simul`, `prob.cell.types`, and `mixed.profiles` slots (the last only if `on.the.fly = FALSE`).
`type.data.train`	Type of profiles to be used for training. It can be `'both'`, `'single-cell'` or `'mixed'` (`'mixed'` by default).
`type.data.test`	Type of profiles to be used for evaluation. It can be `'both'`, `'single-cell'` or `'mixed'` (`'mixed'` by default).
`batch.size`	Number of samples per gradient update (64 by default).
`num.epochs`	Number of epochs to train the model (60 by default).
`num.hidden.layers`	Number of hidden layers of the neural network (2 by default). This number must equal the length of the `num.units` argument.
`num.units`	Vector indicating the number of neurons per hidden layer (`c(200, 200)` by default). The length of this vector must equal the `num.hidden.layers` argument.
`activation.fun`	Activation function (`'relu'` by default). See the keras documentation for the available activation functions.
`dropout.rate`	Float between 0 and 1 indicating the fraction of input units dropped by each dropout layer (0.25 by default). By default, SpatialDDLS adds one dropout layer per hidden layer.
`loss`	Character string indicating the loss function used for model training (`'kullback_leibler_divergence'` by default). See the keras documentation for the available loss functions.
`metrics`	Vector of metrics used to assess model performance during training and evaluation (`c("accuracy", "mean_absolute_error", "categorical_accuracy")` by default). See the keras documentation for the available performance metrics.
`normalize`	Whether to normalize data using logCPM (`TRUE` by default). This parameter is only considered when the method used to simulate mixed transcriptional profiles (`simMixedProfiles` function) was `"AddRawCount"`; otherwise, the data are already normalized.
`scaling`	How to scale data before training. It can be: `"standardize"` (values are centered on the mean and scaled to unit standard deviation), `"rescale"` (values are shifted and rescaled so that they range between 0 and 1) or `"none"` (no scaling is performed). `"standardize"` by default.
`norm.batch.layers`	Whether to include batch normalization layers between each hidden dense layer (`TRUE` by default).
`custom.model`	Allows a custom neural network architecture to be used. It must be a `keras.engine.sequential.Sequential` object in which the number of input neurons equals the number of considered features/genes and the number of output neurons equals the number of considered cell types (`NULL` by default). If provided, the arguments related to the neural network architecture are ignored.
`shuffle`	Boolean indicating whether data will be shuffled (`TRUE` by default).
`sc.downsampling`	Only used if `type.data.train` is `'both'` or `'single-cell'`. Sets the maximum number of single-cell profiles per cell type used for training, to avoid an unbalanced representation of classes (`NULL` by default).
`use.generator`	Boolean indicating whether to use generators during training and test. Generators are used automatically when `on.the.fly = TRUE` or when HDF5 files are used, but they can also be activated on demand (`FALSE` by default).
`on.the.fly`	Boolean indicating whether simulated data will be generated 'on the fly' during training (`FALSE` by default).
`agg.function`	If `on.the.fly = TRUE`, function used to build mixed transcriptional profiles. It can be: `"AddRawCount"` (default), where single-cell profiles (raw counts) are added up across cells and log-CPMs are then calculated; `"MeanCPM"`, where single-cell profiles (raw counts) are transformed into log-CPMs and averaged across cells; or `"AddCPM"`, where single-cell profiles (raw counts) are transformed into CPMs and added up across cells, and log-CPMs are then calculated.
`threads`	Number of threads used during simulation of mixed transcriptional profiles if `on.the.fly = TRUE` (1 by default).
`view.metrics.plot`	Boolean indicating whether to show plots of loss and evaluation metrics during training (`TRUE` by default). keras for R displays model progression during training when working in RStudio.
`verbose`	Boolean indicating whether to display model progression during training and model architecture information (`TRUE` by default).
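The `normalize` and `scaling` options can be pictured with a minimal base-R sketch. The helpers `log_cpm()` and `scale_profiles()` are hypothetical and are not SpatialDDLS functions. The sketch assumes a genes x profiles matrix, uses log2(CPM + 1) as the logCPM transform, and scales each profile (column) independently; all three are assumptions about details this page does not specify.

```r
# Hypothetical helpers for illustration; not SpatialDDLS internals.

# logCPM: scale each profile (column) to counts per million, then log2(x + 1)
log_cpm <- function(counts) {
  cpm <- sweep(counts, 2, colSums(counts), FUN = "/") * 1e6
  log2(cpm + 1)
}

# Apply one of the three `scaling` options to each profile (column)
scale_profiles <- function(x, scaling = c("standardize", "rescale", "none")) {
  scaling <- match.arg(scaling)
  switch(
    scaling,
    # center on the mean and divide by the standard deviation
    standardize = apply(x, 2, function(v) (v - mean(v)) / sd(v)),
    # shift and rescale so that values range between 0 and 1
    rescale = apply(x, 2, function(v) (v - min(v)) / (max(v) - min(v))),
    # leave the data untouched
    none = x
  )
}

# Usage: 4 genes x 2 profiles of raw counts
m <- matrix(c(1, 2, 3, 4, 10, 0, 5, 5), nrow = 4)
x <- scale_profiles(log_cpm(m), "rescale")
apply(x, 2, range)  # each column now spans 0 to 1
```

Note that a constant profile has zero standard deviation, so `"standardize"` would return `NaN` for it in this sketch.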
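The cap imposed by `sc.downsampling` can be illustrated as follows. `downsample_cells()` is a hypothetical helper that keeps at most `max.cells` randomly chosen profiles per cell type and leaves smaller cell types untouched; how SpatialDDLS actually chooses the retained cells is an assumption here.

```r
# Hypothetical helper: indices of at most `max.cells` profiles per cell type
downsample_cells <- function(cell.types, max.cells) {
  idx <- split(seq_along(cell.types), cell.types)
  keep <- lapply(idx, function(i) {
    # sample.int avoids sample()'s special behavior on length-1 vectors
    if (length(i) > max.cells) i[sample.int(length(i), max.cells)] else i
  })
  sort(unlist(keep, use.names = FALSE))
}

# Usage: CellType1 is over-represented and gets capped at 5 profiles
set.seed(123)
cell.types <- c(rep("CellType1", 10), rep("CellType2", 3))
kept <- downsample_cells(cell.types, max.cells = 5)
table(cell.types[kept])  # CellType1: 5, CellType2: 3
```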

Details

Simulation of mixed transcriptional profiles 'on the fly'

trainDeconvModel can avoid storing simulated mixed spot profiles by using the on.the.fly argument. This functionality aims to reduce the memory usage associated with the simMixedProfiles function: instead of being stored, simulated profiles are built batch by batch during training and evaluation.
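To illustrate what building profiles 'on the fly' involves, the sketch below simulates one batch of mixed profiles from single-cell raw counts and a matrix of cell type proportions, applying the three agg.function options as described under Arguments. `aggregate_cells()` and `simulate_batch()` are hypothetical helpers; the fixed number of cells per spot (`n.cells`), sampling cells with replacement, and log2(CPM + 1) as the log-CPM transform are assumptions, not SpatialDDLS internals.

```r
# Hypothetical helpers for illustration; not SpatialDDLS internals.
# counts: genes x cells raw count matrix

# Combine the cells of one spot into a single log-CPM profile
aggregate_cells <- function(counts,
                            agg.function = c("AddRawCount", "MeanCPM", "AddCPM")) {
  agg.function <- match.arg(agg.function)
  cpm <- function(x) sweep(x, 2, colSums(x), FUN = "/") * 1e6
  log_cpm_vec <- function(v) log2(v / sum(v) * 1e6 + 1)
  switch(
    agg.function,
    # add raw counts across cells, then compute log-CPMs
    AddRawCount = log_cpm_vec(rowSums(counts)),
    # log-CPM each cell, then average across cells
    MeanCPM = rowMeans(log2(cpm(counts) + 1)),
    # CPM each cell, add across cells, then compute log-CPMs
    AddCPM = log_cpm_vec(rowSums(cpm(counts)))
  )
}

# Build one batch: one mixed profile per row of `props`
# (batch size x cell types, rows summing to 1, colnames = cell types)
simulate_batch <- function(counts, cell.types, props, n.cells = 50,
                           agg.function = "AddRawCount") {
  apply(props, 1, function(p) {
    n.per.type <- round(p * n.cells)
    cells <- unlist(lapply(names(p), function(ct) {
      pool <- which(cell.types == ct)
      # draw this spot's cells of type `ct` with replacement
      pool[sample.int(length(pool), n.per.type[[ct]], replace = TRUE)]
    }))
    aggregate_cells(counts[, cells, drop = FALSE], agg.function)
  })
}
```

The result is a genes x batch-size matrix; `t()` of it gives one row per simulated spot. Because each batch is regenerated from the single-cell counts, only the proportions and the single-cell data need to stay in memory.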

Neural network architecture

It is possible to change the model's architecture: number of hidden layers, number of neurons in each hidden layer, dropout rate, activation function, and loss function. For more customized models, a pre-built model can be provided through the custom.model argument (a keras.engine.sequential.Sequential object), in which the number of input neurons must equal the number of considered features/genes and the number of output neurons must equal the number of considered cell types.
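The default layout can be sketched as a forward pass in base R. Each hidden layer applies a dense transformation, batch normalization, the activation function and dropout, and a softmax output layer has one unit per cell type, so each predicted row is a vector of proportions. The layer order, the omitted dense biases, the random weights and the helper names (`forward()`, `kl_divergence()`, etc.) are assumptions for illustration only; the real model is built and trained with keras. `kl_divergence()` shows the default loss, clipping values to [1e-7, 1] as, to our understanding, keras' Kullback-Leibler loss does.

```r
# Hypothetical sketch; not SpatialDDLS or keras code.
relu <- function(x) pmax(x, 0)

softmax_rows <- function(z) {
  z <- z - apply(z, 1, max)  # subtract each row's maximum for stability
  e <- exp(z)
  e / rowSums(e)
}

# Training-mode batch normalization with scale 1 and offset 0
batch_norm <- function(h, eps = 1e-3) {
  centered <- sweep(h, 2, colMeans(h))
  sweep(centered, 2, sqrt(colMeans(centered^2) + eps), FUN = "/")
}

# Inverted dropout: zero a fraction `rate` of units, rescale the rest
dropout <- function(h, rate) {
  mask <- matrix(runif(length(h)) >= rate, nrow = nrow(h))
  h * mask / (1 - rate)
}

# x: profiles x genes; dense biases omitted (batch norm re-centers anyway)
forward <- function(x, num.hidden.layers = 2, num.units = c(200, 200),
                    n.cell.types = 2, dropout.rate = 0.25) {
  # same constraint as the num.hidden.layers / num.units arguments
  stopifnot(length(num.units) == num.hidden.layers)
  sizes <- c(ncol(x), num.units)  # input layer: one neuron per gene
  h <- x
  for (i in seq_len(num.hidden.layers)) {
    W <- matrix(rnorm(sizes[i] * sizes[i + 1], sd = 0.1),
                nrow = sizes[i], ncol = sizes[i + 1])
    h <- dropout(relu(batch_norm(h %*% W)), dropout.rate)
  }
  # output layer: one neuron per cell type, softmax so rows sum to 1
  W.out <- matrix(rnorm(sizes[length(sizes)] * n.cell.types, sd = 0.1),
                  ncol = n.cell.types)
  softmax_rows(h %*% W.out)
}

# Per-profile KL divergence between true and predicted proportions
kl_divergence <- function(y.true, y.pred, eps = 1e-7) {
  y.true <- pmin(pmax(y.true, eps), 1)
  y.pred <- pmin(pmax(y.pred, eps), 1)
  rowSums(y.true * log(y.true / y.pred))
}
```

Training would adjust the weights to minimize the mean of `kl_divergence()` over each batch; keras handles that step, which this sketch does not attempt.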

Value

A SpatialDDLS object with trained.model slot containing a DeconvDLModel object. For more information about the structure of this class, see ?DeconvDLModel.

Examples


set.seed(123)
# toy single-cell dataset: 15 genes x 10 cells of 2 cell types
# (the 30 Poisson draws are recycled to fill the 15 x 10 matrix)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(30, lambda = 5), nrow = 15, ncol = 10,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(10)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(10)),
    Cell_Type = sample(x = paste0("CellType", seq(2)), size = 10,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
# build the SpatialDDLS object from the single-cell data
SDDLS <- createSpatialDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE
)
# generate cell type proportions for the simulated mixed spots
SDDLS <- genMixedCellProp(
  object = SDDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  num.sim.spots = 50,
  train.freq.cells = 2/3,
  train.freq.spots = 2/3,
  verbose = TRUE
)
# simulate mixed transcriptional profiles from those proportions
SDDLS <- simMixedProfiles(SDDLS)
# train and evaluate the model (small batch size and few epochs for speed)
SDDLS <- trainDeconvModel(
  object = SDDLS,
  batch.size = 12,
  num.epochs = 5
)

[Package SpatialDDLS version 1.0.2 Index]