R: Deconvolute spatial transcriptomics data using trained model

deconvSpatialDDLS {SpatialDDLS}

R Documentation

Deconvolute spatial transcriptomics data using trained model

Description

Deconvolute spatial transcriptomics data using the trained model in the SpatialDDLS object. The trained model is used to predict cell proportions of two mirrored transcriptional profiles:

'Intrinsic' profiles: transcriptional profiles of each spot in the ST dataset.
'Extrinsic' profiles: profiles simulated from the surrounding spots of each spot.

After prediction, cell proportions from the intrinsic profiles (intrinsic cell proportions) are regularized based on the similarity between intrinsic and extrinsic profiles in order to maintain spatial consistency. This approach leverages both transcriptional and spatial information. For more details, see Mañanes et al., 2023 and the Details section.

Usage

deconvSpatialDDLS(
  object,
  index.st,
  normalize = TRUE,
  scaling = "standardize",
  k.spots = 4,
  pca.space = TRUE,
  fast.pca = TRUE,
  pcs.num = 50,
  pca.var = 0.8,
  metric = "euclidean",
  alpha.cutoff = "mean",
  alpha.quantile = 0.5,
  simplify.set = NULL,
  simplify.majority = NULL,
  use.generator = FALSE,
  batch.size = 64,
  verbose = TRUE
)

Arguments

`object`	`SpatialDDLS` object with `trained.model` and `spatial.experiments` slots.
`index.st`	Name or index of the dataset/slide stored in the `SpatialDDLS` object (`spatial.experiments` slot) to be deconvoluted. If missing, all datasets will be deconvoluted.
`normalize`	Normalize data (logCPM) before deconvolution (`TRUE` by default).
`scaling`	How to scale data before prediction. Options include `"standardize"` (values are centered around the mean with a unit standard deviation) or `"rescale"` (values are shifted and rescaled so that they end up ranging between 0 and 1). If `normalize = FALSE`, data are not scaled.
`k.spots`	Number of nearest spots considered for each spot during regularization and simulation of extrinsic transcriptional profiles. The greater, the smoother the regularization will be (4 by default).
`pca.space`	Whether to use PCA space to calculate distances between intrinsic and extrinsic transcriptional profiles (`TRUE` by default).
`fast.pca`	Whether to use the irlba implementation of PCA. If `TRUE`, the number of PCs used is defined by the `pcs.num` parameter. If `FALSE`, the PCA implementation from the stats R package is used instead (`TRUE` by default).
`pcs.num`	Number of PCs used to calculate distances if `fast.pca == TRUE` (50 by default).
`pca.var`	Threshold of explained variance (between 0.2 and 1) used to choose the number of PCs used if `pca.space == TRUE` and `fast.pca == FALSE` (0.8 by default).
`metric`	Metric used to measure distance/similarity between intrinsic and extrinsic transcriptional profiles. It may be `'euclidean'`, `'cosine'` or `'pearson'` (`'euclidean'` by default).
`alpha.cutoff`	Minimum distance for regularization. It may be `'mean'` (spots with transcriptional distances shorter than the mean distance of the dataset will be modified) or `'quantile'` (spots with transcriptional distances shorter than the `alpha.quantile` quantile are used). `'mean'` by default.
`alpha.quantile`	Quantile used if `alpha.cutoff == 'quantile'`. 0.5 by default.
`simplify.set`	List specifying which cell types should be compressed into a new label with the name of the list item. See examples for details. If provided, results are stored in a list with `'raw'` and `'simpli.set'` elements.
`simplify.majority`	List specifying which cell types should be compressed into the cell type with the highest proportion in each spot. Unlike `simplify.set`, no new labels are created. If provided, results are stored in a list with `'raw'` and `'simpli.majority'` elements.
`use.generator`	Boolean indicating whether to use generators for prediction (`FALSE` by default).
`batch.size`	Number of samples per batch. Only used when `use.generator = TRUE` (64 by default).
`verbose`	Show informative messages during the execution.
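
As an illustration of how `pca.var` can determine the number of PCs when `pca.space = TRUE` and `fast.pca = FALSE`, the following is a minimal sketch using `stats::prcomp`. The toy matrix `expr` and the rule of keeping the smallest number of PCs whose cumulative explained variance reaches the threshold are assumptions made for this example, not the package's internal code.

```r
# Hypothetical toy data: 30 spots (rows) x 10 genes (columns).
set.seed(1)
expr <- matrix(rnorm(30 * 10), nrow = 30, ncol = 10)

# PCA with the implementation from the stats package.
pca <- stats::prcomp(expr, center = TRUE, scale. = FALSE)

# Cumulative fraction of variance explained by the leading PCs.
cum.var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Assumed rule: smallest number of PCs reaching the variance threshold.
pca.var <- 0.8
n.pcs <- which(cum.var >= pca.var)[1]

# Coordinates of each spot in the reduced space, which could then be used
# to compute distances between profiles.
pca.coords <- pca$x[, seq_len(n.pcs), drop = FALSE]
```

A lower `pca.var` keeps fewer PCs, so distances are computed on a coarser summary of the data.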

Details

The deconvolution process involves two main steps: predicting cell proportions from the transcriptome of each spot using the trained neural network model, and regularizing those proportions based on the spatial location of each spot. In the regularization step, a mirrored version of each spot is simulated from its N nearest spots (set by k.spots). We refer to these simulated profiles as 'extrinsic' profiles, whereas the transcriptional profiles of the spots themselves are called 'intrinsic' profiles. Extrinsic profiles are used to regularize the predictions based on intrinsic profiles. The rationale is that spots surrounded by transcriptionally similar spots should have similar cell compositions, so their predicted proportions can be smoothed to preserve spatial consistency. In contrast, spots surrounded by dissimilar spots, which likely have very specific cell compositions, cannot be predicted from their neighbors and are predicted only from their own transcriptional profiles.
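
The simulation of extrinsic profiles can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy objects counts and coords are hypothetical, neighbors are found with Euclidean distance on the spot coordinates, and each spot is excluded from its own neighborhood. The package's internal implementation may differ in any of these choices.

```r
# Hypothetical toy data: 5 genes x 12 spots arranged on a 4 x 3 grid.
set.seed(1)
n.genes <- 5
n.spots <- 12
counts <- matrix(
  rpois(n.genes * n.spots, lambda = 5), nrow = n.genes,
  dimnames = list(paste0("Gene", seq(n.genes)), paste0("Spot", seq(n.spots)))
)
coords <- cbind(x = rep(1:4, times = 3), y = rep(1:3, each = 4))
rownames(coords) <- colnames(counts)

k.spots <- 4

# Pairwise Euclidean distances between spot coordinates.
dist.mat <- as.matrix(stats::dist(coords))
diag(dist.mat) <- Inf  # assumption: a spot is not its own neighbor

# For each spot, sum the transcriptomes of its k nearest spots.
extrinsic <- vapply(
  seq_len(n.spots),
  function(i) {
    neighbors <- order(dist.mat[i, ])[seq_len(k.spots)]
    rowSums(counts[, neighbors, drop = FALSE])
  },
  numeric(n.genes)
)
colnames(extrinsic) <- colnames(counts)
```

The result has the same shape as counts (genes x spots), so intrinsic and extrinsic profiles can be compared spot by spot.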

SpatialDDLS works as follows. First, an extrinsic profile is simulated for each spot by summing the transcriptomes of its N nearest spots. The distance between the extrinsic and intrinsic profiles of each spot is then calculated to identify similar and dissimilar spots. Both sets of transcriptional profiles are used as input for the trained neural network model and, according to the calculated distances, a weighted mean of the two predicted proportions is calculated for each spot. Spots whose distance between intrinsic and extrinsic profiles is greater than alpha.cutoff are not regularized, whereas spots with distances less than alpha.cutoff contribute to the weighted mean. Weights are calculated by rescaling the distances less than alpha.cutoff between 0 and 0.5, so that the maximum extent to which an extrinsic profile can modify the predictions based on intrinsic profiles is 0.5 (a regular mean). For more details, see Mañanes et al., 2023.
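
The weighting scheme described above can be illustrated with a small numeric sketch. All values are hypothetical, and the direction of the rescaling (the closest profiles receive a weight of 0.5, and the weight falls to 0 at the cutoff) is an assumption made here for illustration. The exact formula used by the package is described in Mañanes et al., 2023.

```r
# Hypothetical predicted proportions for 5 spots x 3 cell types.
intrinsic <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.3, 0.3, 0.4,
    0.5, 0.5, 0.0,
    0.2, 0.2, 0.6),
  ncol = 3, byrow = TRUE
)
extrinsic <- matrix(
  c(0.5, 0.3, 0.2,
    0.2, 0.6, 0.2,
    0.4, 0.4, 0.2,
    0.1, 0.1, 0.8,
    0.6, 0.2, 0.2),
  ncol = 3, byrow = TRUE
)

# Hypothetical distances between intrinsic and extrinsic profiles.
distances <- c(1, 2, 3, 6, 8)

# alpha.cutoff = "mean": only spots below the mean distance are regularized.
cutoff <- mean(distances)
below <- distances < cutoff

# Assumed rescaling: weights in [0, 0.5], larger for more similar profiles.
weights <- numeric(length(distances))
weights[below] <- 0.5 * (cutoff - distances[below]) /
  (cutoff - min(distances))

# Weighted mean per spot; spots with weight 0 keep their intrinsic values.
regularized <- (1 - weights) * intrinsic + weights * extrinsic
```

In this toy case the first spot, with the smallest distance, becomes the plain average of its intrinsic and extrinsic proportions, while the last two spots are left unchanged.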

This function requires a SpatialDDLS object with a trained deep neural network model (trained.model slot) and the spatial transcriptomics datasets to be deconvoluted in the spatial.experiments slot. See ?createSpatialDDLSobject or ?loadSTProfiles for more details.

Value

SpatialDDLS object with a deconv.spots slot. The output is a list containing 'Regularized', 'Intrinsic' and 'Extrinsic' deconvoluted cell proportions, 'Distances' between intrinsic and extrinsic transcriptional profiles, and 'Weight.factors' with the final weights used to regularize intrinsic cell proportions. If simplify.set and/or simplify.majority are provided, the deconv.spots slot will contain a list with raw and simplified results.

References

Mañanes, D., Rivero-García, I., Jimenez-Carretero, D., Torres, M., Sancho, D., Torroja, C., Sánchez-Cabo, F. (2023). SpatialDDLS: An R package to deconvolute spatial transcriptomics data using neural networks. bioRxiv. doi: 10.1101/2023.08.31.555677.

Examples


set.seed(123)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    counts = matrix(
      rpois(15 * 20, lambda = 5), nrow = 15, ncol = 20,
      dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(20)))
    )
  ),
  colData = data.frame(
    Cell_ID = paste0("RHC", seq(20)),
    Cell_Type = sample(x = paste0("CellType", seq(6)), size = 20,
                       replace = TRUE)
  ),
  rowData = data.frame(
    Gene_ID = paste0("Gene", seq(15))
  )
)
SDDLS <- createSpatialDDLSobject(
  sc.data = sce,
  sc.cell.ID.column = "Cell_ID",
  sc.gene.ID.column = "Gene_ID",
  sc.filt.genes.cluster = FALSE
)
SDDLS <- genMixedCellProp(
  object = SDDLS,
  cell.ID.column = "Cell_ID",
  cell.type.column = "Cell_Type",
  num.sim.spots = 50,
  train.freq.cells = 2/3,
  train.freq.spots = 2/3,
  verbose = TRUE
)
SDDLS <- simMixedProfiles(SDDLS)
# training of SDDLS model
SDDLS <- trainDeconvModel(
  object = SDDLS,
  batch.size = 15,
  num.epochs = 5
)
# simulating spatial data
ngenes <- sample(3:40, size = 1)
ncells <- sample(10:40, size = 1)
counts <- matrix(
  rpois(ngenes * ncells, lambda = 5), ncol = ncells,
  dimnames = list(paste0("Gene", seq(ngenes)), paste0("Spot", seq(ncells)))
)
coordinates <- matrix(
  rep(c(1, 2), ncells), ncol = 2
)
st <- SpatialExperiment::SpatialExperiment(
  assays = list(counts = as.matrix(counts)),
  rowData = data.frame(Gene_ID = paste0("Gene", seq(ngenes))),
  colData = data.frame(Cell_ID = paste0("Spot", seq(ncells))),
  spatialCoords = coordinates
)
SDDLS <- loadSTProfiles(
  object = SDDLS,
  st.data = st,
  st.spot.ID.column = "Cell_ID",
  st.gene.ID.column = "Gene_ID"
)
# simplify arguments
simplify <- list(CellGroup1 = c("CellType1", "CellType2", "CellType4"),
                 CellGroup2 = c("CellType3", "CellType5"))
SDDLS <- deconvSpatialDDLS(
  object = SDDLS,
  index.st = 1,
  simplify.set = simplify,
  simplify.majority = simplify
)

[Package SpatialDDLS version 1.0.2 Index]