simSCProfiles {SpatialDDLS} | R Documentation |
Simulate new single-cell RNA-Seq expression profiles using the ZINB-WaVE model parameters
Description
Simulate single-cell expression profiles by randomly sampling from a negative
binomial distribution and inserting dropouts by sampling from a binomial
distribution using the ZINB-WaVE parameters estimated by the
estimateZinbwaveParams
function.
Usage
simSCProfiles(
object,
cell.ID.column,
cell.type.column,
n.cells,
suffix.names = "_Simul",
cell.types = NULL,
file.backend = NULL,
name.dataset.backend = NULL,
compression.level = NULL,
block.processing = FALSE,
block.size = 1000,
chunk.dims = NULL,
verbose = TRUE
)
Arguments
object |
|
cell.ID.column |
Name or column number corresponding to the cell names of expression matrix in cells metadata. |
cell.type.column |
Name or column number corresponding to the cell type of each cell in cells metadata. |
n.cells |
Number of simulated cells generated per cell type (i.e. if you
have 10 different cell types in your dataset, if |
suffix.names |
Suffix used on simulated cells. This suffix must be unique in the simulated cells, so make sure that this suffix does not appear in the real cell names. |
cell.types |
Vector indicating the cell types to simulate. If
|
file.backend |
Valid file path to store the simulated single-cell
expression profiles as an HDF5 file ( |
name.dataset.backend |
Name of the dataset in HDF5 file to be used. Note
that it cannot exist. If |
compression.level |
The compression level used if |
block.processing |
Boolean indicating whether the data should be
simulated in blocks (only if |
block.size |
Only if |
chunk.dims |
Specifies the dimensions that HDF5 chunk will have. If
|
verbose |
Show informative messages during the execution ( |
Details
Before this step, see ?estimateZinbwaveParams
. As described in
Torroja and Sanchez-Cabo, 2019, this function simulates a given number of
transcriptional profiles for each cell type provided by randomly sampling
from a negative binomial distribution with \mu
and \theta
estimated parameters and inserting dropouts by sampling from a binomial
distribution with probability pi. All parameters are estimated from
single-cell real data using the estimateZinbwaveParams
function. It uses the ZINB-WaVE model (Risso et al., 2018). For more details
about the model, see ?estimateZinbwaveParams
and Risso et al.,
2018.
The file.backend
argument allows to create a HDF5 file with simulated
single-cell profiles to be used as back-end to work with data stored on disk
instead of loaded into RAM. If the file.backend
argument is used with
block.processing = FALSE
, all the single-cell profiles will be
simulated in one step and, therefore, loaded into in RAM memory. Then, data
will be written in HDF5 file. To avoid to collapse RAM memory if too many
single-cell profiles are goin to be simulated, single-cell profiles can be
simulated and written to HDF5 files in blocks of block.size
size by
setting block.processing = TRUE
.
Value
A SpatialDDLS
object with
single.cell.simul
slot containing a
SingleCellExperiment
object with the simulated
single-cell expression profiles.
References
Risso, D., Perraudeau, F., Gribkova, S. et al. (2018). A general and flexible method for signal extraction from single-cell RNA-seq data. Nat Commun 9, 284. doi: doi:10.1038/s41467-017-02554-5.
Torroja, C. and Sánchez-Cabo, F. (2019). digitalDLSorter: A Deep Learning algorithm to quantify immune cell populations based on scRNA-Seq data. Frontiers in Genetics 10, 978. doi: doi:10.3389/fgene.2019.00978.
See Also
Examples
set.seed(123) # reproducibility
sce <- SingleCellExperiment::SingleCellExperiment(
assays = list(
counts = matrix(
rpois(30, lambda = 5), nrow = 15, ncol = 10,
dimnames = list(paste0("Gene", seq(15)), paste0("RHC", seq(10)))
)
),
colData = data.frame(
Cell_ID = paste0("RHC", seq(10)),
Cell_Type = sample(x = paste0("CellType", seq(2)), size = 10,
replace = TRUE)
),
rowData = data.frame(
Gene_ID = paste0("Gene", seq(15))
)
)
SDDLS <- createSpatialDDLSobject(
sc.data = sce,
sc.cell.ID.column = "Cell_ID",
sc.gene.ID.column = "Gene_ID",
sc.filt.genes.cluster = FALSE,
project = "Simul_example"
)
SDDLS <- estimateZinbwaveParams(
object = SDDLS,
cell.type.column = "Cell_Type",
cell.ID.column = "Cell_ID",
gene.ID.column = "Gene_ID",
subset.cells = 2,
verbose = TRUE
)
SDDLS <- simSCProfiles(
object = SDDLS,
cell.ID.column = "Cell_ID",
cell.type.column = "Cell_Type",
n.cells = 2,
verbose = TRUE
)