GeneScape {GeneScape} | R Documentation |
GeneScape
Description
This function simulates single-cell RNA-seq data with complex differential expression and correlation structures.
Usage
GeneScape(
nCells = 6000,
nGroups = NULL,
groups = NULL,
lib.size.loc = 9.3,
lib.size.scale = 0.25,
de.fc.mat = NULL,
nGenes = 5000,
gene.mean.shape = 0.3,
gene.mean.rate = 0.15,
gene.means = NULL,
de.n = 50,
de.share = NULL,
de.id = NULL,
de.fc.loc = 0.7,
de.fc.scale = 0.2,
add.sub = FALSE,
sub.major = NULL,
sub.prop = 0.1,
sub.group = NULL,
sub.de.n = 20,
sub.de.id = NULL,
sub.de.common = FALSE,
sub.de.fc.loc = 0.7,
sub.de.fc.scale = 0.2,
add.cor = FALSE,
cor.n = 4,
cor.size = 20,
cor.cor = 0.7,
cor.id = NULL,
band.width = 10,
add.hub = FALSE,
hub.n = 10,
hub.size = 20,
hub.cor = 0.4,
hub.id = NULL,
hub.fix = NULL,
drop = FALSE,
dropout.location = -2,
dropout.slope = -1
)
Arguments
nCells |
number of cells |
nGroups |
number of cell groups |
groups |
group information for cells |
lib.size.loc |
location parameter for library size (log-normal distribution) |
lib.size.scale |
scale parameter for library size (log-normal distribution) |
de.fc.mat |
differential expression fold-change matrix; can be generated by this function |
nGenes |
number of genes |
gene.mean.shape |
shape parameter for mean expression level (Gamma distribution) |
gene.mean.rate |
rate parameter for mean expression level (Gamma distribution) |
gene.means |
mean gene expression levels |
de.n |
number of differentially expressed genes in each cell type. Should be an integer or a vector of length nGroups |
de.share |
number of DE genes shared between neighboring cell types. Should be a vector of length (nGroups - 1) |
de.id |
the indices of genes that are DE across cell types. Should be a list of vectors, where each vector corresponds to a cell type. If de.id is non-NULL, de.n and de.share are ignored. |
de.fc.loc |
the location parameter for the fold change of DE genes. Should be a number or a vector of length nGroups |
de.fc.scale |
the scale parameter for fold change (log-normal distribution). Should be a number or a vector of length nGroups |
add.sub |
whether to add sub-cell-types |
sub.major |
the major cell types to which the sub-cell-types correspond |
sub.prop |
proportion of sub-cell-types in the corresponding major cell type |
sub.group |
cell indices for sub-cell-types. If sub.group is non-NULL, sub.prop is ignored. |
sub.de.n |
number of differentially expressed genes in each sub-cell-type compared to the corresponding major cell type. Should be an integer or a vector of the same length as sub.major |
sub.de.id |
the index of additional differentially expressed genes between sub-cell-types and the corresponding major cell types |
sub.de.common |
whether the additional differential expression structure should be the same for all sub-cell-types |
sub.de.fc.loc |
similar to de.fc.loc, but for additional differentially expressed genes in sub-cell-types |
sub.de.fc.scale |
similar to de.fc.scale, but for additional differentially expressed genes in sub-cell-types |
add.cor |
whether to add pathways (correlated genes) |
cor.n |
number of pathways included. Should be an integer. |
cor.size |
number of correlated genes (length of pathway). Should be a number or a vector of length cor.n |
cor.cor |
correlation parameters |
cor.id |
gene indices of correlated (pathway) genes. Should be a list of vectors, with each vector representing a pathway. If cor.id is non-NULL, cor.n is ignored. |
band.width |
band width of the pathway correlation structure: two genes in a pathway are uncorrelated if the distance between them is greater than band.width |
add.hub |
whether to add hub genes |
hub.n |
number of hub genes included. Should be an integer. |
hub.size |
number of genes correlated to the hub gene. Should be a number or a vector of length hub.n |
hub.cor |
correlation parameters between hub genes and their correlated genes |
hub.id |
gene indices of hub genes. Should be a list of vectors. If hub.id is non-NULL, hub.n is ignored. |
hub.fix |
user-defined genes correlated to hub genes (the remaining correlated genes are randomly selected). Should be a list of vectors, with length hub.n or the same length as hub.id. |
drop |
whether to add dropout |
dropout.location |
dropout midpoint (the mean expression level at which the dropout probability equals 0.5, as in splat; can be negative) |
dropout.slope |
how dropout proportion changes with increasing expression |
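The distributional arguments above can be pictured with base R alone. The following is a minimal sketch, not GeneScape's internal code: it assumes that the library-size and gene-mean arguments map directly onto `rlnorm` and `rgamma`, and that dropout follows a logistic curve in log mean expression, as in splat. `dropout_prob` is a hypothetical helper introduced only for illustration.

```r
# Illustrative only: base-R stand-ins for the distributions named in the
# argument table. Nothing here calls GeneScape.
set.seed(1)

# Library sizes: log-normal with lib.size.loc / lib.size.scale (defaults shown)
lib.size <- rlnorm(1000, meanlog = 9.3, sdlog = 0.25)

# Gene means: Gamma with gene.mean.shape / gene.mean.rate (defaults shown);
# the theoretical mean is shape / rate = 2
gene.means <- rgamma(5000, shape = 0.3, rate = 0.15)

# Assumed dropout curve: logistic in log mean expression (as in splat).
# dropout_prob is a hypothetical helper, not a GeneScape function.
dropout_prob <- function(log.mean, location = -2, slope = -1) {
  1 / (1 + exp(-slope * (log.mean - location)))
}

dropout_prob(-2)           # 0.5 at the midpoint
dropout_prob(c(-4, 0, 4))  # decreases as expression increases (slope < 0)
```

Under this assumed form, a negative dropout.slope means that highly expressed genes are less likely to be dropped, and dropout.location shifts the point at which half of the counts are lost.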
Details
Compared with the splat method in the Splatter R package, this function can fix the number and positions of differentially expressed genes, generate more complex differential expression structures, and add sub-cell-types, correlated genes (an AR(1) correlation structure truncated at band.width, mimicking pathways), and hub genes.
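The pathway correlation structure described here can be sketched as a banded AR(1) matrix. This is an illustration under assumptions, not the package's implementation: `banded_ar1` is a hypothetical helper, and it assumes that the correlation between genes at positions i and j is cor.cor^|i - j| when |i - j| is at most band.width and zero otherwise.

```r
# Hypothetical helper (not part of GeneScape): AR(1) correlation matrix
# with entries beyond the band set to zero.
banded_ar1 <- function(size, rho, band.width) {
  # Absolute distance between every pair of gene positions
  d <- abs(outer(seq_len(size), seq_len(size), "-"))
  ifelse(d <= band.width, rho^d, 0)
}

# Values mirror the defaults cor.size = 20, cor.cor = 0.7, band.width = 10
R <- banded_ar1(size = 20, rho = 0.7, band.width = 10)
R[1, 1:3]  # 1.00 0.70 0.49
R[1, 12]   # 0: genes 1 and 12 are 11 positions apart
```

Truncating an AR(1) matrix in this way does not guarantee a valid (positive definite) correlation matrix for every choice of parameters, so any real use of such a matrix should check its eigenvalues first.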
Value
A list of observed data, true data (without dropout), differential expression rate and hub gene indices.
References
Zappia, L., Phipson, B., & Oshlack, A. (2017). Splatter: Simulation of single-cell RNA sequencing data. Genome Biology, 18(1). https://doi.org/10.1186/s13059-017-1305-0
Examples
set.seed(1)
data <- GeneScape()
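A cautious way to explore the result is to list its top-level components rather than assume their names. The sketch below uses only the default call shown above and base R's `str`; the variable name `sim` is arbitrary, and the block does nothing if the package is not installed.

```r
# Runs only if the GeneScape package is installed
if (requireNamespace("GeneScape", quietly = TRUE)) {
  set.seed(1)
  sim <- GeneScape::GeneScape()
  # Show the top-level components of the returned list without
  # assuming what they are called
  str(sim, max.level = 1)
}
```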