bulk_generator {SCdeconR} | R Documentation |
Generate artificial bulk RNA-seq samples based on simulation
Description
Generate artificial bulk RNA-seq samples with random or pre-defined cell-type proportions for benchmarking deconvolution algorithms
Usage
bulk_generator(
ref,
phenodata,
num_mixtures = 500,
num_mixtures_sprop = 10,
pool_size = 100,
seed = 1234,
prop = NULL,
replace = FALSE
)
Arguments
ref |
a matrix-like object of gene expression values with rows representing genes, columns representing cells. |
phenodata |
a data.frame with rows representing cells, columns representing cell attributes. Its first two columns should be, in order, cell identifiers and cell types, as in the Examples section (cellid, celltypes). |
num_mixtures |
total number of simulated bulk samples. Must be a multiple of num_mixtures_sprop. Default to 500. |
num_mixtures_sprop |
number of simulated bulk samples that share the same simulated cell type proportions. Only applicable when prop is not specified. Default to 10. |
pool_size |
number of cells to use to construct each artificial bulk sample. Default to 100. |
seed |
seed to use for simulation. Default to 1234. |
prop |
a data.frame with two columns. The first column includes unique cell types in phenodata; the second column includes cell type proportions. If specified, bulk samples will be simulated based on the specified cell proportions. |
replace |
logical value indicating whether to sample cells with replacement. Default to FALSE, to sample cells without replacement. |
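The argument constraints above can be checked before calling the function. The following is a minimal sketch in base R, not part of SCdeconR: the cell type labels and proportion values are made up, and the requirement that proportions sum to 1 is an assumption rather than something this page states. The divisibility check reflects the relationship between num_mixtures and num_mixtures_sprop described in Details.

```r
# Hypothetical cell type labels, for illustration only
celltypes <- c("Tcell", "Bcell", "Monocyte")

# Custom (non-uniform) proportions: first column cell types, second column proportions
prop <- data.frame(
  celltypes = celltypes,
  proportion = c(0.5, 0.3, 0.2)
)

num_mixtures <- 20
num_mixtures_sprop <- 5

# Checks one might run before calling bulk_generator()
stopifnot(
  !anyDuplicated(prop$celltypes),              # cell types are unique
  all(prop$proportion >= 0),                   # no negative proportions
  isTRUE(all.equal(sum(prop$proportion), 1)),  # assumed: proportions sum to 1
  num_mixtures %% num_mixtures_sprop == 0      # whole number of proportion vectors
)
```

With these values, 20 / 5 = 4 distinct proportion vectors would be needed when prop is left as NULL.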
Details
If prop is not specified, cell type proportions are first generated at random, with at least two cell types present in each proportion vector. Then, for each proportion vector, num_mixtures_sprop samples are simulated, for a total of num_mixtures samples. If prop is specified, all num_mixtures samples are simulated from the specified cell proportion vector.
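The procedure described above can be sketched in base R on toy data. This is an illustration under stated assumptions, not the SCdeconR implementation: the helper names random_prop and simulate_one are hypothetical, the toy reference is random Poisson counts, cell counts per type are obtained by rounding (so a sample may contain slightly more or fewer than pool_size cells), and a bulk profile is assumed to be the sum of the selected cells' expression.

```r
set.seed(1234)

# Toy reference: 5 genes x 60 cells, three hypothetical cell types
n_genes <- 5
celltype <- rep(c("A", "B", "C"), each = 20)
ref <- matrix(rpois(n_genes * length(celltype), lambda = 5),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("cell", seq_along(celltype))))

num_mixtures <- 6
num_mixtures_sprop <- 2
pool_size <- 10
types <- unique(celltype)

# Draw one random proportion vector with at least two cell types present
random_prop <- function(types) {
  k <- sample.int(length(types) - 1, 1) + 1   # between 2 and length(types)
  present <- sample(types, k)
  w <- runif(k)
  p <- setNames(numeric(length(types)), types)
  p[present] <- w / sum(w)
  p
}

# One bulk sample: pick cells per type according to p, then sum expression
# (assumption: summing; rounding means the total may differ from pool_size)
simulate_one <- function(p) {
  n_cells <- round(p * pool_size)
  cells <- unlist(lapply(names(n_cells), function(ct) {
    idx <- which(celltype == ct)
    idx[sample.int(length(idx), n_cells[[ct]], replace = FALSE)]
  }))
  rowSums(ref[, cells, drop = FALSE])
}

# One proportion vector per group, each repeated num_mixtures_sprop times
n_groups <- num_mixtures / num_mixtures_sprop
props <- replicate(n_groups, random_prop(types))   # cell types x groups
props <- props[, rep(seq_len(n_groups), each = num_mixtures_sprop), drop = FALSE]

# Simulate one bulk sample per column of props
bulk <- sapply(seq_len(ncol(props)), function(j) simulate_one(props[, j]))
colnames(bulk) <- colnames(props) <- paste0("sample", seq_len(num_mixtures))
```

Here bulk is a genes-by-samples matrix and props a cell-types-by-samples matrix, which mirrors the two objects described under Value. Consecutive pairs of columns in props are identical because num_mixtures_sprop is 2.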
Value
a list of two objects:
simulated bulk RNA-seq data, with rows representing genes, columns representing samples
cell type proportions used to simulate the bulk RNA-seq data, with rows representing cell types, columns representing samples
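Since this page does not give names for the two list elements, the sketch below accesses them by position. It uses a mock list with made-up values in place of a real bulk_generator() result, and the variable names bulk_expr and true_props are illustrative only.

```r
# Mock stand-in for the returned list (values are made up for illustration)
bulk_sim <- list(
  matrix(1:6, nrow = 3,
         dimnames = list(paste0("gene", 1:3), c("s1", "s2"))),
  matrix(c(0.7, 0.3, 0.4, 0.6), nrow = 2,
         dimnames = list(c("A", "B"), c("s1", "s2")))
)

bulk_expr  <- bulk_sim[[1]]  # genes x samples
true_props <- bulk_sim[[2]]  # cell types x samples

# Basic consistency checks between the two objects
stopifnot(
  ncol(bulk_expr) == ncol(true_props),
  identical(colnames(bulk_expr), colnames(true_props))
)
```

The second object can then serve as the ground truth when scoring a deconvolution method's estimated proportions.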
Examples
# paths to the two example reference samples shipped with SCdeconR
ref_list <- c(paste0(system.file("extdata", package = "SCdeconR"), "/refdata/sample1"),
              paste0(system.file("extdata", package = "SCdeconR"), "/refdata/sample2"))

# paths to the matching phenodata files
phenopath1 <- paste0(system.file("extdata", package = "SCdeconR"),
                     "/refdata/phenodata_sample1.txt")
phenopath2 <- paste0(system.file("extdata", package = "SCdeconR"),
                     "/refdata/phenodata_sample2.txt")
phenodata_list <- c(phenopath1, phenopath2)

# construct integrated reference using harmony algorithm
refdata <- construct_ref(ref_list = ref_list,
                         phenodata_list = phenodata_list,
                         data_type = "cellranger",
                         method = "harmony",
                         group_var = "subjectid",
                         nfeature_rna = 50,
                         vars_to_regress = "percent_mt", verbose = FALSE)

# phenodata: cell ids in the first column, cell types in the second
phenodata <- data.frame(cellid = colnames(refdata),
                        celltypes = refdata$celltype,
                        subjectid = refdata$subjectid)

# equal proportions across all cell types in the reference
n_celltypes <- length(unique(refdata$celltype))
prop <- data.frame(celltypes = unique(refdata$celltype),
                   proportion = rep(1 / n_celltypes, n_celltypes))

# simulate 20 bulk samples from the specified proportions
bulk_sim <- bulk_generator(ref = GetAssayData(refdata, slot = "data", assay = "SCT"),
                           phenodata = phenodata,
                           num_mixtures = 20,
                           prop = prop,
                           num_mixtures_sprop = 1)