Simulating Clones {CloneSeeker} | R Documentation
Simulating Tumor Clones
Description
Simulating copy number segmentation data and sequencing mutation data for tumors composed of multiple clones.
Usage
generateTumorData(tumor, snps.seq, snps.cgh, mu, sigma.reads,
sigma0.lrr, sigma0.baf, density.sigma)
plotTumorData(tumor, data)
tumorGen(...)
dataGen(tumor, ...)
Arguments
tumor
an object of the Tumor class.
snps.seq
an integer; the total number of germline variants and somatic mutations to simulate in the tumor genome.
snps.cgh
an integer; the number of single nucleotide polymorphisms (SNPs) to simulate as measurements made to estimate copy number.
mu
an integer; the average read depth of a simulated sequencing study giving rise to mutations.
sigma.reads
a real number; the standard deviation of the number of simulated sequencing reads per base.
sigma0.lrr
a real number; the standard deviation of the simulated per-SNP log R ratio (LRR) for assessing copy number.
sigma0.baf
a real number; the standard deviation of the simulated B allele frequency (BAF) for assessing copy number.
density.sigma
a real number; the standard deviation of a beta distribution used to simulate the number of SNP markers per copy number segment.
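data
a list containing two data frames, cn.data and seq.data, as returned by generateTumorData.
...
additional arguments passed on to the underlying function (Tumor for tumorGen, generateTumorData for dataGen).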
Details
Copy number and mutation data are simulated essentially independently. Each simulation starts with a single "normal" genome, and CNVs and/or mutations are randomly generated for each new "branch" or subclone. (The number of subclones depends on the input parameters.) Each successive branch is randomly determined to descend from one of the existing clones, and therefore contains both the aberrations belonging to its parent clone and the novel aberrations assigned to it. Depending on input parameters, the algorithm can also randomly select some clones for extinction in the process of generating the heterogeneous tumor, to yield a more realistic population structure.
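The branching scheme described above can be illustrated with a minimal base-R sketch. This is not the CloneSeeker implementation: the function name simulateLineage, its arguments, and the use of integer labels in place of real CNVs or mutations are all assumptions made for illustration, and clone extinction is left out.

```r
# Hypothetical illustration only; not part of CloneSeeker.
# Each clone is a vector of integer labels standing in for aberrations.
simulateLineage <- function(nClones, perBranch = 2, seed = 1) {
  set.seed(seed)
  clones <- list(integer(0))   # clone 1 is the "normal" genome
  parent <- NA_integer_        # the normal genome has no parent
  nextId <- 1L
  for (k in seq_len(nClones - 1)) {
    # choose the parent uniformly among the clones generated so far
    p <- sample.int(length(clones), 1)
    novel <- seq(nextId, length.out = perBranch)
    nextId <- nextId + perBranch
    # the child inherits the parent's aberrations and adds its own
    clones[[length(clones) + 1]] <- c(clones[[p]], novel)
    parent <- c(parent, p)
  }
  list(clones = clones, parent = parent)
}

lineage <- simulateLineage(4)
```

In the returned list, lineage$parent[k] gives the index of the clone from which clone k descends, so every label carried by the parent also appears in the child.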
Note that tumorGen (an alias for Tumor that returns a list instead of a Tumor object) and dataGen (an alias for generateTumorData) are DEPRECATED.
Value
The generateTumorData function returns a list with two components, cn.data and seq.data. Each component is itself a data frame. Note that in some cases, one of these data frames may have zero rows or may be returned as NA.
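Because either component may be empty or NA, downstream code may want to check a component before using it. The helper below is a hypothetical convenience function (hasRows is not part of CloneSeeker), applied to a hand-built list that only mimics the shape of the return value.

```r
# Hypothetical helper; not part of CloneSeeker.
# TRUE only when x is a data frame with at least one row.
hasRows <- function(x) {
  is.data.frame(x) && nrow(x) > 0
}

# Hand-built stand-in for the list returned by generateTumorData,
# with made-up values and only a subset of the columns.
mock <- list(
  cn.data  = data.frame(chr = c(1, 1, 2), seq = 1:3,
                        LRR = c(0.01, -0.35, 0.22)),
  seq.data = NA
)

# Named logical vector: which components can be analyzed?
usable <- vapply(mock, hasRows, logical(1))
```

The short-circuiting && matters here: nrow is only evaluated once the object is known to be a data frame.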
The cn.data component contains seven columns:
chr
the chromosome number;
seq
a unique segment identifier;
LRR
simulated segment-wise log ratios;
BAF
simulated segment-wise B allele frequencies;
X and Y
simulated intensities for two separate alleles/haplotypes per segment; and
markers
the simulated number of SNPs per segment.
The seq.data component contains eight columns:
chr
the chromosome number;
seq
a unique "segment" identifier;
mut.id
a unique mutation identifier;
refCounts and varCounts
the simulated numbers of reference and variant read counts per mutation;
VAF
the simulated variant allele frequency;
totalCounts
the simulated total number of read counts; and
status
a character string (that should probably be a factor) indicating whether a variant should be viewed as somatic or germline.
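As a sketch of how the seq.data columns relate to one another, the toy data frame below uses the column names listed above with invented values. It assumes that totalCounts is the sum of refCounts and varCounts, that VAF is varCounts divided by totalCounts, and that status takes the labels "somatic" and "germline". None of these is stated explicitly on this page, so check them against real output. The last step stores status as a factor.

```r
# Toy stand-in for seq.data; all values are invented for illustration.
toy <- data.frame(
  chr       = c(1, 1, 2),
  seq       = c(1, 1, 5),
  mut.id    = 1:3,
  refCounts = c(40, 35, 70),
  varCounts = c(30, 35, 0),
  status    = c("somatic", "germline", "germline"),  # assumed labels
  stringsAsFactors = FALSE
)

# Assumed relationships between the count columns
toy$totalCounts <- toy$refCounts + toy$varCounts
toy$VAF <- toy$varCounts / toy$totalCounts

# Store status as a factor with explicit levels
toy$status <- factor(toy$status, levels = c("germline", "somatic"))
table(toy$status)
```

Fixing the levels explicitly keeps tabulations stable even when a simulated dataset happens to contain only one kind of variant.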
The plotTumorData function invisibly returns its data argument.
Author(s)
Kevin R. Coombes krc@silicovore.com, Mark Zucker zucker.64@buckeyemail.osu.edu
Examples
psis <- c(0.6, 0.3, 0.1) # three clones
# create tumor with copy number but no mutation data
tumor <- Tumor(psis, rounds = 400, nu = 0, pcnv = 1, norm.contam = FALSE)
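# simulate the dataset (arguments named as listed under Usage)
dataset <- generateTumorData(tumor, snps.seq = 10000, snps.cgh = 600000,
    mu = 70, sigma.reads = 25, sigma0.lrr = 0.15, sigma0.baf = 0.03,
    density.sigma = 0.1)
# plot it
plotTumorData(tumor, dataset)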