Simulating Clones {CloneSeeker}    R Documentation

Simulating Tumor Clones

Description

Simulating copy number segmentation data and sequencing mutation data for tumors composed of multiple clones.

Usage

generateTumorData(tumor, snps.seq, snps.cgh, mu, sigma.reads,
                  sigma0.lrr, sigma0.baf, density.sigma)
plotTumorData(tumor, data)
tumorGen(...)
dataGen(tumor, ...)

Arguments

tumor

an object of the Tumor class.

snps.seq

an integer; the total number of germline variants and somatic mutations to simulate in the tumor genome.

snps.cgh

an integer; the number of single nucleotide polymorphisms (SNPs) to simulate as measurements made to estimate copy number.

mu

an integer; the average read depth of a simulated sequencing study giving rise to mutations.

sigma.reads

a real number; the standard deviation of the number of simulated sequencing reads per base.

sigma0.lrr

a real number; the standard deviation of the simulated per-SNP log R ratio (LRR) for assessing copy number.

sigma0.baf

a real number; the standard deviation of the simulated B allele frequency (BAF) for assessing copy number.

density.sigma

a real number; the standard deviation of a beta distribution used to simulate the number of SNP markers per copy number segment.

data

a list containing two data frames, cn.data and seq.data, as produced by generateTumorData.

...

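additional arguments; since tumorGen is an alias for Tumor and dataGen is an alias for generateTumorData, these are the arguments of the corresponding function.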

Details

Copy number and mutation data are simulated essentially independently. Each simulation starts with a single "normal" genome, and CNVs and/or mutations are randomly generated for each new "branch" or subclone. (The number of subclones depends on the input parameters.) Each successive branch is randomly determined to descend from one of the existing clones, and therefore contains both the aberrations belonging to its parent clone and the novel aberrations assigned to it. Depending on input parameters, the algorithm can also randomly select some clones for extinction in the process of generating the heterogeneous tumor, to yield a more realistic population structure.
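The branching scheme described above can be illustrated with a minimal sketch. This is not CloneSeeker's implementation: the function name simulateBranches, the uniform choice of parent, and the "one novel aberration per branch" rule are assumptions made only for illustration, and clone extinction is not modeled.

```r
# Hypothetical sketch of the branching process; not CloneSeeker's actual code.
set.seed(1)
simulateBranches <- function(nBranches) {
  # Clone 1 is the "normal" genome: no parent and no aberrations.
  parent <- NA_integer_
  aberrations <- list(character(0))
  for (k in seq_len(nBranches)) {
    id <- length(aberrations) + 1L
    # Assumption: the parent is chosen uniformly among existing clones.
    p <- sample.int(id - 1L, 1L)
    # The new clone inherits its parent's aberrations and gains a novel one.
    aberrations[[id]] <- c(aberrations[[p]], paste0("ab", id))
    parent[id] <- p
  }
  list(parent = parent, aberrations = aberrations)
}
tree <- simulateBranches(5)
```

Each element of tree$aberrations is a superset of the aberrations of its parent clone, which is the inheritance property the paragraph describes.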

Note that tumorGen (an alias for Tumor that returns a list instead of a Tumor object) and dataGen (an alias for generateTumorData) are DEPRECATED.

Value

The generateTumorData function returns a list with two components, cn.data and seq.data. Each component is normally a data frame. Note that in some cases, one of these components may have zero rows or may be returned as NA instead.
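Because a component may be empty or NA, downstream code may want to check it before use. The following is a minimal sketch; hasRows is a hypothetical helper (not part of CloneSeeker), and mock is a stand-in for the list returned by generateTumorData.

```r
# Hypothetical helper: TRUE only if the component is a data frame with rows.
hasRows <- function(x) is.data.frame(x) && nrow(x) > 0

# Mock object standing in for the output of generateTumorData (assumption).
mock <- list(cn.data = data.frame(chr = 1:2, seq = 1:2, LRR = c(0.1, -0.2)),
             seq.data = NA)

# Named logical vector: which components can safely be analyzed?
usable <- vapply(mock, hasRows, logical(1))
```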

The cn.data component contains seven columns:

chr

the chromosome number;

seq

a unique segment identifier;

LRR

simulated segment-wise log ratios;

BAF

simulated segment-wise B allele frequencies;

X and Y

simulated intensities for two separate alleles/haplotypes per segment; and

markers

the simulated number of SNPs per segment.

The seq.data component contains eight columns:

chr

the chromosome number;

seq

a unique "segment" identifier;

mut.id

a unique mutation identifier;

refCounts and varCounts

the simulated numbers of reference and variant counts per mutation;

VAF

the simulated variant allele frequency;

totalCounts

the simulated total number of read counts; and

status

a character value (currently not a factor) indicating whether a variant should be viewed as somatic or germline.

The plotTumorData function invisibly returns its data argument.

Author(s)

Kevin R. Coombes <krc@silicovore.com>, Mark Zucker <zucker.64@buckeyemail.osu.edu>

Examples

psis <- c(0.6, 0.3, 0.1) # three clones
# create tumor with copy number but no mutation data
tumor <- Tumor(psis, rounds = 400, nu = 0, pcnv = 1, norm.contam = FALSE)
# simulate the dataset
dataset <- generateTumorData(tumor, 10000, 600000, 70, 25, 0.15, 0.03, 0.1)
# plot it
plotTumorData(tumor, dataset)
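
The same simulation may be easier to read with the arguments named as in the Usage section. The sketch below repeats the call above with identical values; the requireNamespace guard and the variable ran are additions for illustration, so that the code is skipped when the package is not installed.

```r
# Run only if CloneSeeker is available (guard added for illustration).
ran <- requireNamespace("CloneSeeker", quietly = TRUE)
if (ran) {
  library(CloneSeeker)
  psis <- c(0.6, 0.3, 0.1) # three clones
  tumor <- Tumor(psis, rounds = 400, nu = 0, pcnv = 1, norm.contam = FALSE)
  # same values as the positional call, matched to the documented names
  dataset <- generateTumorData(tumor,
                               snps.seq = 10000, snps.cgh = 600000,
                               mu = 70, sigma.reads = 25,
                               sigma0.lrr = 0.15, sigma0.baf = 0.03,
                               density.sigma = 0.1)
  # list the components of the result
  print(names(dataset))
}
```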

[Package CloneSeeker version 1.0.11 Index]