Simulating Clones {CloneSeeker} | R Documentation

Simulating copy number segmentation data and sequencing mutation data for tumors composed of multiple clones.

```
generateTumorData(tumor, snps.seq, snps.cgh, mu, sigma.reads,
sigma0.lrr, sigma0.baf, density.sigma)
plotTumorData(tumor, data)
tumorGen(...)
dataGen(tumor, ...)
```

| Argument | Description |
|---|---|
| `tumor` | an object of the `Tumor` class. |
| `snps.seq` | an integer; the total number of germline variants and somatic mutations to simulate in the tumor genome. |
| `snps.cgh` | an integer; the number of single nucleotide polymorphisms (SNPs) to simulate as measurements made to estimate copy number. |
| `mu` | an integer; the average read depth of the simulated sequencing study that gives rise to the mutation data. |
| `sigma.reads` | a real number; the standard deviation of the number of simulated sequencing reads per base. |
| `sigma0.lrr` | a real number; the standard deviation of the simulated per-SNP log R ratio (LRR) used to assess copy number. |
| `sigma0.baf` | a real number; the standard deviation of the simulated B allele frequency (BAF) used to assess copy number. |
| `density.sigma` | a real number; the standard deviation of a beta distribution used to simulate the number of SNP markers per copy number segment. |
| `data` | a list containing two data frames, `cn.data` and `seq.data`, such as the value returned by `generateTumorData`. |
| `...` | additional arguments passed to the underlying function (`Tumor` for `tumorGen`, `generateTumorData` for `dataGen`). |
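The `density.sigma` argument is described as the standard deviation of a beta distribution, but base R's `rbeta()` takes two shape parameters rather than a mean and a standard deviation. The sketch below shows one standard way to make that conversion, the method of moments. It is illustrative only: the helper `betaShapes` is hypothetical, the mean of 0.5 is an assumption (this page does not say which mean CloneSeeker uses), and nothing here is taken from the package source.

```r
# Hypothetical helper: convert a beta mean m and standard deviation s into
# the shape1/shape2 parameters expected by rbeta(), by the method of moments.
# This illustrates the technique; it is not CloneSeeker's code.
betaShapes <- function(m, s) {
  v <- s^2
  # A beta distribution with mean m must have variance below m * (1 - m)
  if (m <= 0 || m >= 1 || v >= m * (1 - m)) {
    stop("need 0 < m < 1 and s^2 < m * (1 - m)")
  }
  k <- m * (1 - m) / v - 1      # k equals shape1 + shape2
  c(shape1 = m * k, shape2 = (1 - m) * k)
}

# Example: an ASSUMED mean of 0.5 together with density.sigma = 0.1
shapes <- betaShapes(0.5, 0.1)
set.seed(1)
draws <- rbeta(10000, shapes["shape1"], shapes["shape2"])
```

With a mean of 0.5 and a standard deviation of 0.1, the helper returns `shape1 = 12` and `shape2 = 12`, and the sample standard deviation of `draws` is close to 0.1. The check inside the helper reflects the fact that a beta distribution with mean `m` cannot have a variance as large as `m * (1 - m)`.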

Copy number and mutation data are simulated essentially independently. Each simulation starts with a single "normal" genome, and copy number variants (CNVs) and/or mutations are randomly generated for each new "branch", or subclone; the number of subclones depends on the input parameters. Each successive branch is assigned, at random, a parent from among the existing clones, so it carries both the aberrations of its parent clone and the novel aberrations assigned to it. Depending on the input parameters, the algorithm can also randomly select some clones for extinction while generating the heterogeneous tumor, yielding a more realistic population structure.
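The branching process described above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not CloneSeeker's implementation: each clone is reduced to a vector of aberration IDs, the hypothetical function `simulateBranches` adds a fixed, made-up number of novel aberrations per branch, parents are chosen uniformly at random, and the optional extinction step is left out.

```r
# Hypothetical sketch of branching clonal evolution. Each clone is only the
# vector of aberration IDs it carries; newPerBranch is a made-up parameter.
simulateBranches <- function(nBranches, newPerBranch = 3) {
  clones <- list(integer(0))  # clone 1: the "normal" genome, no aberrations
  parents <- NA_integer_      # the normal genome has no parent
  nextId <- 1L
  for (b in seq_len(nBranches)) {
    # pick a parent uniformly at random among the clones that exist so far
    parent <- sample.int(length(clones), 1)
    # novel aberrations get fresh, never-reused IDs
    novel <- nextId - 1L + seq_len(newPerBranch)
    nextId <- nextId + newPerBranch
    # the child inherits its parent's aberrations plus its own novel ones
    clones[[length(clones) + 1]] <- c(clones[[parent]], novel)
    parents <- c(parents, parent)
  }
  list(clones = clones, parents = parents)
}

set.seed(42)
sim <- simulateBranches(4)
```

Each entry of `sim$parents` gives the index of the clone that a branch descends from, and every clone's aberration set contains its parent's set, which is the inheritance property described above.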

Note that `tumorGen` (an alias for `Tumor` that returns a list instead of a `Tumor` object) and `dataGen` (an alias for `generateTumorData`) are DEPRECATED.

The `generateTumorData` function returns a list with two components, `cn.data` and `seq.data`. Each component is itself a data frame. Note that in some cases, one of these data frames may have zero rows or may be returned as `NA`.
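Because either component may have zero rows or be `NA`, downstream code may want to test each component before using it. A minimal sketch, using a hypothetical helper `hasRows` and a hand-built mock list in place of real `generateTumorData` output:

```r
# Hypothetical helper: TRUE only when x is a data frame with at least one row.
hasRows <- function(x) {
  is.data.frame(x) && nrow(x) > 0
}

# Mock stand-in for a generateTumorData() result (made-up, partial columns).
mock <- list(
  cn.data = data.frame(chr = 1, LRR = 0.1),
  seq.data = NA
)
# Check every component of the list at once
usable <- vapply(mock, hasRows, logical(1))
```

Here `usable` is `TRUE` for `cn.data` and `FALSE` for `seq.data`. The `&&` operator short-circuits, so `nrow()` is never called on an `NA` component.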

The `cn.data` component contains seven columns:

| Column | Description |
|---|---|
| `chr` | the chromosome number |
| `seq` | a unique segment identifier |
| `LRR` | simulated segment-wise log ratios |
| `BAF` | simulated segment-wise B allele frequencies |
| `X`, `Y` | simulated intensities for two separate alleles/haplotypes per segment |
| `markers` | the simulated number of SNPs per segment |
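The `markers` column makes it natural to weight segment-level values by how many SNPs support them. The sketch below computes a marker-weighted mean `LRR` per chromosome from a small, made-up data frame that only borrows the documented column names; all values, including the `seq` identifiers, are invented for illustration.

```r
# Made-up cn.data-like data frame using the documented column names.
cn <- data.frame(
  chr = c(1, 1, 2),
  seq = 1:3,
  LRR = c(0.2, -0.1, 0.0),
  BAF = c(0.5, 0.4, 0.5),
  X = c(1.0, 0.9, 1.0),
  Y = c(1.0, 0.6, 1.0),
  markers = c(100, 300, 50)
)

# Weight each segment's LRR by its number of SNP markers, per chromosome.
lrrByChr <- sapply(split(cn, cn$chr),
                   function(d) weighted.mean(d$LRR, w = d$markers))
```

For this toy input, chromosome 1 gets (0.2 * 100 - 0.1 * 300) / 400 = -0.025 and chromosome 2 gets 0.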

The `seq.data` component contains eight columns:

| Column | Description |
|---|---|
| `chr` | the chromosome number |
| `seq` | a unique "segment" identifier |
| `mut.id` | a unique mutation identifier |
| `refCounts`, `varCounts` | the simulated numbers of reference and variant read counts per mutation |
| `VAF` | the simulated variant allele frequency |
| `totalCounts` | the simulated total number of read counts |
| `status` | a character vector (that should probably be a factor) indicating whether a variant should be viewed as somatic or germline |
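The documentation does not spell out how `totalCounts` and `VAF` relate to the two count columns. Assuming that `totalCounts` is `refCounts + varCounts` and that `VAF` is `varCounts / totalCounts` (plausible, but an assumption here), the sketch below recomputes them on a made-up data frame and converts `status` to a factor, as the note on `status` suggests. The level labels `"germline"` and `"somatic"` are also assumed.

```r
# Made-up seq.data-like data frame using the documented column names.
sq <- data.frame(
  chr = c(1, 1, 2),
  seq = c(1, 2, 3),
  mut.id = c("m1", "m2", "m3"),
  refCounts = c(40, 65, 30),
  varCounts = c(30, 5, 30),
  status = c("germline", "somatic", "germline"),
  stringsAsFactors = FALSE
)
# ASSUMPTION: total depth is reference plus variant reads,
# and VAF is the variant fraction of that total.
sq$totalCounts <- sq$refCounts + sq$varCounts
sq$VAF <- sq$varCounts / sq$totalCounts
# Convert status to a factor; the level labels are assumed, not from the package.
sq$status <- factor(sq$status, levels = c("germline", "somatic"))
```

On real output, the same expressions could be compared against the simulated `VAF` and `totalCounts` columns with `all.equal()` to check whether the assumption holds.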

The `plotTumorData` function invisibly returns its `data` argument.
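Returning a value invisibly means that a call at the console prints nothing, while the value can still be captured by assignment. The base R sketch below uses a hypothetical stand-in, `plotStub`, in place of `plotTumorData`, and `withVisible()` to show both halves of that behavior.

```r
# Hypothetical stand-in that, like plotTumorData, would draw a plot and then
# invisibly return its data argument.
plotStub <- function(tumor, data) {
  # drawing code would go here
  invisible(data)
}

# withVisible() reports the returned value and whether it would auto-print
res <- withVisible(plotStub(NULL, list(a = 1)))
```

In the same way, `saved <- plotTumorData(tumor, dataset)` would keep the plotted data without printing it.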

Kevin R. Coombes krc@silicovore.com, Mark Zucker zucker.64@buckeyemail.osu.edu

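```
library(CloneSeeker)
set.seed(12345) # make the random simulation reproducible
psis <- c(0.6, 0.3, 0.1) # three clones
# create tumor with copy number but no mutation data
tumor <- Tumor(psis, rounds = 400, nu = 0, pcnv = 1, norm.contam = FALSE)
# simulate the dataset
dataset <- generateTumorData(tumor, snps.seq = 10000, snps.cgh = 600000,
                             mu = 70, sigma.reads = 25, sigma0.lrr = 0.15,
                             sigma0.baf = 0.03, density.sigma = 0.1)
# plot it
plotTumorData(tumor, dataset)
```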

Package *CloneSeeker* version 1.0.11