TransformCatalog {ICAMS} | R Documentation |
Transform between counts and density spectrum catalogs and counts and density signature catalogs
Description
Transform between counts and density spectrum catalogs and counts and density signature catalogs
Usage
TransformCatalog(
catalog,
target.ref.genome = NULL,
target.region = NULL,
target.catalog.type = NULL,
target.abundance = NULL
)
Arguments
catalog |
An SBS or DBS catalog as described in |
target.ref.genome |
A |
target.region |
A |
target.catalog.type |
A character string acting as a catalog type
identifier, one of "counts", "density", "counts.signature",
"density.signature"; see |
target.abundance |
A vector of counts, one for each source K-mer for mutations (e.g. for
strand-agnostic single nucleotide substitutions in trinucleotide – i.e.
3-mer – context, one count each for ACA, ACC, ACG, ... TTT). See
|
Details
Only the following transformations are legal:
-
counts -> counts
(deprecated, generates a warning; we strongly suggest that you work with densities if comparing spectra or signatures generated from data with different underlying abundances.) -
counts -> density
-
counts -> (counts.signature, density.signature)
-
density -> counts
(the semantics are to infer the genome-wide or exome-wide counts based on the densities) -
density -> density
(a null operation, generates a warning) -
density -> (counts.signature, density.signature)
-
counts.signature -> counts.signature
(used to transform between the source abundance andtarget.abundance
) -
counts.signature -> density.signature
-
counts.signature -> (counts, density)
(generates an error) -
density.signature -> density.signature
(a null operation, generates a warning) -
density.signature -> counts.signature
-
density.signature -> (counts, density)
(generates an error)
Value
A catalog as defined in ICAMS
.
Rationale
The TransformCatalog
function transforms catalogs of mutational spectra or
signatures to account for differing abundances of the source
sequence of the mutations in the genome.
For example, mutations from ACG are much rarer in the human genome than mutations from ACC simply because CG dinucleotides are rare in the genome. Consequently, there are two possible representations of mutational spectra or signatures. One representation is based on mutation counts as observed in a given genome or exome, and this approach is widely used, as, for example, at https://cancer.sanger.ac.uk/cosmic/signatures, which presents signatures based on observed mutation counts in the human genome. We call these "counts-based spectra" or "counts-based signatures".
Alternatively, mutational spectra or signatures can be represented as mutations per source sequence, for example the number of ACT > AGT mutations occurring at all ACT 3-mers in a genome. We call these "density-based spectra" or "density-based signatures".
This function can also transform spectra based on observed genome-wide counts to "density"-based catalogs. In density-based catalogs mutations are expressed as mutations per source sequences. For example, a density-based catalog represents the proportion of ACCs mutated to ATCs, the proportion of ACGs mutated to ATGs, etc. This is different from counts-based mutational spectra catalogs, which contain the number of ACC > ATC mutations, the number of ACG > ATG mutations, etc.
This function can also transform observed-count based spectra or signatures from genome to exome based counts, or between different species (since the abundances of source sequences vary between genome and exome and between species).
Examples
file <- system.file("extdata",
"strelka.regress.cat.sbs.96.csv",
package = "ICAMS")
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)) {
catSBS96.counts <- ReadCatalog(file, ref.genome = "hg19",
region = "genome",
catalog.type = "counts")
catSBS96.density <- TransformCatalog(catSBS96.counts,
target.ref.genome = "hg19",
target.region = "genome",
target.catalog.type = "density")}