| VCFsToIDCatalogs {ICAMS} | R Documentation |
Create ID (small insertion and deletion) catalog from ID VCFs
Description
Create ID (small insertion and deletion) catalog from ID VCFs
Usage
VCFsToIDCatalogs(
list.of.vcfs,
ref.genome,
num.of.cores = 1,
region = "unknown",
flag.mismatches = 0,
return.annotated.vcfs = FALSE,
suppress.discarded.variants.warnings = TRUE
)
Arguments
list.of.vcfs |
List of in-memory ID VCFs. The list names will be the sample ids in the output catalog. |
ref.genome |
A |
num.of.cores |
The number of cores to use. Not available on Windows
unless |
region |
A character string acting as a region identifier, one of "genome", "exome". |
flag.mismatches |
Deprecated. If there are ID variants whose |
return.annotated.vcfs |
Logical. Whether to return the annotated VCFs with additional columns showing mutation class for each variant. Default is FALSE. |
suppress.discarded.variants.warnings |
Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE. |
Value
A list of elements:
- catalog: The ID (small insertion and deletion) catalog with attributes added. See as.catalog for details.
- discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added extra column discarded.reason for more details.
- annotated.vcfs: Non-NULL only if return.annotated.vcfs = TRUE. A list of data frames which contain the original VCF's ID mutation rows with three additional columns seq.context.width, seq.context and ID.class added. The category assignment of each ID mutation in the VCF can be obtained from the ID.class column.
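Because two of the three elements may be NULL, it can help to check which ones are present before using them. The sketch below uses a hypothetical helper, present_elements, which is not part of ICAMS, and a mock list standing in for real output; only the element names are taken from the description above.

```r
# Hypothetical helper (not part of ICAMS): report which elements of a
# VCFsToIDCatalogs-style result list are present (non-NULL).
present_elements <- function(res) {
  expected <- c("catalog", "discarded.variants", "annotated.vcfs")
  vapply(expected, function(nm) !is.null(res[[nm]]), logical(1))
}

# Mock result standing in for real output; the matrix is a placeholder,
# not a real ID catalog.
mock.res <- list(catalog = matrix(0L, nrow = 2, ncol = 1),
                 discarded.variants = NULL,
                 annotated.vcfs = NULL)

# Named logical vector, one entry per expected element
status <- present_elements(mock.res)
```

With a real result, a FALSE entry for annotated.vcfs would simply reflect the default return.annotated.vcfs = FALSE.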
Note
In ID (small insertion and deletion) catalogs, deletion repeat sizes range from 0 to 5+, but for plotting and end-user documentation deletion repeat sizes range from 1 to 6+.
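The offset between the two conventions can be sketched as a small lookup. The helper display_del_repeat_size is hypothetical and not part of ICAMS; it assumes the internal size is given as an integer from 0 to 5, with 5 standing for "5 or more".

```r
# Hypothetical helper (not part of ICAMS): convert an internal deletion
# repeat size (0..5, where 5 means "5 or more") to the label used for
# plotting and end-user documentation (1..6+).
display_del_repeat_size <- function(internal) {
  stopifnot(all(internal %in% 0:5))
  labels <- c("1", "2", "3", "4", "5", "6+")
  labels[internal + 1]
}

# Map every internal size to its display label
display.labels <- display_del_repeat_size(0:5)
```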
ID classification
See https://github.com/steverozen/ICAMS/blob/master/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx for additional information on ID (small insertion and deletion) mutation classification.
See the documentation for Canonicalize1Del, which first handles
deletions in homopolymers, then handles deletions in simple repeats with
longer repeat units (e.g. CACACACA; see
FindMaxRepeatDel), and, if the deletion is not in a simple
repeat, looks for microhomology (see FindDelMH).
See the code for the unexported function CanonicalizeID
and the functions it calls for the handling of insertions.
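The order of checks described above can be illustrated with a simplified sketch. This is not the ICAMS implementation: count_other_copies and deletion_check_order are hypothetical helpers, and the sketch assumes the deleted sequence and its 1-based position in a short reference context are already known.

```r
# Hypothetical helper (not part of ICAMS): count tandem copies of `unit`
# adjacent to the copy starting at 1-based position `pos` in `context`,
# looking both right and left. The copy at `pos` itself is not counted.
count_other_copies <- function(context, unit, pos) {
  n <- nchar(unit)
  len <- nchar(context)
  stopifnot(substr(context, pos, pos + n - 1) == unit)
  copies <- 0L
  start <- pos + n                       # extend to the right
  while (start + n - 1 <= len &&
         substr(context, start, start + n - 1) == unit) {
    copies <- copies + 1L
    start <- start + n
  }
  start <- pos - n                       # extend to the left
  while (start >= 1 &&
         substr(context, start, start + n - 1) == unit) {
    copies <- copies + 1L
    start <- start - n
  }
  copies
}

# Hypothetical helper: mirror the order of checks in the text above
# (homopolymer, then simple repeat, then microhomology).
deletion_check_order <- function(context, unit, pos) {
  copies <- count_other_copies(context, unit, pos)
  if (nchar(unit) == 1) {
    "homopolymer"
  } else if (copies > 0) {
    "simple repeat"
  } else {
    "check microhomology"
  }
}

ca.copies <- count_other_copies("GGCACACACATT", "CA", 5)
ca.class  <- deletion_check_order("GGCACACACATT", "CA", 5)
t.class   <- deletion_check_order("GATTTTC", "T", 3)
mh.class  <- deletion_check_order("GGACGTTT", "AC", 3)
```

The last case falls through to the microhomology step because no second copy of AC is adjacent to the deleted one. The microhomology search itself is left out of this sketch.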
Examples
# Locate the example Strelka ID VCF shipped with ICAMS
file <- c(system.file("extdata/Strelka-ID-vcf/",
                      "Strelka.ID.GRCh37.s1.vcf",
                      package = "ICAMS"))
list.of.ID.vcfs <- ReadStrelkaIDVCFs(file)
# Build the catalog only if the reference genome package is installed
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5",
                     quietly = TRUE)) {
  catID <- VCFsToIDCatalogs(list.of.ID.vcfs, ref.genome = "hg19",
                            region = "genome")
}