StrelkaIDVCFFilesToZipFile {ICAMS} | R Documentation |
Create a zip file which contains ID (small insertion and deletion) catalog and plot PDF from Strelka ID VCF files
Description
Create an ID (small insertion and deletion) catalog from the Strelka ID VCFs specified by dir, save the catalog as a CSV file, plot it to PDF and generate a zip archive of all the output files.
Usage
StrelkaIDVCFFilesToZipFile(
dir,
zipfile,
ref.genome,
region = "unknown",
names.of.VCFs = NULL,
base.filename = "",
flag.mismatches = 0,
return.annotated.vcfs = FALSE,
suppress.discarded.variants.warnings = TRUE
)
Arguments
dir |
Pathname of the directory which contains only the Strelka
ID VCF files. Each Strelka ID VCF must have a file extension
".vcf" (case insensitive) and share the same |
zipfile |
Pathname of the zip file to be created. |
ref.genome |
A |
region |
A character string designating a genomic region;
see |
names.of.VCFs |
Optional. Character vector of names of the VCF files.
The order of names in |
base.filename |
Optional. The base name of the CSV and PDF file to be
produced; the file is ending in |
flag.mismatches |
Deprecated. If there are ID variants whose |
return.annotated.vcfs |
Logical. Whether to return the annotated VCFs with additional columns showing mutation class for each variant. Default is FALSE. |
suppress.discarded.variants.warnings |
Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE. |
Details
This function calls StrelkaIDVCFFilesToCatalog, PlotCatalogToPdf, WriteCatalog and zip::zipr.
Value
A list of elements:
- catalog: The ID (small insertion and deletion) catalog with attributes added. See as.catalog for more details.
- discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added extra column discarded.reason for more details.
- annotated.vcfs: Non-NULL only if return.annotated.vcfs = TRUE. A list of data frames which contain the original VCF's ID mutation rows with three additional columns seq.context.width, seq.context and ID.class added. The category assignment of each ID mutation in the VCF can be obtained from the ID.class column.
ID classification
See https://github.com/steverozen/ICAMS/blob/master/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx for additional information on ID (small insertion and deletion) mutation classification.
See the documentation for Canonicalize1Del, which first handles deletions in homopolymers, then handles deletions in simple repeats with longer repeat units (e.g. CACACACA, see FindMaxRepeatDel), and if the deletion is not in a simple repeat, looks for microhomology (see FindDelMH).
See the code for unexported function CanonicalizeID and the functions it calls for handling of insertions.
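To illustrate what counting repeat units for a deletion involves, here is a minimal base R sketch. The helper count_tandem_copies is hypothetical and is not part of ICAMS. It only counts consecutive copies of a unit to the right of a given start position, so it should not be read as the logic of Canonicalize1Del or FindMaxRepeatDel.

```r
# Hypothetical helper, not part of ICAMS: count how many consecutive copies
# of `unit` occur in `context`, starting at 1-based position `start`.
count_tandem_copies <- function(context, unit, start) {
  n <- nchar(unit)
  stopifnot(n > 0, start >= 1)
  copies <- 0L
  pos <- start
  # Step through the context one repeat unit at a time until a mismatch
  # or the end of the sequence is reached.
  while (pos + n - 1L <= nchar(context) &&
         substr(context, pos, pos + n - 1L) == unit) {
    copies <- copies + 1L
    pos <- pos + n
  }
  copies
}

# A simple repeat with a 2-base unit: "CA" occurs 4 times from position 3
ca.copies <- count_tandem_copies("GGCACACACATT", "CA", 3L)

# A homopolymer: "A" occurs 5 times from position 3
a.copies <- count_tandem_copies("GGAAAAATT", "A", 3L)
```

A homopolymer is simply the case where the repeat unit has length 1, which is why the same counting loop covers both examples.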
Note
In ID (small insertion and deletion) catalogs, deletion repeat sizes range from 0 to 5+, but for plotting and end-user documentation deletion repeat sizes range from 1 to 6+.
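The offset between the two conventions can be sketched in base R as follows. The helper catalog_to_display_repeat_size is hypothetical and is not an ICAMS function. It assumes the catalog stores deletion repeat sizes as integers 0 to 5, with 5 standing for "5 or more".

```r
# Hypothetical helper, not part of ICAMS: convert a deletion repeat size as
# stored in a catalog (0 to 5, where 5 is assumed to mean "5 or more") to
# the label used for plotting and end-user documentation (1 to 6+).
catalog_to_display_repeat_size <- function(catalog.size) {
  stopifnot(all(catalog.size %in% 0:5))
  display <- catalog.size + 1L
  # The top category is open-ended, so it gets a "+" suffix
  ifelse(display == 6L, "6+", as.character(display))
}

display.labels <- catalog_to_display_repeat_size(0:5)
```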
Examples
# Locate the example Strelka ID VCFs shipped with ICAMS
dir <- c(system.file("extdata/Strelka-ID-vcf",
                     package = "ICAMS"))
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)) {
  catalogs <-
    StrelkaIDVCFFilesToZipFile(dir,
                               zipfile = file.path(tempdir(), "test.zip"),
                               ref.genome = "hg19",
                               region = "genome",
                               base.filename = "Strelka-ID")
  # Remove the zip file created by this example
  unlink(file.path(tempdir(), "test.zip"))
}