VCFsToZipFile {ICAMS}    R Documentation
Create a zip file which contains catalogs and plot PDFs from VCFs
Description
Create 3 SBS catalogs (96, 192, 1536), 3 DBS catalogs (78, 136, 144) and an
Indel catalog from the VCFs specified by dir, save the catalogs as CSV files,
plot them to PDF, and generate a zip archive of all the output files.
Usage
VCFsToZipFile(
dir,
files,
zipfile,
ref.genome,
variant.caller = "unknown",
num.of.cores = 1,
trans.ranges = NULL,
region = "unknown",
names.of.VCFs = NULL,
tumor.col.names = NA,
filter.status = NULL,
get.vaf.function = NULL,
...,
max.vaf.diff = 0.02,
base.filename = "",
return.annotated.vcfs = FALSE,
suppress.discarded.variants.warnings = TRUE
)
Arguments
dir: Pathname of the directory which contains VCFs that come from the same variant caller. Each VCF must have a file extension ".vcf" (case insensitive) and share the same
files: Character vector of file paths to the VCF files. Only one of argument
zipfile: Pathname of the zip file to be created.
ref.genome: A
variant.caller: Name of the variant caller that produces the VCF, can be either
num.of.cores: The number of cores to use. Not available on Windows unless
trans.ranges: Optional. If
then the function will infer
region: A character string designating a genomic region; see
names.of.VCFs: Optional. Character vector of names of the VCF files. The order of names in
tumor.col.names: Optional. Only applicable to Mutect VCFs. Character vector of column names in Mutect VCFs which contain the tumor sample information. The order of names in
filter.status: The status indicating a variant has passed all filters. An example would be
get.vaf.function: Optional. Only applicable when
...: Optional arguments to
max.vaf.diff: Not applicable if
base.filename: Optional. The base name of the CSV and PDF files to be produced; multiple files will be generated, each ending in
return.annotated.vcfs: Logical. Whether to return the annotated VCFs with additional columns showing mutation class for each variant. Default is FALSE.
suppress.discarded.variants.warnings: Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE.
Details
This function calls VCFsToCatalogs, PlotCatalogToPdf, WriteCatalog and
zip::zipr.
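The pipeline described in Details can be sketched as follows. This is a hypothetical illustration, not the ICAMS implementation: the function name SketchVCFsToZip and the output file-naming scheme are assumptions, most of VCFsToZipFile's arguments are omitted, and it assumes the ICAMS and zip packages are installed. Defining the function does not load either package; they are only needed when it is called.

```r
# Hypothetical sketch (not ICAMS's implementation) of the pipeline in Details:
# build catalogs, write each one to CSV and PDF, then zip the outputs.
# Assumes ICAMS and zip are installed; the name and file naming are illustrative.
SketchVCFsToZip <- function(files, zipfile, ref.genome,
                            variant.caller = "unknown",
                            region = "unknown", base.filename = "") {
  catalogs <- ICAMS::VCFsToCatalogs(files = files,
                                    ref.genome = ref.genome,
                                    variant.caller = variant.caller,
                                    region = region)
  out.dir <- tempfile("catalogs")
  dir.create(out.dir)
  prefix <- if (nzchar(base.filename)) paste0(base.filename, ".") else ""
  cat.names <- c("catSBS96", "catSBS192", "catSBS1536",
                 "catDBS78", "catDBS136", "catDBS144", "catID")
  written <- character(0)
  for (nm in cat.names) {
    catalog <- catalogs[[nm]]
    # Skip catalogs that were not generated (e.g. SBS192/DBS144 without trans.ranges).
    if (is.null(catalog)) next
    stem <- file.path(out.dir, paste0(prefix, sub("^cat", "", nm)))
    ICAMS::WriteCatalog(catalog, file = paste0(stem, ".csv"))
    ICAMS::PlotCatalogToPdf(catalog, file = paste0(stem, ".pdf"))
    written <- c(written, paste0(stem, c(".csv", ".pdf")))
  }
  # zipr() stores each file under its own name, without the parent directory path.
  zip::zipr(zipfile, files = written)
  invisible(catalogs)
}
```

Skipping NULL entries keeps the sketch working when the transcription-strand catalogs cannot be produced; the real function may handle that case differently.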
Value
A list containing the following objects:
- catSBS96, catSBS192, catSBS1536: Matrices of the 3 SBS catalogs (one each for 96, 192, and 1536).
- catDBS78, catDBS136, catDBS144: Matrices of the 3 DBS catalogs (one each for 78, 136, and 144).
- catID: Matrix of the ID (small insertion and deletion) catalog.
- discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added extra column discarded.reason for more details.
- annotated.vcfs: Non-NULL only if return.annotated.vcfs = TRUE. A list of elements:
  - SBS: SBS VCF annotated by AnnotateSBSVCF with three new columns SBS96.class, SBS192.class and SBS1536.class showing the mutation class for each SBS variant.
  - DBS: DBS VCF annotated by AnnotateDBSVCF with three new columns DBS78.class, DBS136.class and DBS144.class showing the mutation class for each DBS variant.
  - ID: ID VCF annotated by AnnotateIDVCF with one new column ID.class showing the mutation class for each ID variant.

If trans.ranges is not provided by the user and cannot be inferred by
ICAMS, the SBS 192 and DBS 144 catalogs will not be generated. Each catalog has
attributes added. See as.catalog for more details.
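Because the return value is an ordinary list, the catalogs it contains can be checked with plain list indexing. The helper below is hypothetical (not part of ICAMS) and uses only base R; it reports the number of rows of each catalog, or NA for a catalog that is absent, such as SBS 192 when trans.ranges is unavailable. The mock list stands in for a real result and is an assumption for illustration.

```r
# Hypothetical helper (not part of ICAMS): report how many rows each catalog
# in a VCFsToZipFile() result has, or NA if that catalog is missing/NULL.
SummarizeCatalogs <- function(result) {
  cat.names <- c("catSBS96", "catSBS192", "catSBS1536",
                 "catDBS78", "catDBS136", "catDBS144", "catID")
  vapply(cat.names, function(nm) {
    x <- result[[nm]]
    if (is.null(x)) NA_integer_ else nrow(x)
  }, integer(1))
}

# Mock result containing only two catalogs, for illustration.
mock <- list(catSBS96 = matrix(0L, nrow = 96, ncol = 2),
             catDBS78 = matrix(0L, nrow = 78, ncol = 2))
SummarizeCatalogs(mock)
```

With a real result, replace mock with the list returned by VCFsToZipFile.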
ID classification
See https://github.com/steverozen/ICAMS/blob/master/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx for additional information on ID (small insertion and deletion) mutation classification.
See the documentation for Canonicalize1Del, which first handles deletions in
homopolymers, then handles deletions in simple repeats with longer repeat
units (e.g. CACACACA, see FindMaxRepeatDel), and, if the deletion is not in a
simple repeat, looks for microhomology (see FindDelMH).
See the code for the unexported function CanonicalizeID, and the functions it
calls, for the handling of insertions.
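The two ideas used for deletions, repeat context and microhomology, can be illustrated with toy string functions. These are simplified, hypothetical sketches and are not ICAMS's Canonicalize1Del, FindMaxRepeatDel or FindDelMH: they look only at the 3' flanking sequence, assume plain character strings as input, and ignore the normalization the real functions perform.

```r
# Toy illustrations only (not ICAMS's Canonicalize1Del, FindMaxRepeatDel or
# FindDelMH). Sequences are plain character strings; only the 3' side is used.

# Count how many further copies of the deleted unit immediately follow the
# deletion in the 3' flanking sequence.
CountRepeatUnits3 <- function(deleted, right.context) {
  stopifnot(nchar(deleted) > 0)
  unit.len <- nchar(deleted)
  n <- 0L
  while (substr(right.context, n * unit.len + 1, (n + 1) * unit.len) == deleted) {
    n <- n + 1L
  }
  n
}

# Length of the longest prefix of the deleted sequence that also begins the
# 3' flanking sequence (a simplified, one-sided notion of microhomology).
MicrohomologyLength3 <- function(deleted, right.context) {
  max.k <- min(nchar(deleted), nchar(right.context))
  k <- 0L
  while (k < max.k &&
         substr(deleted, k + 1, k + 1) == substr(right.context, k + 1, k + 1)) {
    k <- k + 1L
  }
  k
}

CountRepeatUnits3("CA", "CACACAGT")    # 3: deleting one CA from CACACACA leaves 3 copies
MicrohomologyLength3("ACGT", "ACTTG")  # 2: "AC" is shared
```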
Note
SBS 192 and DBS 144 catalogs include only mutations in transcribed regions. In ID (small insertion and deletion) catalogs, deletion repeat sizes range from 0 to 5+, but for plotting and end-user documentation deletion repeat sizes range from 1 to 6+.
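The relationship between the two repeat-size scales in the Note can be written as a small conversion. This hypothetical helper is not an ICAMS function; it assumes the internal scale (0 to 5+) and the plotting scale (1 to 6+) differ by a one-unit shift, with everything at 5 or more internally shown as "6+".

```r
# Hypothetical helper (not an ICAMS function): convert internal deletion
# repeat sizes (0, 1, ..., 5+) to plotting labels (1, 2, ..., 6+),
# assuming a one-unit shift between the two scales.
InternalToDisplayRepeatSize <- function(internal) {
  stopifnot(all(internal >= 0))
  ifelse(internal >= 5, "6+", as.character(internal + 1))
}

InternalToDisplayRepeatSize(c(0, 2, 5, 9))  # "1" "3" "6+" "6+"
```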
Comments
To add or change attributes of the catalog, you can use the function attr.
For example, attr(catalog, "abundance") <- custom.abundance.
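The attr() assignment above is base R and works on any matrix. The sketch below uses a plain toy matrix rather than a real ICAMS catalog, and the abundance values are hypothetical; for the attributes ICAMS itself expects, see as.catalog.

```r
# Toy stand-in for a catalog: a plain 4 x 1 count matrix, NOT an ICAMS
# catalog object created by as.catalog().
catalog <- matrix(c(10, 5, 3, 0), nrow = 4, ncol = 1,
                  dimnames = list(NULL, "sample1"))

# Hypothetical abundance values, for illustration only.
custom.abundance <- c(AAA = 1000, AAC = 900)

# Add (or overwrite) the "abundance" attribute; the counts are unchanged.
attr(catalog, "abundance") <- custom.abundance

attr(catalog, "abundance")  # the named vector assigned above
dim(catalog)                # still 4 1
```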
Examples
# Directory of example Mutect VCFs shipped with ICAMS
dir <- c(system.file("extdata/Mutect-vcf",
                     package = "ICAMS"))
# Run only if the reference genome package used for ref.genome = "hg19" is installed
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)) {
  catalogs <-
    VCFsToZipFile(dir,
                  zipfile = file.path(tempdir(), "test.zip"),
                  ref.genome = "hg19",
                  variant.caller = "mutect",
                  region = "genome",
                  base.filename = "Mutect")
  # Remove the example zip file
  unlink(file.path(tempdir(), "test.zip"))
}