VCFsToZipFile {ICAMS}    R Documentation

Create a zip file which contains catalogs and plot PDFs from VCFs

Description

Create 3 SBS catalogs (96, 192, 1536), 3 DBS catalogs (78, 136, 144) and an ID (small insertion and deletion) catalog from the VCFs specified by dir or files, save the catalogs as CSV files, plot them to PDF and generate a zip archive of all the output files.

Usage

VCFsToZipFile(
  dir,
  files,
  zipfile,
  ref.genome,
  variant.caller = "unknown",
  num.of.cores = 1,
  trans.ranges = NULL,
  region = "unknown",
  names.of.VCFs = NULL,
  tumor.col.names = NA,
  filter.status = NULL,
  get.vaf.function = NULL,
  ...,
  max.vaf.diff = 0.02,
  base.filename = "",
  return.annotated.vcfs = FALSE,
  suppress.discarded.variants.warnings = TRUE
)

Arguments

dir

Pathname of the directory which contains VCFs that come from the same variant caller. Each VCF must have a file extension ".vcf" (case insensitive) and share the same ref.genome and region.
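The case-insensitive ".vcf" requirement can be illustrated with base R alone. The following is a minimal sketch, not necessarily how ICAMS itself lists the files; the temporary directory and file names are made up for illustration.

```r
# Create a throwaway directory with hypothetical file names.
tmp <- file.path(tempdir(), "vcf-demo")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c("a.vcf", "b.VCF", "notes.txt"))))

# Match the ".vcf" extension regardless of case.
vcf.files <- list.files(tmp, pattern = "\\.vcf$",
                        ignore.case = TRUE, full.names = TRUE)
print(basename(vcf.files))

# Clean up the throwaway directory.
unlink(tmp, recursive = TRUE)
```

Only the two files ending in ".vcf" or ".VCF" are selected; "notes.txt" is ignored.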

files

Character vector of file paths to the VCF files. Only one of the arguments dir or files needs to be specified.

zipfile

Pathname of the zip file to be created.

ref.genome

A ref.genome argument as described in ICAMS.

variant.caller

Name of the variant caller that produced the VCFs; one of "strelka", "mutect", "freebayes" or "unknown". This information is needed to calculate the VAFs (variant allele frequencies). If variant.caller is "unknown" (default) and get.vaf.function is NULL, then VAF and read depth will be NAs. If variant.caller is "mutect", SBSs are not merged into DBSs.

num.of.cores

The number of cores to use. Not available on Windows unless num.of.cores = 1.

trans.ranges

Optional. If ref.genome specifies one of the following BSgenome objects

  1. BSgenome.Hsapiens.1000genomes.hs37d5

  2. BSgenome.Hsapiens.UCSC.hg38

  3. BSgenome.Mmusculus.UCSC.mm10

then the function will infer trans.ranges automatically. Otherwise, the user needs to provide the necessary trans.ranges. Please refer to TranscriptRanges for more details. If is.null(trans.ranges), transcript range information is not added.

region

A character string designating a genomic region; see as.catalog and ICAMS.

names.of.VCFs

Optional. Character vector of names of the VCF files. The order of names in names.of.VCFs should match the order of VCFs listed in dir. If NULL (default), the names are derived from the file paths: everything up to and including the last path separator (if any) is removed, and so is the file extension together with its leading dot.
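The default naming rule can be sketched in base R as follows. This is an illustration of the described behavior, not the code ICAMS runs internally; the file paths are hypothetical.

```r
# Hypothetical file paths, used only for illustration.
vcf.paths <- c("/data/run1/sampleA.vcf", "/data/run1/sampleB.VCF")

# Drop the directory part, then the extension and its leading dot.
default.names <- tools::file_path_sans_ext(basename(vcf.paths))
print(default.names)
```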

tumor.col.names

Optional. Only applicable to Mutect VCFs. Character vector of column names in Mutect VCFs which contain the tumor sample information. The order of names in tumor.col.names should match the order of Mutect VCFs specified in files. If tumor.col.names is NA (default), this function will use the 10th column in all the Mutect VCFs to calculate VAFs. See GetMutectVAF for more details.

filter.status

The status indicating that a variant has passed all filters, for example "PASS". Variants which do not have the specified filter.status in the FILTER column of the VCF will be removed. If NULL (default), no variants will be removed from the original VCF.
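The effect of filter.status can be pictured with a toy data frame standing in for the body of a VCF. This is a minimal sketch under that assumption; the rows are made up and the code is not taken from ICAMS.

```r
# Toy stand-in for VCF records; all values are invented for illustration.
vcf <- data.frame(
  CHROM  = c("1", "1", "2"),
  POS    = c(100L, 250L, 300L),
  FILTER = c("PASS", "LowQual", "PASS"),
  stringsAsFactors = FALSE
)

filter.status <- "PASS"

# NULL keeps every variant; otherwise keep only rows whose FILTER matches.
kept <- if (is.null(filter.status)) {
  vcf
} else {
  vcf[vcf$FILTER == filter.status, ]
}
print(kept)
```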

get.vaf.function

Optional. Only applicable when variant.caller is "unknown". Function to calculate VAF (variant allele frequency) and read depth information from the original VCF. See GetMutectVAF as an example. If NULL (default) and variant.caller is "unknown", then VAF and read depth will be NAs.

...

Optional arguments to get.vaf.function.

max.vaf.diff

Not applicable if variant.caller = "mutect". The maximum difference of VAF, default value is 0.02. If the absolute difference of VAFs for adjacent SBSs is bigger than max.vaf.diff, then these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or variants involving more than two consecutive bases.
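The comparison described for max.vaf.diff can be sketched as below. The data frame and its column names (vaf.first, vaf.second, candidate.DBS) are hypothetical and the VAFs are invented; this only illustrates the threshold test, not the merging logic ICAMS actually uses.

```r
max.vaf.diff <- 0.02

# Hypothetical pairs of adjacent SBSs with made-up VAFs.
pairs <- data.frame(
  vaf.first  = c(0.31, 0.40, 0.12),
  vaf.second = c(0.30, 0.25, 0.15)
)

# A pair remains a doublet candidate only if its VAFs differ by
# no more than max.vaf.diff.
pairs$candidate.DBS <-
  abs(pairs$vaf.first - pairs$vaf.second) <= max.vaf.diff
print(pairs)
```

Here only the first pair stays a candidate; the other two differ by more than 0.02 and would be treated as separate single base mutations.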

base.filename

Optional. The base name of the CSV and PDF files to be produced; multiple files will be generated, each ending in x.csv or x.pdf, where x indicates the type of catalog.

return.annotated.vcfs

Logical. Whether to return the annotated VCFs with additional columns showing mutation class for each variant. Default is FALSE.

suppress.discarded.variants.warnings

Logical. Whether to suppress warning messages showing information about the discarded variants. Default is TRUE.

Details

This function calls VCFsToCatalogs, PlotCatalogToPdf, WriteCatalog and zip::zipr.

Value

A list containing the following objects:

If trans.ranges is not provided by the user and cannot be inferred by ICAMS, the SBS 192 and DBS 144 catalogs will not be generated. Each catalog has attributes added. See as.catalog for more details.

ID classification

See https://github.com/steverozen/ICAMS/blob/master/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx for additional information on ID (small insertion and deletion) mutation classification.

See the documentation for Canonicalize1Del, which first handles deletions in homopolymers, then handles deletions in simple repeats with longer repeat units (e.g. CACACACA, see FindMaxRepeatDel), and, if the deletion is not in a simple repeat, looks for microhomology (see FindDelMH).

See the code for unexported function CanonicalizeID and the functions it calls for handling of insertions.

Note

SBS 192 and DBS 144 catalogs include only mutations in transcribed regions. In ID (small insertion and deletion) catalogs, deletion repeat sizes range from 0 to 5+, but for plotting and end-user documentation deletion repeat sizes range from 1 to 6+.

Comments

To add or change attributes of the catalog, you can use function attr.
For example, attr(catalog, "abundance") <- custom.abundance.
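A small sketch of this pattern, using a toy matrix in place of a real catalog. The row names, sample names and abundance values are made up; a catalog produced by ICAMS has many more rows and additional attributes.

```r
# Toy 2 x 2 matrix standing in for a catalog.
catalog <- matrix(1:4, nrow = 2,
                  dimnames = list(c("ACAA", "ACCA"), c("s1", "s2")))

# Made-up abundance values, for illustration only.
custom.abundance <- c(ACA = 10, ACC = 20)

# Set the attribute, then read it back.
attr(catalog, "abundance") <- custom.abundance
print(attr(catalog, "abundance"))
```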

Examples

# Directory of example Mutect VCFs shipped with ICAMS
dir <- system.file("extdata/Mutect-vcf", package = "ICAMS")
# Run only if the reference genome package is installed
if (requireNamespace("BSgenome.Hsapiens.1000genomes.hs37d5", quietly = TRUE)) {
  zipfile <- file.path(tempdir(), "test.zip")
  catalogs <-
    VCFsToZipFile(dir,
                  zipfile = zipfile,
                  ref.genome = "hg19",
                  variant.caller = "mutect",
                  region = "genome",
                  base.filename = "Mutect")
  # Remove the zip file created by this example
  unlink(zipfile)
}

[Package ICAMS version 2.3.12 Index]