read.vcf {bedr}R Documentation

Read a vcf into R

Description

Read a vcf into R and parse it for downstream analysis

Usage

read.vcf(x, split.info = FALSE, split.samples = FALSE, nrows = -1, verbose = TRUE)

Arguments

x

A vcf

split.info

Split the info into columns

split.samples

Split the sample into columns. If multiple samples then a list matrices will be created, one matrix for each element in the FORMAT tag.

nrows

The the number of rows to be read. Set to 0 to parse the header.

verbose

print progress

Details

The function can be slow for splitting the INFO, FORMAT for large VCFs.

Value

VCF representation in R as a list. The first element in the list is the header, the second the body of the VCF. Every repeating tag in the header i.e. INFO, FORMAT is structured as matrix. If reading a multi-sample VCF and the split.sample = TRUE, then a matrix is added to the list for every tag in the FORMAT string.

Note that by default the vcf is returned as a data.table not a data.frame. Therefore there are some quirks i.e. subsetting via named columns a$vcf[,"CHROM", with = FALSE]. If in doubt just caset to data.frame.

Author(s)

Daryl Waggott

Examples


clinVar.vcf.example      <- system.file("extdata/clinvar_dbSNP138_example.vcf.gz", package = "bedr")
singleSample.vcf.example <- system.file("extdata/singleSampleOICR_example.vcf.gz", package = "bedr")
multiSample.vcf.example  <- system.file("extdata/multiSampleOICR_example.vcf.gz", package = "bedr")

# read a subset of NCBI clinVar vcf.  Note this has no samples.
vcf1.a <- read.vcf(clinVar.vcf.example)
vcf1.b <- read.vcf(clinVar.vcf.example, split.info = TRUE)

## Not run: 

# same as above but split multiple samples
vcf1.c <- read.vcf(clinVar.vcf.example, split.info = TRUE, split.sample = TRUE) 

# read a single-sample VCF
system.time(
  vcf2.a <- read.vcf(singleSample.vcf.example, split.info = TRUE, split.sample = TRUE)
  )

# read a multi-sample VCF
vcf3.a <- read.vcf(multiSample.vcf.example, split.info = FALSE, split.sample = TRUE);

# multi core example
options("cores"=9);
vcf2.a <- read.vcf(singleSample.vcf.example, split.info = TRUE, split.sample = TRUE)
options("cores"=1);


## End(Not run)

[Package bedr version 1.0.7 Index]