VCF input and output {vcfR} | R Documentation |
Read and write vcf format files
Description
Read files in the *.vcf structured text format, as well as the compressed *.vcf.gz format. Write objects of class vcfR to *.vcf.gz.
Usage
read.vcfR(
file,
limit = 1e+07,
nrows = -1,
skip = 0,
cols = NULL,
convertNA = TRUE,
checkFile = TRUE,
check_keys = TRUE,
verbose = TRUE
)
write.vcf(x, file = "", mask = FALSE, APPEND = FALSE)
Arguments
file |
A filename for a variant call format (vcf) file. |
limit |
amount of memory (in bytes) not to exceed when reading in a file. |
nrows |
integer specifying the maximum number of rows (variants) to read in. |
skip |
integer specifying the number of rows (variants) to skip before beginning to read data. |
cols |
vector of column numbers to extract from file. |
convertNA |
logical specifying to convert VCF missing data to NA. |
checkFile |
test if the first line follows the VCF specification. |
check_keys |
logical determining if check_keys() is called to test whether INFO and FORMAT keys are unique. |
verbose |
report verbose progress. |
x |
An object of class vcfR or chromR. |
mask |
logical vector indicating rows to use. |
APPEND |
logical indicating whether to append to existing vcf file or write a new file. |
Details
The function read.vcfR reads in files in *.vcf (text) and *.vcf.gz (gzipped text) format and returns an object of class vcfR. The parameter 'limit' is an attempt to keep the user from trying to read in a file which contains more data than there is memory to hold. Based on the dimensions of the data matrix, an estimate is made of how much memory is needed. If this estimate exceeds the value of 'limit', an error is thrown and execution stops. The user may increase this limit to any value, but is encouraged to compare that value to the amount of available physical memory.
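As a minimal sketch of how 'limit' might be used, the following writes the bundled vcfR_test data to a temporary file and reads it back with a raised limit. The value 1e8 is arbitrary, and the second call assumes that a limit of one byte is smaller than the memory estimate, so it is wrapped in tryCatch() rather than relied upon.

```r
library(vcfR)

# Write the bundled example data to a temporary file so there is something to read.
data(vcfR_test)
vcf_path <- tempfile(fileext = ".vcf.gz")
write.vcf(vcfR_test, file = vcf_path)

# Raise the memory ceiling from the default 1e7 bytes to 1e8 bytes (arbitrary value).
vcf <- read.vcfR(vcf_path, limit = 1e8, verbose = FALSE)

# A deliberately tiny limit is expected to be exceeded by the estimate.
# Catch the error instead of letting it stop the script.
res <- tryCatch(
  read.vcfR(vcf_path, limit = 1, verbose = FALSE),
  error = function(e) conditionMessage(e)
)
print(res)
```

In practice the limit would be chosen after checking how much physical memory the machine has free.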
It is possible to input part of a VCF file by using the parameters nrows, skip and cols. The first eight columns (the fix region) are part of the definition and will always be included. Any columns beyond eight are optional (the gt region). You can specify which of these columns you would like to input by setting the cols parameter. If you want a usable vcfR object, always include column nine (the FORMAT column). If you do not include column nine you may experience reduced functionality.
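A partial read could look like the sketch below, which again uses a temporary copy of vcfR_test. The particular values (skip one variant, read two, keep columns 9 and 10) are chosen only for illustration.

```r
library(vcfR)

# Create a small gzipped VCF file from the bundled example data.
data(vcfR_test)
vcf_path <- tempfile(fileext = ".vcf.gz")
write.vcf(vcfR_test, file = vcf_path)

# Skip the first variant, read at most two variants, and keep
# the FORMAT column (9) plus the first sample column (10).
part <- read.vcfR(vcf_path, nrows = 2, skip = 1, cols = c(9, 10), verbose = FALSE)

# The fix region keeps its eight columns; the gt region holds the requested ones.
dim(part@fix)
colnames(part@gt)
```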
According to the VCF specification missing data are encoded by a period (".").
Within the R language, missing data can be encoded as NA.
The parameter 'convertNA' allows the user to either retain the VCF representation or the R representation of missing data.
Note that the conversion only takes place when the entire value can be determined to be missing.
For example, ".|.:48:8:51,51" would be retained because the missing genotype is accompanied by other delimited information.
In contrast, ".|." should be converted to NA when convertNA = TRUE.
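To see the effect of 'convertNA', one option is to read the same file twice and compare how many entries R treats as missing. This is a sketch using a temporary copy of vcfR_test. It does not assert specific counts, since those depend on the data.

```r
library(vcfR)

# Create a small gzipped VCF file from the bundled example data.
data(vcfR_test)
vcf_path <- tempfile(fileext = ".vcf.gz")
write.vcf(vcfR_test, file = vcf_path)

# Read once with the R representation of missing data and once with the VCF one.
vcf_na  <- read.vcfR(vcf_path, convertNA = TRUE,  verbose = FALSE)
vcf_dot <- read.vcfR(vcf_path, convertNA = FALSE, verbose = FALSE)

# Count the entries in the fix region that R regards as missing under each setting.
sum(is.na(vcf_na@fix))
sum(is.na(vcf_dot@fix))

# Inspect the ID column side by side.
vcf_na@fix[, "ID"]
vcf_dot@fix[, "ID"]
```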
If file begins with http://, https://, ftp://, or ftps:// it is interpreted as a link. When this happens, file is split on the delimiter '/' and the last element is used as the filename. A check is performed to determine if this file exists in the working directory. If a local file is found it is used. If a local file is not found the remote file is downloaded to the working directory and read in.
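The lookup described above can be sketched in base R. The helper resolve_vcf_source() and the example URL are hypothetical and are not part of vcfR. The sketch only mirrors the described steps (detect a link, take the last '/'-delimited element, check the working directory) and performs no download.

```r
# Hypothetical helper, not part of vcfR: mirrors the described lookup logic.
resolve_vcf_source <- function(file) {
  # Treat http://, https://, ftp:// and ftps:// prefixes as links.
  is_link <- grepl("^(https?|ftps?)://", file)
  if (!is_link) {
    return(list(local_name = file, is_link = FALSE, needs_download = FALSE))
  }
  # Split on '/' and use the last element as the local filename.
  parts <- strsplit(file, "/", fixed = TRUE)[[1]]
  local_name <- parts[length(parts)]
  # A download would only be needed if no such file is in the working directory.
  list(
    local_name = local_name,
    is_link = TRUE,
    needs_download = !file.exists(local_name)
  )
}

src <- resolve_vcf_source("https://example.org/data/sample.vcf.gz")
src$local_name  # "sample.vcf.gz"
```

One consequence of this behavior is that a stale local copy with the same name would be read in place of the remote file.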
The function write.vcf takes an object of either class vcfR or chromR and writes the vcf data to a vcf.gz file (gzipped text). If the parameter 'mask' is set to FALSE, the entire object is written to file. If the parameter 'mask' is set to TRUE and the object is of class chromR (which has a mask slot), this mask is used to subset the data. If an index is supplied as 'mask', then this index is used, and recycled as necessary, to subset the data.
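Because the output is gzipped text, it can be inspected with base R connections. The sketch below writes vcfR_test with the default mask = FALSE and reads the raw lines back. The temporary path is an assumption made for illustration.

```r
library(vcfR)

# Write the full object (mask = FALSE is the default) to a temporary gzipped file.
data(vcfR_test)
out_path <- tempfile(fileext = ".vcf.gz")
write.vcf(vcfR_test, file = out_path)

# Read the gzipped text back as plain lines.
con <- gzfile(out_path, "r")
lines <- readLines(con)
close(con)

# Show the first few meta lines and count the lines that are not header lines.
head(lines, 3)
sum(!startsWith(lines, "#"))
```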
Because vcfR provides the opportunity to manipulate VCF data, it also provides the opportunity for the user to create invalid VCF files. If there is a question regarding the validity of a file you have created, one option is the VCF validator from VCFtools.
Value
read.vcfR returns an object of class vcfR-class.
See the vignette: vignette('vcf_data').
The function write.vcf creates a gzipped VCF file.
See Also
CRAN: pegas::read.vcf, PopGenome::readVCF, data.table::fread
Bioconductor: VariantAnnotation::readVcf
Use: browseVignettes('vcfR') to find examples.
Examples
data(vcfR_test)
vcfR_test
head(vcfR_test)
# CRAN requires developers to use a tempdir when writing to the filesystem.
# You may want to implement this example elsewhere.
orig_dir <- getwd()
temp_dir <- tempdir()
setwd( temp_dir )
write.vcf( vcfR_test, file = "vcfR_test.vcf.gz" )
vcf <- read.vcfR( file = "vcfR_test.vcf.gz", verbose = FALSE )
vcf
setwd( orig_dir )