ghap.vcf2phase {GHap} | R Documentation |
Convert VCF data into GHap phase
Description
This function takes phased genotype data in the Variant Call Format (VCF) and converts them into the GHap phase format.
Usage
ghap.vcf2phase(input.files = NULL, vcf.files = NULL,
sample.files = NULL, out.file,
ncores = 1, verbose = TRUE)
Arguments
If all input files share the same prefix, the user can use the following shortcut options:
input.files |
Character vector with the list of prefixes for input files. |
out.file |
Character value for the output file name. |
The user can also opt to point to input files separately:
vcf.files |
Character vector containing the list of VCF files. |
sample.files |
Character vector containing the list of SAMPLE files. |
To set the number of cores or to turn progress messages on or off, use:
ncores |
A numeric value specifying the number of cores to use (default = 1). |
verbose |
A logical value specifying whether log messages should be printed (default = TRUE). |
Details
The Variant Call Format (VCF), as described in https://github.com/samtools/hts-specs, is converted here into the GHap phase format. Important: the function applies no filters to the data other than skipping multi-allelic variants. If variants need to be filtered, the user is advised to pre-process the VCF files with third-party software (such as BCFtools). The FORMAT field should also follow the "GT:..." specification, with genotypes placed first in each sample column. Finally, all genotypes should be phased and take one of the following values: "0|0", "0|1", "1|0" or "1|1". Warning: this function is not optimized for very large datasets.
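The requirements above can be checked before conversion. The following is only a minimal sketch in base R and is not part of GHap: the helper name check_vcf_lines is hypothetical, and it assumes tab-separated VCF data lines with the standard nine fixed columns followed by one column per sample. It flags variants that are multi-allelic (which the function would skip), whose FORMAT field does not start with GT, or that carry a genotype other than the four phased values listed above.

```r
# Hypothetical helper (not a GHap function): returns one logical per variant
# line, TRUE when the line meets the requirements described in Details.
check_vcf_lines <- function(lines) {
  # Drop meta-information and header lines, which start with "#"
  body <- lines[!startsWith(lines, "#")]
  fields <- strsplit(body, "\t", fixed = TRUE)
  vapply(fields, function(f) {
    alt <- f[5]             # ALT column
    fmt <- f[9]             # FORMAT column
    samples <- f[-(1:9)]    # one column per sample
    # The genotype is the first colon-separated entry of each sample column
    gt <- sub(":.*$", "", samples)
    biallelic <- !grepl(",", alt, fixed = TRUE)
    gt_first <- identical(fmt, "GT") || startsWith(fmt, "GT:")
    phased <- all(gt %in% c("0|0", "0|1", "1|0", "1|1"))
    biallelic && gt_first && phased
  }, logical(1))
}

# Toy input made up for illustration: one valid variant, one multi-allelic
# variant, and one variant with an unphased genotype
vcf <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
  "1\t100\tsnp1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1",
  "1\t200\tsnp2\tA\tG,T\t.\tPASS\t.\tGT\t0|1\t1|2",
  "1\t300\tsnp3\tC\tT\t.\tPASS\t.\tGT:DP\t0/1:12\t1|1:9"
)
ok <- check_vcf_lines(vcf)
```

In this toy input only the first variant passes. For real files, the lines could be read with readLines(), although for large datasets a dedicated tool such as BCFtools is likely a better choice.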
Author(s)
Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>
References
H. Li et al. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009. 25:2078-2079.
H. Li. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011. 27(21):2987-2993.
See Also
ghap.compress, ghap.loadphase, ghap.fast2phase, ghap.oxford2phase
Examples
# #### DO NOT RUN IF NOT NECESSARY ###
#
# # Copy the example data in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "vcf",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
#
# ### RUN ###
#
# # Convert from a single genome-wide file
# ghap.vcf2phase(input.files = "example",
#                out.file = "example")
#
# # Convert from a list of chromosome files
# ghap.vcf2phase(input.files = paste0("example_chr", 1:10),
#                out.file = "example")
#
# # Convert using separate lists for file extensions
# ghap.vcf2phase(vcf.files = paste0("example_chr", 1:10, ".vcf"),
#                sample.files = paste0("example_chr", 1:10, ".sample"),
#                out.file = "example")