filterLargeVCF {geneHapR} | R Documentation |
Pre-process of Large VCF File(s)
Description
Filter/extract one or multiple gene(s)/range(s) from a large *.vcf/*.vcf.gz file.
Usage
filterLargeVCF(VCFin = VCFin, VCFout = VCFout,
Chr = Chr,
POS = NULL,
start = start,
end = end,
override = TRUE)
Arguments
VCFin |
Path of input |
VCFout |
Path(s) of output |
Chr |
A single CHROM name or a vector of CHROM names. |
POS, start, end |
The range(s) to extract from the original VCF. |
override |
Whether to overwrite existing file(s); default is TRUE. |
Details
This package imports VCF files with 'vcfR', an efficient package for
importing and manipulating VCF files in 'R'. However, importing a large VCF file
is time- and memory-consuming. It is therefore suggested to filter/extract the
variants in the target range with filterLargeVCF() before importing.
When filtering/extracting multiple genes/ranges, Chr and POS must have equal
length. Results are saved to a single file if a single file path is provided,
or to multiple VCF files if a vector of file paths of the same length is
provided.
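The equal-length requirement can be checked before calling filterLargeVCF(). The following is a minimal sketch in base R, not part of the package: the `genes` table, its column names, and the output file naming are hypothetical and only illustrate how Chr, POS and VCFout might be built so that they line up.

```r
# Hypothetical table of target ranges (names and coordinates are illustrative)
genes <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  chr   = c("scaffold_1", "scaffold_1", "scaffold_1"),
  start = c(4300, 5000, 5000),
  end   = c(5000, 6000, 7000),
  stringsAsFactors = FALSE
)

# One CHROM name per range
Chr <- genes$chr
# One c(start, end) pair per range, collected in a list
POS <- Map(function(s, e) c(s, e), genes$start, genes$end)
# One output path per range (a single path would collect all ranges in one file)
VCFout <- paste0(genes$gene, ".filtered.vcf.gz")

# Fail early if the arguments do not line up
stopifnot(length(Chr) == length(POS),
          length(VCFout) %in% c(1, length(Chr)))
```

These objects could then be passed to filterLargeVCF() as the Chr, POS and VCFout arguments, together with the path of the input file as VCFin.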
However, if hundreds of genes/ranges need to be extracted from very large VCF file(s), it is preferable to process them with other Linux tools in a script on a server, such as 'vcftools' and 'bcftools'.
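For the many-range case, one possible route is to generate the command lines from R and run them in a shell script on the server. The sketch below only builds `bcftools view` command strings and writes them to a script; it does not run them. It assumes that bcftools is installed and that the input is bgzip-compressed and indexed (region queries with `-r` need an index). The file name, gene names and script name are hypothetical.

```r
# Hypothetical input file and target ranges
vcf_in <- "large.vcf.gz"
ranges <- data.frame(
  gene  = c("geneA", "geneB"),
  chr   = c("scaffold_1", "scaffold_1"),
  start = c(4300L, 5000L),
  end   = c(5000L, 6000L),
  stringsAsFactors = FALSE
)

# Region strings in the CHROM:start-end form used by bcftools
regions <- sprintf("%s:%d-%d", ranges$chr, ranges$start, ranges$end)

# One command per range; -O z writes bgzip-compressed VCF, -o sets the output file
cmds <- sprintf("bcftools view -r %s -O z -o %s.vcf.gz %s",
                regions, ranges$gene, vcf_in)

# Write the commands to a script for the server rather than running them here
script <- file.path(tempdir(), "extract_ranges.sh")
writeLines(c("#!/bin/sh", cmds), script)
cat(cmds, sep = "\n")
```

Writing the commands to a file keeps the extraction step outside the R session, so memory use in R stays small regardless of the size of the VCF.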
Value
No return value
Examples
# Small VCF files should be filtered with `filter_vcf()`.
# Here, however, a mini VCF is used just for example and testing.
vcfPath <- system.file("extdata", "var.vcf.gz", package = "geneHapR")
oldDir <- getwd()
temp_dir <- tempdir()
if (!dir.exists(temp_dir))
  dir.create(temp_dir)
setwd(temp_dir)
# extract a single gene/range from large vcf
filterLargeVCF(VCFin = vcfPath, VCFout = "filtered.vcf.gz",
               Chr = "scaffold_1", POS = c(4300, 5000), override = TRUE)
# extract multiple genes/ranges from large vcf
filterLargeVCF(VCFin = vcfPath,
               VCFout = c("filtered1.vcf.gz",
                          "filtered2.vcf.gz",
                          "filtered3.vcf.gz"),
               Chr = rep("scaffold_1", 3),
               POS = list(c(4300, 5000),
                          c(5000, 6000),
                          c(5000, 7000)),
               override = TRUE)
setwd(oldDir)