vcfcomp {vcfppR}    R Documentation
Compare two VCF/BCF files reporting various statistics
Description
Compare two VCF/BCF files reporting various statistics
Usage
vcfcomp(
test,
truth,
formats = c("DS", "GT"),
stats = "r2",
by.sample = FALSE,
by.variant = FALSE,
flip = FALSE,
names = NULL,
bins = NULL,
af = NULL,
out = NULL,
choose_random_start = FALSE,
return_pse_sites = FALSE,
...
)
Arguments
test |
path to the first VCF/BCF file, referred to as the test, or to a saved RDS file. |
truth |
path to the second VCF/BCF file, referred to as the truth, or to a saved RDS file. |
formats |
character vector. The FORMAT tags to extract from the test and truth files, respectively. The default c("DS", "GT") extracts 'DS' from the test and 'GT' from the truth. |
stats |
the statistic to calculate. Supported values: "r2": squared Pearson correlation coefficient; "f1": F1-score, the harmonic mean of precision and sensitivity; "nrc": non-reference concordance rate; "pse": phasing switch error rate. |
by.sample |
logical. Calculate the statistic for each sample, then average within bins. |
by.variant |
logical. Calculate the statistic for each variant, then average within bins. If both by.sample and by.variant are TRUE, average over all samples first. If both are FALSE, average over all samples and variants together. |
flip |
logical. Flip the REF and ALT alleles. |
names |
character vector. Reset the sample names in the test VCF. |
bins |
numeric vector. Break the statistics into allele frequency bins. |
af |
file path to an allele frequency text file or a saved RDS file. |
out |
output prefix for saving objects into an RDS file. |
choose_random_start |
logical. Choose a random start when stats = "pse". |
return_pse_sites |
logical. Return the phasing switch error sites. |
... |
options passed to |
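The statistics accepted by `stats` can be illustrated in base R for a single pair of genotype vectors. This is a minimal sketch, not vcfppR's implementation. It assumes genotypes coded as alternate-allele counts (0, 1, 2) with `NA` for missing, and haplotypes coded as 0/1 for the switch error. The F1 definition (a genotype is "positive" when it carries at least one alternate allele) is an assumption and may differ from the package. The helper names `r2_stat`, `nrc_stat`, `f1_stat` and `pse_stat` are hypothetical.

```r
# Hypothetical sketches of the "r2", "nrc", "f1" and "pse" statistics
# (not vcfppR's code). Assumption: genotypes are alt-allele counts 0/1/2,
# NA = missing; `test` may be a dosage (e.g. DS in [0, 2]) for r2.

r2_stat <- function(test, truth) {
  ok <- !is.na(test) & !is.na(truth)
  cor(test[ok], truth[ok])^2            # squared Pearson correlation
}

nrc_stat <- function(test, truth) {
  ok <- !is.na(test) & !is.na(truth)
  test <- round(test[ok])               # assumption: dosages rounded to genotypes
  truth <- truth[ok]
  # Drop genotypes where both calls are homozygous reference,
  # then report the fraction of the remaining genotypes that agree.
  keep <- !(test == 0 & truth == 0)
  if (!any(keep)) return(NA_real_)
  mean(test[keep] == truth[keep])
}

f1_stat <- function(test, truth) {
  ok <- !is.na(test) & !is.na(truth)
  pred <- round(test[ok]) > 0           # assumed: positive = carries an alt allele
  real <- truth[ok] > 0
  tp <- sum(pred & real)
  fp <- sum(pred & !real)
  fn <- sum(!pred & real)
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)              # sensitivity
  2 * precision * recall / (precision + recall)
}

pse_stat <- function(test_h1, test_h2, truth_h1, truth_h2) {
  # Use only sites heterozygous in both the truth and the test.
  het <- truth_h1 != truth_h2 & test_h1 != test_h2
  het[is.na(het)] <- FALSE
  if (sum(het) < 2) return(NA_real_)
  # Phase orientation: does test haplotype 1 match truth haplotype 1?
  orient <- test_h1[het] == truth_h1[het]
  # A switch error is a change of orientation between consecutive het sites.
  sum(diff(orient) != 0) / (sum(het) - 1)
}

# Usage with toy genotypes
gt_truth <- c(0, 1, 2, 1, 0, 2)
gt_test  <- c(0, 1, 1, 1, 1, 2)
r2_stat(gt_test, gt_truth)    # 0.5
nrc_stat(gt_test, gt_truth)   # 0.6
f1_stat(gt_test, gt_truth)    # 0.8888889
```

In the toy data, the only homozygous-reference agreement (first site) is excluded from NRC, which is why NRC (0.6) is lower than plain genotype agreement over all six sites.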
Details
vcfcomp
implements various statistics to compare two VCF/BCF files,
e.g. genotype concordance and correlation stratified by allele frequency.
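Correlation stratified by allele frequency, as with by.variant = TRUE and bins, can be sketched as follows: compute r2 per variant across samples, assign each variant to an allele frequency bin with `cut()`, then average within each bin. This is a hypothetical illustration, not the package's code. The variants-by-samples matrix layout, the function name `r2_by_af_bin`, and the `cut()` bin convention (right-closed, lowest edge included) are assumptions.

```r
# Hypothetical sketch: per-variant r2 averaged within allele-frequency bins.
# Assumptions: `test` and `truth` are variants x samples matrices of
# dosages/genotypes, `af` has one allele frequency per variant, and
# `bins` are bin edges as accepted by cut().
r2_by_af_bin <- function(test, truth, af, bins) {
  stopifnot(identical(dim(test), dim(truth)), length(af) == nrow(truth))
  per_variant <- vapply(seq_len(nrow(truth)), function(i) {
    ok <- !is.na(test[i, ]) & !is.na(truth[i, ])
    x <- test[i, ok]
    y <- truth[i, ok]
    # r2 is undefined for fewer than two samples or a monomorphic variant
    if (length(x) < 2 || sd(x) == 0 || sd(y) == 0) return(NA_real_)
    cor(x, y)^2
  }, numeric(1))
  bin <- cut(af, breaks = bins, include.lowest = TRUE)
  # Mean r2 per bin, skipping variants where r2 is undefined
  tapply(per_variant, bin, mean, na.rm = TRUE)
}

# Usage: 4 variants x 4 samples, two AF bins
truth <- rbind(c(0, 1, 2, 1), c(0, 0, 1, 1), c(1, 1, 2, 2), c(0, 1, 0, 1))
test  <- rbind(c(0, 1, 2, 1), c(0, 1, 1, 1), c(1, 1, 2, 2), c(0, 0, 1, 1))
r2_by_af_bin(test, truth, af = c(0.05, 0.02, 0.4, 0.3), bins = c(0, 0.1, 0.5))
```

Averaging per variant first gives every variant equal weight within its bin, whereas pooling all genotypes (both by.sample and by.variant FALSE) lets a single high-variance variant dominate the correlation.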
Value
a list of various statistics
Author(s)
Zilong Li zilong.dk@gmail.com
Examples
library('vcfppR')
test <- system.file("extdata", "imputed.gt.vcf.gz", package="vcfppR")
truth <- system.file("extdata", "imputed.gt.vcf.gz", package="vcfppR")
samples <- "HG00133,HG00143,HG00262"
res <- vcfcomp(test, truth, stats="f1", formats=c('GT','GT'), samples=samples)
str(res)