GenoScan.VCF.chr {GenoScan} | R Documentation |
Scan a VCF file to study the association between an quantitative/dichotomous outcome variable and a region or whole chromosome by score type statistics allowing for multiple functional annotation scores.
Description
Once the preliminary work is done by "GenoScan.prelim()", this function scan a target region or chromosome, and output results for all windows as well as an estimated significance threshold. For genome-wide scan, users can scan each chromosome individually, then the genome-wide significance threshold can be obtained by combining chromosome-wise thresholds:
alpha=1/(1/alpha_1+1/alpha_2+...+1/alpha_22).
Usage
GenoScan.VCF.chr(result.prelim,vcf.filename,chr,pos.min=NULL,pos.max=NULL,
Gsub.id=NULL,annot.filename=NULL,cell.type=NULL,MAF.weights='beta',
test='combined',window.size=c(5000,10000,15000,20000,25000,50000),
MAF.threshold=1,impute.method='fixed')
Arguments
result.prelim |
The output of function "GenoScan.prelim()" |
vcf.filename |
A character specifying the directory (including the file name) of the vcf file. |
chr |
Chromosome number. |
pos.min |
Minimum position of the scan. The default is NULL, where the scan starts at the first base pair. |
pos.max |
Maximum position of the scan. The default is NULL, where the scan ends at the last base pair, according to the chromosome sizes at: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes. |
Gsub.id |
The subject id corresponding to the genotype matrix, an n dimensional vector. This is used to match phenotype with genotype. The default is NULL, where the subject id in the vcf file is used. |
annot.filename |
A character specifying the directory (including the file name) of functional annotations. Currently GenoScan supports GenoNet scores across 127 tissues/cell types, which can be downloaded at: http://www.openbioinformatics.org/annovar/download/GenoNetScores/ |
cell.type |
A character specifying the tissue/cell type integrated in the analysis, in addition to standard dispersion and/or burden tests. The default is NULL, where no functional annotation is included. If cell.type='all', GenoNet scores across all 127 tissues/cell types are incorperated. |
MAF.weights |
Minor allele frequency based weight. Can be 'beta' to up-weight rare variants or 'equal' for a flat weight. The default is 'beta'. |
test |
Can be 'dispersion', 'burden' or 'combined'. The test is 'combined', both dispersion and burden tests are applied. The default is 'combined'. |
window.size |
Candidate window sizes in base pairs. The default is c(5000,10000,15000,20000,25000,50000). Note that extemely small window size (e.g. 1) requires large sample size. |
MAF.threshold |
Threshold for minor allele frequency. Variants above MAF.threshold are ignored. The default is 1. |
impute.method |
Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "bestguess" uses the genotype with highest probability. |
Value
window.summary |
Results for all windows. Each row presents a window. |
M |
Estimated number of effective tests. |
threshold |
Estimated threshold, 0.05/M. |
Examples
# load example vcf file from package "seqminer"
vcf.filename = system.file("vcf/all.anno.filtered.extract.vcf.gz", package = "seqminer")
# simulated outcomes, covariates and inidividual id.
Y<-as.matrix(rnorm(3,0,1))
X<-as.matrix(rnorm(3,0,1))
id<-c("NA12286", "NA12341", "NA12342")
# fit null model
result.prelim<-GenoScan.prelim(Y,X=X,id=id,out_type="C",B=5000)
# scan the vcf file
result<-GenoScan.VCF.chr(result.prelim,vcf.filename,chr=1,pos.min=196621007,pos.max=196716634)
## this is how the actual genotype matrix from package "seqminer" looks like
example.G <- t(readVCFToMatrixByRange(vcf.filename, "1:196621007-196716634",annoType='')[[1]])