BootstrapQTL {BootstrapQTL}  R Documentation 
Performs cisQTL mapping using MatrixEQTL then performs a bootstrap analysis to obtain unbiased effect size estimates for traits with significant evidence of genetic regulation correcting for the "Winner's Curse" arising from leadSNP selection.
BootstrapQTL(snps, gene, snpspos, genepos, cvrt = SlicedData$new(), n_bootstraps = 200, n_cores = 1, eGene_detection_file_name = NULL, bootstrap_file_directory = NULL, cisDist = 1e+06, local_correction = "bonferroni", global_correction = "fdr", correction_type = "shrinkage", errorCovariance = numeric(), useModel = modelLINEAR, eigenMT_tests_per_gene = NULL)
snps 

gene 

snpspos 

genepos 
Note that 
cvrt 

n_bootstraps 
number of bootstraps to run. 
n_cores 
number of cores to parallise the bootstrap procedure over. 
eGene_detection_file_name 

bootstrap_file_directory 

cisDist 

local_correction 
multiple testing correction method to use when
correcting pvalues across all SNPs at each gene (see EQTL mapping
section in Details). Can be a method specified in 
global_correction 
multiple testing correction method to use when
correcting pvalues across all genes after performing local correction
(see EQTL mapping section in Details). Must be a method specified in

correction_type 

errorCovariance 

useModel 

eigenMT_tests_per_gene 

Although the package interface and documentation describe the use of
BootstrapQTL
for ciseQTL mapping, the package can be
applied to any QTL study of quantitative traits with chromosomal
positions, for example cisQTL mapping of epigenetic
modifications. Any matrix of molecular trait data can be provided
to the 'gene'
argument provided a corresponding 'genepos'
'data.frame' detailing the chromosomal positions of each trait is
provided.
EQTL mapping is performed using the
MatrixEQTL
package. A three step
hieararchical multiple testing correction procedure is used to
determine significant eGenes and eSNPs. At the first step, nominal
pvalues from MatrixEQTL
for all
cisSNPs are adjusted for each gene separately using the
method specified in the 'local_correction'
argument
(Bonferroni correction by default). In the second step, the best
adjusted pvalue is taken for each gene, and this set of locally
adjusted pvalues is corrected for multiple testing across all genes
using the methods pecified in the 'global_correction'
argument
(FDR correction by default). In the third step, an eSNP significance
threshold on the locally corrected pvalues is determined as the
locally corrected pvalue corresponding to the globally corrected
pvalue threshold of 0.05.
A gene is considered a significant eGene if its globally corrected pvalue is < 0.05, and a SNP is considered a significant eSNP for that eGene if its locally corrected pvalue < the eSNP significance threshold.
The default settings for 'local_correction'
and
'global_correction'
were found to best control eGene false
discovery rate without sacrificing sensitivity (see citation).
EQTL effect sizes of significant eSNPs on significant eGenes are
typically overestimated when compared to replication datasets
(see citation). BootstrapEQTL
removes this overestimation by
performing a bootstrap procedure after eQTL mapping.
Three Winner's Curse correction methods are available: the Shrinkage
method, the Out of Sample method, and the Weighted Estimator method.
All three methods work on the same basic principle of performing
repeated sample bootstrapping to partition the dataset into two
groups: an eQTL detection group comprising study samples select via
random sampling with replacement, and an eQTL effect size estimation
group comprising the remaining samples not selected via the random
sampling. The default estimator, 'correction_type = "shrinkage"'
,
provided the most accurate corrected effect sizes in our simulation
study (see citation).
The shrinkage method ("shrinkage" in
'correction_type'
) corrects for the Winner's Curse by
measuring the average difference between the eQTL effect size
in the bootstrap detection group and the bootstrap estimation group,
then subtracting this difference from the naive eQTL effect size
estimate obtained from the eGene detection analysis prior to the
bootstrap procedure.
The out of sample method ("out_of_sample" in
'correction_type'
) corrects for the Winner's Curse by taking
the average eQTL effect size across bootstrap estimation groups as
an unbiased effect size estimate.
The weighted estimator method ("weighted" in
'correction_type'
) corrects for the Winner's Curse by taking
a weighted average of the nominal estimate of the eQTL effect size
and the average of eQTL effect sizes across the bootstrap estimation
groups: 0.368 * naive_estimate + 0.632 *
mean(bootstrap estimation group effect sizes).
In all three methods bootstrap effect sizes only contribute to the Winner's Curse correction if the corresponding eSNP is significantly associated with the eGene in the bootstrap detection group (locally corrected bootstrap Pvalue < eSNP significance threshold determing in the eQTL mapping step).
Note that eQTLs may not remain significant in all bootstraps, so the
effective number of bootstraps used to obtain the Winner's Curse
estimate will typically be lower than the number of bootstraps
specified in 'n_bootstraps'
. The number of bootstraps that
were significant for each eQTL are reported in the
'correction_boots'
column of the returned table.
The user should be aware that ability to correct for Winner's Curse
can vary across significant eQTLs depending on their statistical
power (i.e. minor allele frequency, true effect size, and
study sample size). Users should be skeptical of corrected effect
sizes that are larger than the nominal effect sizes estimated by
MatrixEQTL
, which likely reflects low
power for eQTL detection rather than an underestimated effect size.
It is possible for bootstrap analyses to fail due to the reduced
sample sizes of the bootstrap detection and bootstrap estimation
groups. For example, the bootstrap resampling may lead to an
detection or estimation groups in which all individuals are
homozygous for an eSNP or have no variance in their supplied
covariates (e.g. the estimation group may comprise
individuals all of the same sex). In this case the bootstrap will
fail for all eQTLs since MatrixEQTL
will
be unable to perform the model fitting.
Failed bootstraps are reported after the bootstrap procedure in a series of warning messages indicating the number of bootstrap failures grouped by the reason for the bootstrap failure.
A data.frame
(or data.table
if the
user has the library loaded) containing the results for each significant eQTL:
'eGene':
The eQTL eGene.
'eSNP':
The eQTL eSNP.
'statistic':
The test statistic for the association between the eGene and eSNP.
'nominal_beta':
The eQTL effect size for the eGene
eSNP
pair estimated by MatrixEQTL
.
'corrected_beta':
The eQTL effect size after adjustment for the winners_curse
.
'winners_curse':
The amount of effect size overestimation determined by the bootstrap analysis (See Details).
'correction_boots':
The number of bootstraps that contributed to the estimation of
the winners_curse
, i.e. the number of bootstraps in which the eSNP
remained significantly associated with the eGene
(see Details).
'nominal_pval':
The pvalue for the eGene
eSNP
pair
from the MatrixEQTL
analysis.
'eSNP_pval':
The locally corrected pvalue for the eGene
eSNP
pair (see Details).
'eGene_pval':
The globally corrected pvalue for the eGene
based on its top eSNP (see Details).
# Locations for example data from the MatrixEQTL package base.dir = find.package('MatrixEQTL'); SNP_file_name = paste(base.dir, "/data/SNP.txt", sep=""); snps_location_file_name = paste(base.dir, "/data/snpsloc.txt", sep=""); expression_file_name = paste(base.dir, "/data/GE.txt", sep=""); gene_location_file_name = paste(base.dir, "/data/geneloc.txt", sep=""); covariates_file_name = paste(base.dir, "/data/Covariates.txt", sep=""); # Load the SNP data snps = SlicedData$new(); snps$fileDelimiter = "\t"; # the TAB character snps$fileOmitCharacters = "NA"; # denote missing values; snps$fileSkipRows = 1; # one row of column labels snps$fileSkipColumns = 1; # one column of row labels snps$fileSliceSize = 2000; # read file in slices of 2,000 rows snps$LoadFile(SNP_file_name); # Load the data detailing the position of each SNP snpspos = read.table(snps_location_file_name, header = TRUE, stringsAsFactors = FALSE); # Load the gene expression data gene = SlicedData$new(); gene$fileDelimiter = "\t"; # the TAB character gene$fileOmitCharacters = "NA"; # denote missing values; gene$fileSkipRows = 1; # one row of column labels gene$fileSkipColumns = 1; # one column of row labels gene$fileSliceSize = 2000; # read file in slices of 2,000 rows gene$LoadFile(expression_file_name); # Load the data detailing the position of each gene genepos = read.table(gene_location_file_name, header = TRUE, stringsAsFactors = FALSE); # Load the covariates data cvrt = SlicedData$new(); cvrt$fileDelimiter = "\t"; # the TAB character cvrt$fileOmitCharacters = "NA"; # denote missing values; cvrt$fileSkipRows = 1; # one row of column labels cvrt$fileSkipColumns = 1; # one column of row labels if(length(covariates_file_name)>0) { cvrt$LoadFile(covariates_file_name); } # Run the BootstrapQTL analysis eQTLs < BootstrapQTL(snps, gene, snpspos, genepos, cvrt, n_bootstraps=10, n_cores=2)