SKAT.bootstrap {Ravages} | R Documentation |
Multi group SKAT test using bootstrap sampling
Description
Peforms SKAT on two or more groups of individuals using bootstrap sampling
Usage
SKAT.bootstrap(x, NullObject, genomic.region = x@snps$genomic.region,
weights = (1-x@snps$maf)**24, maf.threshold = 0.5,
perm.target = 100, perm.max = 5e4, debug = FALSE,
estimation.pvalue = "kurtosis")
Arguments
x |
A bed.matrix |
NullObject |
A list returned from |
genomic.region |
A factor defining the genomic region of each variant |
weights |
A vector with the weight of each variant. By default, the weight of each variant is inversely proportionnal to its MAF, as it was computed in the original SKAT method |
maf.threshold |
The MAF above which variants are removed (default is to keep all variants) |
perm.target |
The number of times to exceed the observed statistics. If not reached, |
perm.max |
The maximum number of permutations to perform to estimate the p-value, will be used if |
debug |
Whether to print details about the permutations (mean, standard deviation, skewness, kurtosis), FALSE by default |
estimation.pvalue |
Whether to use the skewness ("skewness") or the kurtosis ("kurtosis") for the chi-square approximation |
Details
P-values estimation is based on bootstrap sampling and a sequential procedure: permutated statistics are computed and each one is compared to the observed statistics.
The boostrap progam stops when either perm.target
or perm.max
is reached.
P-values are then computed using a mixed procedure:
if perm.target
is reached, the p-value is computed as : perm.target
divided by the number of permutations used to reach perm.target
;
if perm.max
is reached, p-values are approximated using a chi-square distributions based on the first three moments if estimation.pvalue = "skewness"
, or on statistics' moments 1, 2 and 4 if estimation.pvalue = "kurtosis"
.
If debug=TRUE
, more informations about the estimated statistics moments are given.
This function is used by SKAT
when the sample size is smaller than 2000 and covariates are present.
All missing genotypes are imputed by the mean genotype.
Value
A data frame containing for each genomic:
stat |
The observed statistics |
p.value |
|
p.perm |
The p-value computed by permutations: number of times permutated is greater than observed statistics divided by the total number of permutations performed |
p.chi2 |
The p-value computed by the chi-square approximation using the SKAT small sample procedure |
If debug=TRUE
, other informations are given about the moments estimation:
nb.gep |
The number of times a permutated statistics is equal or greater than the observed statistics |
nb.eq |
The number of times a permutated statistics is equal to the observed statistics |
nb.perms |
The total number of simulations performed |
mean |
The mean of the permutated statistics |
sigma |
The standard deviation of the permutated statistics |
skewness |
The skweness of the permutated statistics |
kurtosis |
The kurtosis of the permutated statistics |
References
Wu et al. 2011, Rare-variant association testing for sequencing data with the sequence kernel association test, American Journal of Human Genetics 82-93 doi:10.1016/j.ajhg.2011.05.029;
Lee et al. 2012, Optimal Unified Approach for Rare-Variant Association Testing with Application to Small-Sample Case-Control Whole-Exome Sequencing Studies, American Journal of Human Genetics, doi:10.1016/j.ajhg.2012.06.007;
See Also
Examples
#Import data in a bed matrix
x <- as.bed.matrix(x=LCT.matrix.bed, fam=LCT.matrix.fam, bim=LCT.snps)
#Add population
x@ped[,c("pop", "superpop")] <- LCT.matrix.pop1000G[,c("population", "super.population")]
#Select EUR superpopulation
x <- select.inds(x, superpop=="EUR")
x@ped$pop <- droplevels(x@ped$pop)
#Group variants within known genes
x <- set.genomic.region(x)
#Filter of rare variants: only non-monomorphic variants with
#a MAF lower than 1%
#keeping only genomic regions with at least 10 SNPs
x1 <- filter.rare.variants(x, filter = "whole", maf.threshold = 0.01, min.nb.snps = 10)
#Simulation of a covariate + Sex as a covariate
sex <- x1@ped$sex
set.seed(1) ; u <- runif(nrow(x1))
covar <- cbind(sex, u)
#run SKAT using the 1000 genome EUR populations as "outcome"
#The maximum number of permutations used is 100,
#and the target number is 10, please increase
#both values for a more accurate estimation of the p-values
#Fit Null model with covariates
x1.H0 <- NullObject.parameters(x1@ped$pop, data = covar, RVAT = "SKAT", pheno.type = "categorical")
SKAT.bootstrap(x1, x1.H0, perm.target = 10, perm.max = 100)