SKAT.bootstrap {Ravages}R Documentation

Multi group SKAT test using bootstrap sampling

Description

Peforms SKAT on two or more groups of individuals using bootstrap sampling

Usage

  SKAT.bootstrap(x, NullObject, genomic.region = x@snps$genomic.region,
                 weights = (1-x@snps$maf)**24, maf.threshold = 0.5,
                 perm.target = 100, perm.max = 5e4, debug = FALSE,
                 estimation.pvalue = "kurtosis")

Arguments

x

A bed.matrix

NullObject

A list returned from NullObject.parameters

genomic.region

A factor defining the genomic region of each variant

weights

A vector with the weight of each variant. By default, the weight of each variant is inversely proportionnal to its MAF, as it was computed in the original SKAT method

maf.threshold

The MAF above which variants are removed (default is to keep all variants)

perm.target

The number of times to exceed the observed statistics. If not reached, perm.max permutations will be used

perm.max

The maximum number of permutations to perform to estimate the p-value, will be used if perm.target is not reached

debug

Whether to print details about the permutations (mean, standard deviation, skewness, kurtosis), FALSE by default

estimation.pvalue

Whether to use the skewness ("skewness") or the kurtosis ("kurtosis") for the chi-square approximation

Details

P-values estimation is based on bootstrap sampling and a sequential procedure: permutated statistics are computed and each one is compared to the observed statistics. The boostrap progam stops when either perm.target or perm.max is reached. P-values are then computed using a mixed procedure:

if perm.target is reached, the p-value is computed as : perm.target divided by the number of permutations used to reach perm.target;

if perm.max is reached, p-values are approximated using a chi-square distributions based on the first three moments if estimation.pvalue = "skewness", or on statistics' moments 1, 2 and 4 if estimation.pvalue = "kurtosis".

If debug=TRUE, more informations about the estimated statistics moments are given.

This function is used by SKAT when the sample size is smaller than 2000 and covariates are present.

All missing genotypes are imputed by the mean genotype.

Value

A data frame containing for each genomic:

stat

The observed statistics

p.value

p.perm if perm.target is reached, p.chi2 if perm.max is reached.

p.perm

The p-value computed by permutations: number of times permutated is greater than observed statistics divided by the total number of permutations performed

p.chi2

The p-value computed by the chi-square approximation using the SKAT small sample procedure

If debug=TRUE, other informations are given about the moments estimation:

nb.gep

The number of times a permutated statistics is equal or greater than the observed statistics stat

nb.eq

The number of times a permutated statistics is equal to the observed statistics stat

nb.perms

The total number of simulations performed

mean

The mean of the permutated statistics

sigma

The standard deviation of the permutated statistics

skewness

The skweness of the permutated statistics

kurtosis

The kurtosis of the permutated statistics

References

Wu et al. 2011, Rare-variant association testing for sequencing data with the sequence kernel association test, American Journal of Human Genetics 82-93 doi:10.1016/j.ajhg.2011.05.029;

Lee et al. 2012, Optimal Unified Approach for Rare-Variant Association Testing with Application to Small-Sample Case-Control Whole-Exome Sequencing Studies, American Journal of Human Genetics, doi:10.1016/j.ajhg.2012.06.007;

See Also

NullObject.parameters, SKAT

Examples


#Import data in a bed matrix
x <- as.bed.matrix(x=LCT.matrix.bed, fam=LCT.matrix.fam, bim=LCT.snps)
#Add population
x@ped[,c("pop", "superpop")] <- LCT.matrix.pop1000G[,c("population", "super.population")]

#Select EUR superpopulation
x <- select.inds(x, superpop=="EUR")
x@ped$pop <- droplevels(x@ped$pop)

#Group variants within known genes
x <- set.genomic.region(x)

#Filter of rare variants: only non-monomorphic variants with
#a MAF lower than 1%
#keeping only genomic regions with at least 10 SNPs
x1 <- filter.rare.variants(x, filter = "whole", maf.threshold = 0.01, min.nb.snps = 10)

#Simulation of a covariate + Sex as a covariate
sex <- x1@ped$sex
set.seed(1) ; u <- runif(nrow(x1))
covar <- cbind(sex, u)

#run SKAT using the 1000 genome EUR populations as "outcome"
#The maximum number of permutations used is 100,
#and the target number is 10, please increase
#both values for a more accurate estimation of the p-values
#Fit Null model with covariates
x1.H0 <- NullObject.parameters(x1@ped$pop, data = covar, RVAT = "SKAT", pheno.type = "categorical")

SKAT.bootstrap(x1, x1.H0, perm.target = 10, perm.max = 100)


[Package Ravages version 1.1.3 Index]