R: SKAT test

SKAT {Ravages}

R Documentation

SKAT test

Description

Peforms SKAT on categorical or binary phenotypes

Usage

SKAT(x, NullObject, genomic.region = x@snps$genomic.region, 
     weights = (1 - x@snps$maf)**24, maf.threshold = 0.5, 
     get.moments = "size.based", estimation.pvalue = "kurtosis",
     params.sampling, cores = 10, debug = FALSE, verbose = TRUE)

Arguments

`x`	A bed.matrix
`NullObject`	A list returned from `NullObject.parameters`
`genomic.region`	A factor defining the genomic region of each variant
`weights`	A vector with the weight of each variant. By default, the weight of each variant is inversely proportionnal to its MAF, as it was computed in the original SKAT method
`maf.threshold`	The MAF above which variants are removed (default is to keep all variants)
`get.moments`	How to estimate the moments to compute the p-values among "size.based", "bootstrap", "permutations", or "theoretical" for categorical phenotypes (2 or more groups of individuals). By default "size.based" that will choose the method depending on sample size (see `details`)
`estimation.pvalue`	Whether to use the skewness ("skewness") or the kurtosis ("kurtosis") for the chi-square approximation
`params.sampling`	A list containing the elements "perm.target", "perm.max", "debug". Only needed if `get.moments = "boostrap"` or `get.moments = "permutations"`
`cores`	How many cores to use for moments computation, set at 10 by default. Only needed if `get.moments = "theoretical"`
`debug`	Whether to return the mean, standard deviation, skewness and kurtosis of the statistics
`verbose`	Whether to display information about the function actions

Details

For categorical phenotypes, the p-value is calculated using a chi-square approximation based on the statistics' moments. The user has to choose how to compute these moments (argument get.moments), and which moments to use for the chi-square approximation (argument estimation.pvalue).

The moments can be computed either using a sampling procedure ("permutations" if there are no covariates, or "bootstrap" otherwise), or using theoretical moments computed as in Liu et al. 2008 ("theoretical").

If get.moments = "size.based", the sampling procedure will be used for sample sizes lower than 2000, and the theoretical calculations otherwise.

To estimate the p-values, etiher the first three moments are used (estimation.pvalue = "skewness"), or the moments 1, 2 and 4 are used (estimation.pvalue = "kurtosis").

If get.moments = "theoretical" and estimation.pvalue = "skewness", it corresponds to method = "liu" in the SKAT package. If get.moments = "theoretical" and estimation.pvalue = "kurtosis", it corresponds to method = "liu.mod" in the SKAT package.

For small samples, p-values estimation is based on sampling and a sequential procedure: permutated statistics are computed and each one is compared to the observed statistics. This method requires perm.target and perm.max that should be given as a list to params.bootstrap. If params.bootstrap is not specified, perm.target will be set at 100, perm.max at 5e4. The boostrap progam stops when either perm.target or perm.max is reached. P-values are then computed using a mixed procedure:

if perm.target is reached, the p-value is computed as : perm.target divided by the number of permutations used to reach perm.target;

if perm.max is reached, the SKAT small sample procedure is used, and p-values are approximated using a chi-square distributions based on statistics' moments 1, 2 and 4 computed from the permutated values.

If NullObject$pheno.type = "continuous", the method from Liu et al. will be used to compute the p-value for the continuous phenotype, but estimation.pvalue can be set at "skewness" or "kurtosis".

If debug=TRUE, more informations about the estimated statistics moments are given.

All missing genotypes are imputed by the mean genotype.

Value

A data frame containing for each genomic region:

`stat`	The observed statistics
`p.value`	The p-value of the test

If get.moments = "bootstrap" or get.moments = "permutations", additional fields are present:

`p.perm`	The p-value computed by permutations: number of times permutated is greater than observed statistics divided by the total number of permutations performed
`p.chi2`	The p-value computed by the chi-square approximation using the SKAT small sample procedure

If debug = TRUE, the mean, standard deviation, skewness and kurtosis are also returned, as well as for the sampling procedure:

`nb.gep`	The number of times a permutated statistics is equal or greater than the observed statistics `stat`
`nb.eq`	The number of times a permutated statistics is equal to the observed statistics `stat`
`nb.perms`	The total number of simulations performed

References

Wu et al. 2011, Rare-variant association testing for sequencing data with the sequence kernel association test, American Journal of Human Genetics 82-93 doi:10.1016/j.ajhg.2011.05.029;

Lee et al. 2012, Optimal Unified Approach for Rare-Variant Association Testing with Application to Small-Sample Case-Control Whole-Exome Sequencing Studies, American Journal of Human Genetics, doi:10.1016/j.ajhg.2012.06.007;

Liu et al. 2008, A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables, Computational Statistics & Data Analysis, doi:10.1016/j.csda.2008.11.025

Examples


#Example on simulated data from Ravages with 
#One group of 50 controls and 
#two groups of 25 cases, each one with a prevalence of 0.01
#with 50% of causal variants, 5 genomic regions are simulated
GRR.del <- GRR.matrix(GRR = "SKAT", genes.maf = Kryukov,
                      n.case.groups = 2, select.gene = "R1",
                      GRR.multiplicative.factor=2)

x.sim <- rbm.GRR(genes.maf = Kryukov, size = c(50, 25, 25),
                 prev = c(0.001, 0.001), GRR.matrix.del = GRR.del,
                 p.causal = 0.5, p.protect = 0, select.gene="R1",
                 same.variant = FALSE, genetic.model = "multiplicative", replicates = 5)
#Null Model
x.sim.H0 <- NullObject.parameters(x.sim@ped$pheno, RVAT = "SKAT", pheno.type = "categorical")

#Run SKAT (here permutations as n<2000 and no covariates)
#Parameters for the sampling procedure: target = 5, max = 100
#Please increase the number of permutations for a more accurate estimation of the p-values
params.sampling = list(perm.target = 5, perm.max = 100)
SKAT(x.sim, x.sim.H0, params.sampling = params.sampling)

#Run SKAT with a random continuous phenotype
#Null Model
x.sim.H0.c <- NullObject.parameters(rnorm(100), RVAT = "SKAT", pheno.type = "continuous")
SKAT(x.sim, x.sim.H0.c, cores = 1)



#Example on 1000Genome data
#Import data in a bed matrix
x <- as.bed.matrix(x=LCT.matrix.bed, fam=LCT.matrix.fam, bim=LCT.snps)
#Add population
x@ped[,c("pop", "superpop")] <- LCT.matrix.pop1000G[,c("population", "super.population")]

#Select EUR superpopulation
x <- select.inds(x, superpop=="EUR")
x@ped$pop <- droplevels(x@ped$pop)

#Group variants within known genes
x <- set.genomic.region(x)

#Filter of rare variants: only non-monomorphic variants with
#a MAF lower than 2.5%
#keeping only genomic regions with at least 10 SNPs
x1 <- filter.rare.variants(x, filter = "whole", maf.threshold = 0.025, min.nb.snps = 10)

#Simulation of a covariate + Sex as a covariate
sex <- x1@ped$sex
set.seed(1) ; u <- runif(nrow(x1))
covar <- cbind(sex, u)

#run SKAT using the 1000 genome EUR populations as "outcome"
#with very few permutations
#Please increase the permutations for a more accurate estimation of the p-values
#Fit Null model with covariate sex
x1.H0.covar <- NullObject.parameters(x1@ped$pop, RVAT = "SKAT", pheno.type = "categorical",
                                     data = covar, formula = ~ sex)

#Run SKAT with the covariates: use boostrap as n<2000
SKAT(x1, x1.H0.covar, params.sampling = params.sampling, get.moments = "bootstrap")

#Run SKAT using theoretical moments (discourage here as n<2000) and 1 core
#SKAT(x1, x1.H0.covar, get.moments = "theoretical", cores = 1)

[Package Ravages version 1.1.3 Index]