R: Power Calculation for EQTL Analysis Based on Un-Balanced...

powerEQTL.ANOVA {powerEQTL}

R Documentation

Power Calculation for EQTL Analysis Based on Un-Balanced One-Way ANOVA

Description

Power calculation for eQTL analysis that tests if a SNP is associated to a gene probe by using un-balanced one-way ANOVA. This function can be used to calculate one of the 3 parameters (power, sample size, and minimum allowable MAF) by setting the corresponding parameter as NULL and providing values for the other 2 parameters.

Usage

powerEQTL.ANOVA(MAF,
                deltaVec=c(-0.13, 0.13),
                n=200,
                power = NULL,
                sigma = 0.13,
                FWER = 0.05,
                nTests = 200000,
                n.lower = 4,
                n.upper = 1e+30)

Arguments

`MAF`	numeric. Minor allele frequency.
`deltaVec`	numeric. A vector having 2 elements. The first element is equal to `\mu_2-\mu_1` and the second element is equal to `\mu_3-\mu_2`, where `\mu_1` is the mean gene expression level for the mutation homozygotes, `\mu_2` is the mean gene expression level for the heterozygotes, and `\mu_3` is the mean gene expression level for the wild-type gene expression level.
`n`	integer. Total number of subjects.
`power`	numeric. Power for testing if 3 genotypes have the same mean gene expression levels.
`sigma`	numeric. Standard deviation of the random error.
`FWER`	numeric. Family-wise Type I error rate.
`nTests`	integer. Number of tests (i.e., number of all (SNP, gene) pairs) in eQTL analysis.
`n.lower`	numeric. Lower bound of the total number of subjects. Only used when calculating total number of subjects.
`n.upper`	numeric. Upper bound of the total number of subjects. Only used when calculating total number of subjects.

Details

If we would like to test potential non-linear relationship between genotype of a SNP and expression of a gene, we can use un-balanced one-way ANOVA. Actually, an article published by the GTEx Consortium in 2013 used this approach.

Suppose there are k=3 groups of subjects: (1) mutation homozygotes; (2) heterozygotes; and (3) wildtype homozygotes. We would like to test if the mean expression \mu_i, i=1, \ldots, k, of the gene is the same among the k groups of subjects. We can use the following one-way ANOVA model to characterize the relationship between observed gene expression level y_{ij} and the population mean expression level \mu_i:

y_{ij} = \mu_i + \epsilon_{ij}, \quad \epsilon_{ij} \sim N\left(0, \sigma^2\right),

where i=1,\ldots, k, j= 1, \ldots, n_i, y_{ij} is the observed gene expression level for the j-th subject in the i-th group, \mu_i is the mean gene expression level of the i-th group, \epsilon_{ij} is the random error, k is the number of groups, n_i is the number of subjects in the i-th group. Denote the total number of subjects as N = \sum_{i=1}^{k} n_i. That is, we have n_1 mutation homozygotes, n_2 heterozygotes, and n_3 wildtype homozygotes.

We would like to test the null hypothesis H_0 and alternative hypothesis H_1:

H_0: \mu_1 = \mu_2 = \mu_3,

H_1: \mbox{not all means are the same}.

According to O'Brien and Muller (1993), the power calculation formula for unbalanced one-way ANOVA is

power=Pr\left(\left.F\geq F_{1-\alpha}\left(k-1, N-k\right)\right| F\sim F_{k-1, N-k, \lambda}\right),

where k=3 is the number of groups of subjects, N is the total number of subjects, F_{1-\alpha}\left(k-1, N-k\right) is the 100(1-\alpha)-th percentile of central F distribution with degrees of freedoms k-1 and N-k, and F_{k-1, N-k, \lambda} is the non-central F distribution with degrees of freedoms k-1 and N-k and non-central parameter (ncp) \lambda. The ncp \lambda is equal to

\lambda=\frac{N}{\sigma^2}\sum_{i=1}^{k} w_i \left(\mu_i-\mu\right)^2,

where \mu_i is the mean gene expression level for the i-th group of subjects, w_i is the weight for the i-th group of subjects, \sigma^2 is the variance of the random errors in ANOVA (assuming each group has equal variance), and \mu is the weighted mean gene expression level

\mu=\sum_{i=1}^{k}w_i \mu_i.

The weights w_i=n_i/N are the sample proportions for the 3 groups of subjects, where N=n_1+n_2+n_3 is the total number of subjects. Hence, \sum_{i=1}^{3}w_i = 1. Based on Hardy-Weinberg Equilibrium, we have w_1 = \theta^2, w_2 = 2\theta(1-\theta), and w_3 = (1-\theta)^2, where \theta is MAF.

Without loss of generality, we set \mu_1 = -\delta_1, \mu_2=0, and \mu_3 = \delta_2.

We adopted the parameters from the GTEx cohort (see the Power analysis" section of Nature Genetics, 2013; https://www.nature.com/articles/ng.2653), where they modeled the expression data as having a log-normal distribution with a log standard deviation of 0.13 within each genotype class (AA, AB, BB). This level of noise is based on estimates from initial GTEx data. In their power analysis, they assumed the across-genotype difference delta = 0.13 (i.e., equivalent to detecting a log expression change similar to the standard deviation within a single genotype class).

Value

power if the input parameter power = NULL.

sample size (total number of subjects) if the input parameter n = NULL;

minimum detectable slope if the input parameter slope = NULL.

Author(s)

Xianjun Dong <XDONG@rics.bwh.harvard.edu>, Xiaoqi Li<xli85@bwh.harvard.edu>, Tzuu-Wang Chang <Chang.Tzuu-Wang@mgh.harvard.edu>, Scott T. Weiss <restw@channing.harvard.edu>, Weiliang Qiu <weiliang.qiu@gmail.com>

References

The GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nature Genetics, 45:580-585, 2013.

O'Brien, RG and Muller, KE. Unified power analysis for t-tests through multivariate hypotheses. In LK Edwards, editor, Applied Analysis of Variance in Behavioral Science, pages 297-344. New York: Dekker, 1993.

Dong X, Li X, Chang T-W, Scherzer CR, Weiss ST, and Qiu W. powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics, 2021;, btab385

Examples


# calculate power
powerEQTL.ANOVA(MAF = 0.1,
                deltaVec = c(-0.13, 0.13),
                n = 282,
                power = NULL,
                sigma = 0.13,
                FWER = 0.05,
                nTests = 200000)
                
# calculate sample size (total number of subjects)
powerEQTL.ANOVA(MAF = 0.1,
                deltaVec = c(-0.13, 0.13),
                n = NULL,
                power = 0.8,
                sigma = 0.13,
                FWER = 0.05,
                nTests = 200000) 
                
# calculate minimum allowable MAF
powerEQTL.ANOVA(MAF = NULL,
                deltaVec = c(-0.13, 0.13),
                n = 282,
                power = 0.8,
                sigma = 0.13,
                FWER = 0.05,
                nTests = 200000)

[Package powerEQTL version 0.3.4 Index]