powerEQTL.ANOVA {powerEQTL} | R Documentation |
Power Calculation for EQTL Analysis Based on Un-Balanced One-Way ANOVA
Description
Power calculation for eQTL analysis that tests if a SNP is associated to a gene probe by using un-balanced one-way ANOVA. This function can be used to calculate one of the 3 parameters (power, sample size, and minimum allowable MAF) by setting the corresponding parameter as NULL and providing values for the other 2 parameters.
Usage
powerEQTL.ANOVA(MAF,
deltaVec=c(-0.13, 0.13),
n=200,
power = NULL,
sigma = 0.13,
FWER = 0.05,
nTests = 200000,
n.lower = 4,
n.upper = 1e+30)
Arguments
MAF |
numeric. Minor allele frequency. |
deltaVec |
numeric. A vector having 2 elements. The first element is equal to
|
n |
integer. Total number of subjects. |
power |
numeric. Power for testing if 3 genotypes have the same mean gene expression levels. |
sigma |
numeric. Standard deviation of the random error. |
FWER |
numeric. Family-wise Type I error rate. |
nTests |
integer. Number of tests (i.e., number of all (SNP, gene) pairs) in eQTL analysis. |
n.lower |
numeric. Lower bound of the total number of subjects. Only used when calculating total number of subjects. |
n.upper |
numeric. Upper bound of the total number of subjects. Only used when calculating total number of subjects. |
Details
If we would like to test potential non-linear relationship between genotype of a SNP and expression of a gene, we can use un-balanced one-way ANOVA. Actually, an article published by the GTEx Consortium in 2013 used this approach.
Suppose there are k=3
groups of subjects: (1) mutation homozygotes; (2) heterozygotes; and (3) wildtype homozygotes. We would like to test if the mean expression \mu_i
, i=1, \ldots, k
, of the gene is the same among the k
groups of subjects. We can use the following one-way ANOVA model to characterize the relationship between observed gene expression level y_{ij}
and the population mean expression level \mu_i
:
y_{ij} = \mu_i + \epsilon_{ij}, \quad \epsilon_{ij} \sim N\left(0, \sigma^2\right),
where i=1,\ldots, k
, j= 1, \ldots, n_i
,
y_{ij}
is the observed gene expression level for the j
-th subject in the i
-th group, \mu_i
is the mean gene expression level of the i
-th group, \epsilon_{ij}
is the random error, k
is the number of groups, n_i
is the number of subjects in the i
-th group.
Denote the total number of subjects as N = \sum_{i=1}^{k} n_i
.
That is, we have n_1
mutation homozygotes, n_2
heterozygotes, and n_3
wildtype homozygotes.
We would like to test the null hypothesis H_0
and alternative hypothesis H_1
:
H_0: \mu_1 = \mu_2 = \mu_3,
H_1: \mbox{not all means are the same}.
According to O'Brien and Muller (1993), the power calculation formula for unbalanced one-way ANOVA is
power=Pr\left(\left.F\geq F_{1-\alpha}\left(k-1, N-k\right)\right| F\sim
F_{k-1, N-k, \lambda}\right),
where k=3
is the number of groups of subjects, N
is the total number
of subjects, F_{1-\alpha}\left(k-1, N-k\right)
is the
100(1-\alpha)
-th percentile of central F distribution with degrees of freedoms k-1
and N-k
, and F_{k-1, N-k, \lambda}
is the non-central F distribution
with degrees of freedoms k-1
and N-k
and non-central parameter (ncp)
\lambda
. The ncp \lambda
is equal to
\lambda=\frac{N}{\sigma^2}\sum_{i=1}^{k} w_i \left(\mu_i-\mu\right)^2,
where \mu_i
is the mean gene expression level
for the i
-th group of subjects, w_i
is the weight for the i
-th group of subjects, \sigma^2
is the variance of the random errors in ANOVA (assuming each group has equal variance), and
\mu
is the weighted mean gene expression level
\mu=\sum_{i=1}^{k}w_i \mu_i.
The weights w_i=n_i/N
are the sample proportions for the 3 groups of subjects, where N=n_1+n_2+n_3
is the total number of subjects. Hence,
\sum_{i=1}^{3}w_i = 1
.
Based on Hardy-Weinberg Equilibrium, we have
w_1 = \theta^2
, w_2 = 2\theta(1-\theta)
, and w_3 = (1-\theta)^2
,
where \theta
is MAF.
Without loss of generality, we set \mu_1 = -\delta_1
, \mu_2=0
,
and \mu_3 = \delta_2
.
We adopted the parameters from the GTEx cohort (see the Power analysis" section of Nature Genetics, 2013; https://www.nature.com/articles/ng.2653), where they modeled the expression data as having a log-normal distribution with a log standard deviation of 0.13 within each genotype class (AA, AB, BB). This level of noise is based on estimates from initial GTEx data. In their power analysis, they assumed the across-genotype difference delta = 0.13 (i.e., equivalent to detecting a log expression change similar to the standard deviation within a single genotype class).
Value
power if the input parameter power = NULL
.
sample size (total number of subjects) if the input parameter n = NULL
;
minimum detectable slope if the input parameter slope = NULL
.
Author(s)
Xianjun Dong <XDONG@rics.bwh.harvard.edu>, Xiaoqi Li<xli85@bwh.harvard.edu>, Tzuu-Wang Chang <Chang.Tzuu-Wang@mgh.harvard.edu>, Scott T. Weiss <restw@channing.harvard.edu>, Weiliang Qiu <weiliang.qiu@gmail.com>
References
The GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nature Genetics, 45:580-585, 2013.
O'Brien, RG and Muller, KE. Unified power analysis for t-tests through multivariate hypotheses. In LK Edwards, editor, Applied Analysis of Variance in Behavioral Science, pages 297-344. New York: Dekker, 1993.
Dong X, Li X, Chang T-W, Scherzer CR, Weiss ST, and Qiu W. powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics, 2021;, btab385
Examples
# calculate power
powerEQTL.ANOVA(MAF = 0.1,
deltaVec = c(-0.13, 0.13),
n = 282,
power = NULL,
sigma = 0.13,
FWER = 0.05,
nTests = 200000)
# calculate sample size (total number of subjects)
powerEQTL.ANOVA(MAF = 0.1,
deltaVec = c(-0.13, 0.13),
n = NULL,
power = 0.8,
sigma = 0.13,
FWER = 0.05,
nTests = 200000)
# calculate minimum allowable MAF
powerEQTL.ANOVA(MAF = NULL,
deltaVec = c(-0.13, 0.13),
n = 282,
power = 0.8,
sigma = 0.13,
FWER = 0.05,
nTests = 200000)