powerEQTL.scRNAseq {powerEQTL} | R Documentation |
Power Calculation for Association Between Genotype and Gene Expression Based on Single Cell RNAseq Data
Description
Power calculation for association between genotype and gene expression based on single cell RNAseq data. This function can be used to calculate one of the 4 parameters (power, sample size, minimum detectable slope, and minimum allowable MAF) by setting the corresponding parameter as NULL and providing values for the other 3 parameters.
Usage
powerEQTL.scRNAseq(
slope,
n,
m,
power = NULL,
sigma.y,
MAF = 0.2,
rho = 0.8,
FWER = 0.05,
nTests = 1,
n.lower = 2.01,
n.upper = 1e+30)
Arguments
slope |
numeric. Slope under alternative hypothesis. |
n |
integer. Total number of subjects. |
m |
integer. Number of cells per subject. |
power |
numeric. Power for testing if the slope is equal to zero. |
sigma.y |
numeric. Standard deviation of the gene expression. |
MAF |
numeric. Minor allele frequency (between 0 and 0.5). |
rho |
numeric. Intra-class correlation (i.e., correlation between
|
FWER |
numeric. Family-wise Type I error rate for one pair (SNP, gene). |
nTests |
integer. Number of tests (i.e., number of all (SNP, gene) pairs) in eQTL analysis. |
n.lower |
numeric. Lower bound of the total number of subjects. Only used when calculating total number of subjects. |
n.upper |
numeric. Upper bound of the total number of subjects. Only used when calculating total number of subjects. |
Details
We assume the following simple linear mixed effects model for each (SNP, gene) pair to characterize the association between genotype and gene expression:
y_{ij} = \beta_{0i} + \beta_1 * x_i + \epsilon_{ij},
where
\beta_{0i} \sim N\left(\beta_0, \sigma^2_{\beta}\right),
and
\epsilon_{ij} \sim N\left(0, \sigma^2\right),
i=1,\ldots, n
, j=1,\ldots, m
, n
is the number of
subjects, m
is the number of cells per subject,
y_{ij}
is the gene expression level for the j
-th cell
of the i
-th subject, x_i
is the genotype for the
i
-th subject using additive coding. That is, x_i=0
indicates the i
-th subject is a wildtype homozygote,
x_i=1
indicates the i
-th subject is a heterozygote,
and x_i=2
indicates the i
-th subject is a mutation
homozygote.
We would like to test the following hypotheses:
H_0: \beta_1=0,
and
H_1: \beta_1 = \delta,
where \delta\neq 0
.
For a given SNP, we assume Hardy-Weinberg Equilibrium and denote the minor allele
frequency of the SNP as \theta
.
We can derive the power calculation formula is
power=1- \Phi\left(z_{\alpha^{*}/2}-a\times b\right)
+\Phi\left(-z_{\alpha^{*}/2} - a\times b\right),
where
a=
\frac{\sqrt{2\theta\left(1-\theta\right)}}{\sigma_y}
and
b=\frac{\delta\sqrt{m(n-1)}}{\sqrt{1+(m-1)\rho}}
and z_{\alpha^{*}/2}
is the upper 100\alpha^{*}/2
percentile of the standard normal distribution,
\alpha^{*}=FWER/nTests
, nTests is the number of
(SNP, gene) pairs,
\sigma_y=\sqrt{\sigma^2_{\beta}+\sigma^2}
,
and \rho=\sigma^2_{\beta}/\left(\sigma^2_{\beta}+\sigma^2\right)
is the intra-class correlation.
Value
power if the input parameter power = NULL
.
sample size (total number of subjects) if the input parameter n = NULL
;
minimum detectable slope if the input parameter slope = NULL
;
minimum allowable MAF if the input parameter MAF = NULL
.
Author(s)
Xianjun Dong <XDONG@rics.bwh.harvard.edu>, Xiaoqi Li<xli85@bwh.harvard.edu>, Tzuu-Wang Chang <Chang.Tzuu-Wang@mgh.harvard.edu>, Scott T. Weiss <restw@channing.harvard.edu>, Weiliang Qiu <weiliang.qiu@gmail.com>
References
Dong X, Li X, Chang T-W, Scherzer CR, Weiss ST, and Qiu W. powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics, 2021;, btab385
Examples
n = 102
m = 227868
# calculate power
power = powerEQTL.scRNAseq(
slope = 0.6,
n = n,
m = m,
power = NULL,
sigma.y = 0.29,
MAF = 0.05,
rho = 0.8,
nTests = 1e+6)
print(power)
# calculate sample size (total number of subjects)
n = powerEQTL.scRNAseq(
slope = 0.6,
n = NULL,
m = m,
power = 0.9567288,
sigma.y = 0.29,
MAF = 0.05,
rho = 0.8,
nTests = 1e+6)
print(n)
# calculate slope
slope = powerEQTL.scRNAseq(
slope = NULL,
n = n,
m = m,
power = 0.9567288,
sigma.y = 0.29,
MAF = 0.05,
rho = 0.8,
nTests = 1e+6)
print(slope)
# calculate MAF
MAF = powerEQTL.scRNAseq(
slope = 0.6,
n = n,
m = m,
power = 0.9567288,
sigma.y = 0.29,
MAF = NULL,
rho = 0.8,
nTests = 1e+6)
print(MAF)