powerEQTL.scRNAseq {powerEQTL}R Documentation

Power Calculation for Association Between Genotype and Gene Expression Based on Single Cell RNAseq Data

Description

Power calculation for association between genotype and gene expression based on single cell RNAseq data. This function can be used to calculate one of the 4 parameters (power, sample size, minimum detectable slope, and minimum allowable MAF) by setting the corresponding parameter as NULL and providing values for the other 3 parameters.

Usage

powerEQTL.scRNAseq(
  slope, 
  n, 
  m, 
  power = NULL,
  sigma.y, 
  MAF = 0.2, 
  rho = 0.8, 
  FWER = 0.05,
  nTests = 1,
  n.lower = 2.01,
  n.upper = 1e+30)

Arguments

slope

numeric. Slope under alternative hypothesis.

n

integer. Total number of subjects.

m

integer. Number of cells per subject.

power

numeric. Power for testing if the slope is equal to zero.

sigma.y

numeric. Standard deviation of the gene expression.

MAF

numeric. Minor allele frequency (between 0 and 0.5).

rho

numeric. Intra-class correlation (i.e., correlation between yijy_{ij} and yiky_{ik} for the jj-th and kk-th cells of the ii-th subject).

FWER

numeric. Family-wise Type I error rate for one pair (SNP, gene).

nTests

integer. Number of tests (i.e., number of all (SNP, gene) pairs) in eQTL analysis.

n.lower

numeric. Lower bound of the total number of subjects. Only used when calculating total number of subjects.

n.upper

numeric. Upper bound of the total number of subjects. Only used when calculating total number of subjects.

Details

We assume the following simple linear mixed effects model for each (SNP, gene) pair to characterize the association between genotype and gene expression:

yij=β0i+β1xi+ϵij,y_{ij} = \beta_{0i} + \beta_1 * x_i + \epsilon_{ij},

where

β0iN(β0,σβ2),\beta_{0i} \sim N\left(\beta_0, \sigma^2_{\beta}\right),

and

ϵijN(0,σ2),\epsilon_{ij} \sim N\left(0, \sigma^2\right),

i=1,,ni=1,\ldots, n, j=1,,mj=1,\ldots, m, nn is the number of subjects, mm is the number of cells per subject, yijy_{ij} is the gene expression level for the jj-th cell of the ii-th subject, xix_i is the genotype for the ii-th subject using additive coding. That is, xi=0x_i=0 indicates the ii-th subject is a wildtype homozygote, xi=1x_i=1 indicates the ii-th subject is a heterozygote, and xi=2x_i=2 indicates the ii-th subject is a mutation homozygote.

We would like to test the following hypotheses:

H0:β1=0,H_0: \beta_1=0,

and

H1:β1=δ,H_1: \beta_1 = \delta,

where δ0\delta\neq 0.

For a given SNP, we assume Hardy-Weinberg Equilibrium and denote the minor allele frequency of the SNP as θ\theta.

We can derive the power calculation formula is

power=1Φ(zα/2a×b)+Φ(zα/2a×b),power=1- \Phi\left(z_{\alpha^{*}/2}-a\times b\right) +\Phi\left(-z_{\alpha^{*}/2} - a\times b\right),

where

a=2θ(1θ)σya= \frac{\sqrt{2\theta\left(1-\theta\right)}}{\sigma_y}

and

b=δm(n1)1+(m1)ρ b=\frac{\delta\sqrt{m(n-1)}}{\sqrt{1+(m-1)\rho}}

and zα/2z_{\alpha^{*}/2} is the upper 100α/2100\alpha^{*}/2 percentile of the standard normal distribution, α=FWER/nTests\alpha^{*}=FWER/nTests, nTests is the number of (SNP, gene) pairs, σy=σβ2+σ2\sigma_y=\sqrt{\sigma^2_{\beta}+\sigma^2}, and ρ=σβ2/(σβ2+σ2)\rho=\sigma^2_{\beta}/\left(\sigma^2_{\beta}+\sigma^2\right) is the intra-class correlation.

Value

power if the input parameter power = NULL.

sample size (total number of subjects) if the input parameter n = NULL;

minimum detectable slope if the input parameter slope = NULL;

minimum allowable MAF if the input parameter MAF = NULL.

Author(s)

Xianjun Dong <XDONG@rics.bwh.harvard.edu>, Xiaoqi Li<xli85@bwh.harvard.edu>, Tzuu-Wang Chang <Chang.Tzuu-Wang@mgh.harvard.edu>, Scott T. Weiss <restw@channing.harvard.edu>, Weiliang Qiu <weiliang.qiu@gmail.com>

References

Dong X, Li X, Chang T-W, Scherzer CR, Weiss ST, and Qiu W. powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics, 2021;, btab385

Examples

  n = 102
  m = 227868
  
  # calculate power
  power = powerEQTL.scRNAseq(
    slope = 0.6, 
    n = n, 
    m = m,
    power = NULL,
    sigma.y = 0.29, 
    MAF = 0.05, 
    rho = 0.8, 
    nTests = 1e+6)

  print(power)
  
  # calculate sample size (total number of subjects)
  n = powerEQTL.scRNAseq(
    slope = 0.6, 
    n = NULL, 
    m = m,
    power = 0.9567288,
    sigma.y = 0.29, 
    MAF = 0.05, 
    rho = 0.8, 
    nTests = 1e+6)

  print(n)

  # calculate slope
  slope = powerEQTL.scRNAseq(
    slope = NULL, 
    n = n, 
    m = m,
    power = 0.9567288,
    sigma.y = 0.29, 
    MAF = 0.05, 
    rho = 0.8, 
    nTests = 1e+6)

  print(slope)
  
  # calculate MAF
  MAF = powerEQTL.scRNAseq(
    slope = 0.6, 
    n = n, 
    m = m,
    power = 0.9567288,
    sigma.y = 0.29, 
    MAF = NULL, 
    rho = 0.8, 
    nTests = 1e+6)
  print(MAF)


[Package powerEQTL version 0.3.4 Index]