R: Sample Size Calculations for Two-Sample RNA-seq Experiments...

ssizeRNA_vary {ssizeRNA}

R Documentation

Sample Size Calculations for Two-Sample RNA-seq Experiments with Differing Mean and Dispersion Among Genes

Description

This function calculates appropriate sample sizes for two-sample RNA-seq experiments for a desired power in which mean and dispersion vary among genes. Sample size calculations are performed at controlled false discovery rates, user-specified proportions of non-differentially expressed genes, mean counts in control group, dispersion, and fold change. A plot of power versus sample size is generated.

Usage

ssizeRNA_vary(nGenes = 10000, pi0 = 0.8, m = 200, mu, disp, fc,
  up = 0.5, replace = TRUE, fdr = 0.05, power = 0.8, maxN = 35,
  side = "two-sided", cex.title = 1.15, cex.legend = 1)

Arguments

`nGenes`	total number of genes, the default value is `10000`.
`pi0`	proportion of non-differentially expressed genes, the default value is `0.8`.
`m`	pseudo sample size for generated data.
`mu`	a vector (or scalar) of mean counts in control group from which to simulate.
`disp`	a vector (or scalar) of dispersion parameter from which to simulate.
`fc`	a vector (or scalar, or a function that takes an integer n and generates a vector of length n) of fold change for differentially expressed (DE) genes.
`up`	proportion of up-regulated genes among all DE genes, the default value is `0.5`.
`replace`	sample with or without replacement from given parameters. See Details for more information.
`fdr`	the false discovery rate to be controlled.
`power`	the desired power to be achieved.
`maxN`	the maximum sample size used for power calculations.
`side`	options are "two-sided", "upper", or "lower".
`cex.title`	controls size of chart titles.
`cex.legend`	controls size of chart legend.

Details

If a vector is input for pi0, sample size calculations are performed for each proportion.

If the total number of genes is larger than length of mu or disp, replace always equals TRUE.

Value

`ssize`	sample sizes (for each treatment) at which desired power is first reached.
`power`	power calculations with corresponding sample sizes.
`crit.vals`	critical value calculations with corresponding sample sizes.

Author(s)

Ran Bi biranpier@gmail.com, Peng Liu pliu@iastate.edu

References

Liu, P. and Hwang, J. T. G. (2007) Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6): 739-746.

Orr, M. and Liu, P. (2009) Sample size estimation while controlling false discovery rate for microarray experiments using ssize.fdr package. The R Journal, 1, 1, May 2009, 47-53.

Law, C. W., Chen, Y., Shi, W., Smyth, G. K. (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29.

Examples

library(edgeR)
library(Biobase)
data(hammer.eset)
## load hammer dataset (Hammer, P. et al., 2010)

counts <- exprs(hammer.eset)[, phenoData(hammer.eset)$Time == "2 weeks"]
counts <- counts[rowSums(counts) > 0,]
trt <- hammer.eset$protocol[which(hammer.eset$Time == "2 weeks")] 

mu <- apply(counts[, trt == "control"], 1, mean)  
## average read count in control group for each gene

d <- DGEList(counts)
d <- calcNormFactors(d)
d <- estimateCommonDisp(d)
d <- estimateTagwiseDisp(d)
disp <- d$tagwise.dispersion      
## dispersion for each gene

## fixed fold change
fc <- 2
size <- ssizeRNA_vary(mu = mu, disp = disp, fc = fc, 
                      m = 30, maxN = 15, replace = FALSE)
size$ssize         ## first sample size to reach desired power
size$power         ## calculated power for each sample size
size$crit.vals     ## calculated critical value for each sample size


## varying fold change
## fc1 <- function(x){exp(rnorm(x, log(2), 0.5*log(2)))}
## size1 <- ssizeRNA_vary(pi0 = 0.8, mu = mu, disp = disp, fc = fc1, 
##                        m = 30, maxN = 20, replace = FALSE)

[Package ssizeRNA version 1.3.2 Index]