R: P-value combination using the inverse normal method

invnorm {metaRNASeq}

R Documentation

P-value combination using the inverse normal method

Description

Combines one-sided p-values using the inverse normal method.

Usage

invnorm(indpval, nrep, BHth = 0.05)

Arguments

`indpval`	List of vectors of one-sided p-values to be combined.
`nrep`	Vector giving the number of replicates used in each study to calculate the one-sided p-values in `indpval`.
`BHth`	Benjamini-Hochberg threshold. By default, the false discovery rate is controlled at 5%.

Details

For each gene g, let

N_g = \sum_{s=1}^S \omega_s \Phi^{-1}(1-p_{gs}),

where p_{gs} is the raw p-value obtained for gene g in a differential analysis for study s (assumed to be uniformly distributed under the null hypothesis), \Phi is the cumulative distribution function of the standard normal distribution, and \omega_s is a set of weights. We define the weights \omega_s as in Marot and Mayer (2009):

\omega_s = \sqrt{\frac{\sum_c R_{cs}}{\sum_\ell \sum_c R_{c\ell}}},

where \sum_c R_{cs} is the total number of biological replicates in study s. This allows studies with large numbers of biological replicates to be attributed a larger weight than smaller studies.
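
As a small illustration of the weights, the sketch below computes \omega_s in base R from a vector of per-study replicate totals. The replicate counts and the variable name `w` are chosen for this illustration only and are not taken from the package source.

```r
## Total number of biological replicates in each study (illustrative values)
nrep <- c(8, 8, 16)

## omega_s = sqrt(replicates in study s / replicates across all studies)
w <- sqrt(nrep / sum(nrep))

## The squared weights sum to 1, so a weighted sum of independent
## standard normal variables with these weights has unit variance
sum(w^2)
```

With these values the third study, which has twice as many replicates as each of the others, receives the largest weight.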

Under the null hypothesis, the test statistic N_g follows a N(0,1) distribution. A one-sided test on the right-hand tail of the distribution may then be performed, and classical procedures for multiple testing correction, such as that of Benjamini and Hochberg (1995), may subsequently be applied to the resulting p-values to control the false discovery rate at a desired level \alpha.
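
The steps described in this section can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's source code: the p-values are invented, and the object names simply mirror the arguments and return values documented on this page.

```r
## Hypothetical one-sided p-values for 5 genes in 2 studies
indpval <- list(
  study1 = c(0.001, 0.20, 0.50, 0.03, 0.90),
  study2 = c(0.004, 0.35, 0.45, 0.01, 0.80)
)
nrep <- c(8, 8)

## Study weights omega_s
w <- sqrt(nrep / sum(nrep))

## Gene-by-study matrix of z-scores, Phi^{-1}(1 - p_gs)
z <- sapply(indpval, function(p) qnorm(1 - p))

## Weighted sum across studies gives N_g for each gene
N <- as.vector(z %*% w)

## Right-tail p-values, then Benjamini-Hochberg adjustment
rawpval <- pnorm(N, lower.tail = FALSE)
adjpval <- p.adjust(rawpval, method = "BH")

## Indices of genes declared differentially expressed at a 5% FDR
DEindices <- which(adjpval <= 0.05)
```

Note that a p-value of exactly 0 or 1 would map to an infinite z-score in this sketch; such values would need to be handled before combination.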

Value

`DEindices`	Indices of differentially expressed genes at the chosen Benjamini-Hochberg threshold.
`TestStatistic`	Vector with test statistics for differential expression in the meta-analysis.
`rawpval`	Vector with raw p-values for differential expression in the meta-analysis.
`adjpval`	Vector with adjusted p-values for differential expression in the meta-analysis.

Note

This function resembles the function directpvalcombi in the metaMA R package; there is, however, one important difference in the calculation of p-values. For microarray data, it is typically advised to separate under- and over-expressed genes prior to the meta-analysis. For RNA-seq data, differential analyses from individual studies typically make use of negative binomial models and exact tests, which lead to one-sided, rather than two-sided, p-values. As such, we suggest performing a meta-analysis over the full set of genes, followed by an a posteriori check, and if necessary a filtering step, for genes with conflicting results (over- vs. under-expression) among studies.
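
One possible form of the a posteriori check is sketched below. It assumes that per-study log fold changes are available in the same gene order as the p-values; the matrix `lfc`, the example `DEindices`, and the names `conflict` and `DEfiltered` are all hypothetical and are not objects provided by the package.

```r
## Hypothetical per-study log2 fold changes for 6 genes
## (rows = genes, columns = studies)
lfc <- cbind(
  study1 = c(1.2, -0.8,  0.5, -1.1, 2.0, -0.3),
  study2 = c(0.9, -1.0, -0.7, -0.9, 1.5,  0.4)
)

## Hypothetical indices of genes declared DE by the meta-analysis
DEindices <- c(1, 3, 4, 5)

## A gene is flagged as conflicting when the sign of its fold change
## is not the same in every study (a fold change of exactly 0 counts
## as a distinct sign here)
conflict <- apply(sign(lfc), 1, function(s) length(unique(s)) > 1)

## Remove conflicting genes from the list of DE genes
DEfiltered <- setdiff(DEindices, which(conflict))
```

Whether conflicting genes should be discarded or only flagged for inspection is a choice left to the analyst.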

References

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSS B 57: 289-300.

Hedges, L. and Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press.

Marot, G. and Mayer, C.-D. (2009). Sequential analysis for microarray data based on sensitivity and meta-analysis. SAGMB 8(1): 1-33.

Rau, A., Marot, G. and Jaffrezic, F. (2014). Differential meta-analysis of RNA-seq data. BMC Bioinformatics 15: 91.

Examples

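data(rawpval)
## 8 replicates simulated in each study
invnormcomb <- invnorm(rawpval, nrep = c(8, 8), BHth = 0.05)
## Indicator of differential expression (1 = adjusted p-value <= 0.05, 0 otherwise)
DE <- ifelse(invnormcomb$adjpval <= 0.05, 1, 0)
## Histogram of the combined raw p-values
hist(invnormcomb$rawpval, nclass = 100)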

## A more detailed example is given in the vignette of the package:
## vignette("metaRNASeq")

[Package metaRNASeq version 1.0.7 Index]