R: Estimation of the proportion of true null hypotheses

estim.pi0 {cp4p}

R Documentation

Estimation of the proportion of true null hypotheses

Description

From a proteomics viewpoint, this function estimates the global proportion of proteins (resp. peptides) in the tested protein list (resp. peptide list) that are not differentially abundant. This proportion is later used as a correcting factor to compute the adjusted p-values, which are in turn used to tune a threshold according to a desired false discovery rate.

From a statistical viewpoint, this function estimates the proportion of true null hypotheses (pi0) from a vector of raw p-values, using any of eight estimation methods from the literature.
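
As a rough illustration of the "correcting factor" role described above, a common adaptive procedure multiplies Benjamini-Hochberg adjusted p-values by an estimate of pi0. The sketch below uses only base R. `adaptive_bh` is a hypothetical helper name and the value pi0 = 0.6 is arbitrary, so this should not be read as the exact computation performed by cp4p.

```r
# Sketch: scale Benjamini-Hochberg adjusted p-values by an estimate of pi0.
# `adaptive_bh` is a hypothetical helper, not a function of cp4p.
adaptive_bh <- function(p, pi0) {
  stopifnot(is.numeric(p), !anyNA(p), pi0 > 0, pi0 <= 1)
  # p.adjust() is the base R (stats) implementation of the BH adjustment
  pmin(1, pi0 * p.adjust(p, method = "BH"))
}

p <- c(0.001, 0.004, 0.02, 0.03, 0.2, 0.5, 0.8)
adj <- adaptive_bh(p, pi0 = 0.6)  # pi0 = 0.6 is an arbitrary illustrative value
which(adj <= 0.05)                # indices retained at a 5% FDR threshold
```

The smaller the estimate of pi0, the smaller the adjusted p-values, so more hypotheses pass a given false discovery rate threshold.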

Usage

estim.pi0(p, pi0.method = "ALL", nbins = 20, pz = 0.05)

Arguments

`p`	Numeric vector of raw p-values. The p-values are assumed to contain no missing values and to lie between 0 and 1.
`pi0.method`	Name of an estimation method for the proportion of true null hypotheses among `"st.boot"`, `"st.spline"`, `"langaas"`, `"jiang"`, `"histo"`, `"pounds"`, `"abh"` or `"slim"`. Default is `"ALL"`: all eight estimation methods are applied.
`nbins`	Number of bins. Parameter used for the `"jiang"` and `"histo"` methods. Default is 20.
`pz`	P-value threshold below which p-values are assumed to correspond to false null hypotheses. Used for the `"slim"` method. Wang, Tuominen and Tsai (2011) suggest taking a value between 0.01 and 0.1. Default is 0.05.

Details

This function estimates the proportion of true null hypotheses using one of the following estimation methods:

`"abh"`	the least slope method proposed in Benjamini and Hochberg (2000).

`"st.spline"`	the smoother method described in Storey and Tibshirani (2003).
	The `qvalue` function of R package `qvalue` with default tuning is used (Storey (2015)).

`"st.boot"`	the bootstrap method described in Storey et al. (2004).
	The `qvalue` function of R package `qvalue` with default tuning is used (Storey (2015)).

`"langaas"`	the method described in Langaas, Lindqvist and Ferkingstad (2005) using a convex
	decreasing density estimate for p-values. The `convest` function of R package `limma`
	with default tuning is used (Ritchie et al. (2015)).

`"histo"`	the histogram method described in Nettleton, Hwang, Caldo and Wise (2006).

`"pounds"`	the conservative estimate described in Pounds and Cheng (2006).

`"jiang"`	the average estimate method described in Jiang and Doerge (2008).

`"slim"`	the method of Wang, Tuominen and Tsai (2011) using a sliding linear model.
	The default tuning suggested by Wang, Tuominen and Tsai (2011) is used.
	In their notation, lambda1 is fixed to 0.1, n to 10 and B to 100.
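
To give an idea of what such estimators compute, here is a minimal base R sketch of two simple ones: a fixed-lambda estimate of the kind that underlies the Storey methods, and the "twice the mean p-value" estimate usually attributed to Pounds and Cheng (2006). The function names `pi0_fixed_lambda` and `pi0_twice_mean` are hypothetical, the simulated p-values are arbitrary, and the code is not the implementation used by cp4p, `qvalue` or `limma`.

```r
# Fixed-lambda estimate: p-values above lambda are assumed to come mostly
# from true null hypotheses, whose p-values are uniform on [0, 1].
# Hypothetical helper, not part of cp4p.
pi0_fixed_lambda <- function(p, lambda = 0.5) {
  stopifnot(is.numeric(p), !anyNA(p), lambda >= 0, lambda < 1)
  min(1, mean(p > lambda) / (1 - lambda))
}

# "Twice the mean" estimate: under the null the mean p-value is 0.5,
# so 2 * mean(p) gives a conservative estimate, capped at 1.
# Hypothetical helper, not part of cp4p.
pi0_twice_mean <- function(p) {
  stopifnot(is.numeric(p), !anyNA(p))
  min(1, 2 * mean(p))
}

set.seed(1)
# Simulated mixture: 800 true nulls (uniform) and 200 false nulls (near 0)
p <- c(runif(800), rbeta(200, 0.2, 5))
c(fixed_lambda = pi0_fixed_lambda(p), twice_mean = pi0_twice_mean(p))
```

Both estimates should land near the simulated proportion of 0.8. The methods listed above differ mainly in how they choose or average over tuning parameters such as lambda or the number of bins.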

To account for right censoring of the p-values, each p-value is divided by the maximum p-value present in p. Accordingly, the p-values of the true null hypotheses are assumed to be uniformly distributed between 0 and this maximum. This kind of censoring happens in proteomics when a first thresholding is performed on the fold-changes.

If you prefer to assume that the p-values of the true null hypotheses are uniformly distributed between 0 and 1, replace p by c(p,1) when calling estim.pi0.
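
The rescaling and the c(p,1) workaround described above can be sketched in base R as follows. `rescale_p` is a hypothetical helper written for illustration and the p-values are made up. It mirrors the description, not the cp4p source code.

```r
# Divide every p-value by the largest one (hypothetical helper, not cp4p code)
rescale_p <- function(p) p / max(p)

p <- c(0.01, 0.02, 0.10, 0.25, 0.40)  # nothing above 0.40: right-censored
rescale_p(p)                          # the largest value is now exactly 1

# Appending 1 makes the maximum equal to 1, so the division leaves the
# original p-values unchanged.
all.equal(head(rescale_p(c(p, 1)), -1), p)
```

Note that c(p,1) adds one extra p-value to the vector, which has a negligible effect on the estimate when the number of tests is large.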

Value

pi0

Numeric value of the estimated proportion of true null hypotheses for the selected method, or a numeric vector of estimates if pi0.method="ALL".

Author(s)

Quentin Giai Gianetto <quentin2g@yahoo.fr>

References

Y. Benjamini and Y. Hochberg. On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics, 25(1):60-83, 2000.

H. Jiang and R.W. Doerge. Estimating the proportion of true null hypotheses for multiple comparisons. Cancer informatics, 6:25, 2008.

M. Langaas, B.H. Lindqvist, and E. Ferkingstad. Estimating the proportion of true null hypotheses, with application to DNA microarray data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(4):555-572, 2005.

D. Nettleton, J.T.G. Hwang, R.A. Caldo, and R.P. Wise. Estimating the number of true null hypotheses from a histogram of p values. Journal of Agricultural, Biological, and Environmental Statistics, 11(3):337-356, 2006.

S. Pounds and C. Cheng. Robust estimation of the false discovery rate. Bioinformatics, 22(16):1979-1987, 2006.

M.E. Ritchie, B. Phipson, D. Wu, Y. Hu, C.W. Law, W. Shi, and G.K. Smyth. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7):e47, 2015.

J.D. Storey, J.E. Taylor, and D. Siegmund. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1):187-205, 2004.

J.D. Storey and R. Tibshirani. Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 100(16):9440-9445, 2003.

J.D. Storey. qvalue: Q-value estimation for false discovery rate control. R package version 2.0.0, http://qvalue.princeton.edu/, http://github.com/jdstorey/qvalue. 2015.

H.-Q. Wang, L.K. Tuominen, and C.-J. Tsai. SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures. Bioinformatics, 27(2):225-231, 2011.

Examples

# load the package and get the p-values
library(cp4p)
data(LFQRatio2)
p <- LFQRatio2[, 7]

# estimate the proportion of true null hypotheses with all eight methods
r <- estim.pi0(p)
r$pi0

# estimate the proportion of true null hypotheses with the "abh" method
r <- estim.pi0(p, pi0.method = "abh")
r$pi0

# compare with one minus the proportion of human proteins
prop_human <- sum(LFQRatio2$Organism == "human") / length(LFQRatio2$Organism)
pi0_true <- 1 - prop_human
pi0_true

[Package cp4p version 0.3.6 Index]