estim.pi0 {cp4p}  R Documentation 
From a proteomics viewpoint, this function estimates the global proportion of proteins (resp. peptides) that are not differentially abundant in the tested protein list (resp. the tested peptide list). This proportion is later used as a correcting factor when computing adjusted p-values, which are in turn used to tune a threshold that achieves a desired false discovery rate.
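As a rough illustration of how pi0 can serve as a correcting factor, the base-R sketch below applies an adaptive Benjamini-Hochberg adjustment. The usual BH quantities m p_(i) / i are multiplied by pi0 before the running minimum is taken, which lets more hypotheses pass a given FDR threshold when pi0 < 1. The helper name `adjust_with_pi0()`, the p-values and the value pi0 = 0.8 are hypothetical; cp4p's own adjustment functions may proceed differently.

```r
# Hypothetical helper: adaptive Benjamini-Hochberg adjusted p-values.
# Same computation as p.adjust(p, method = "BH"), with the factor m
# replaced by pi0 * m (an estimate of the number of true nulls).
adjust_with_pi0 <- function(p, pi0 = 1) {
  m <- length(p)
  i <- m:1                                # ranks, from the largest p-value down
  o <- order(p, decreasing = TRUE)
  ro <- order(o)                          # restores the original order
  pmin(1, cummin(pi0 * m / i * p[o]))[ro]
}

p <- c(0.001, 0.004, 0.02, 0.03, 0.2, 0.5, 0.7, 0.9)
adj_bh  <- adjust_with_pi0(p, pi0 = 1)    # same values as p.adjust(p, "BH")
adj_ada <- adjust_with_pi0(p, pi0 = 0.8)

# Hypotheses rejected at a 5% false discovery rate
which(adj_bh <= 0.05)    # 1 2
which(adj_ada <= 0.05)   # 1 2 3 4
```

With pi0 = 1 the helper reduces to the standard BH adjustment; a smaller pi0 shrinks every adjusted p-value by the same factor before capping at 1.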
From a statistical viewpoint, this function estimates the proportion of true null hypotheses (pi0) from a vector of raw p-values, using any of eight estimation methods from the literature.
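To make pi0 concrete, the sketch below simulates p-values for which it is known: 900 true nulls with uniform p-values and 100 false nulls whose p-values are concentrated near 0 (the Beta(0.1, 1) distribution is an arbitrary assumption). It then applies Storey's single-lambda estimator, pi0(lambda) = #{p > lambda} / (m (1 - lambda)), the building block that the "st.spline" and "st.boot" methods tune. This is a plain base-R illustration, not a call to estim.pi0, and `storey_pi0()` is a hypothetical helper.

```r
set.seed(1)
m0 <- 900                      # true null hypotheses (uniform p-values)
m1 <- 100                      # false null hypotheses (p-values near 0)
p <- c(runif(m0), rbeta(m1, shape1 = 0.1, shape2 = 1))
pi0_true <- m0 / (m0 + m1)     # 0.9

# Hypothetical helper: Storey's estimator for a single lambda, capped at 1
storey_pi0 <- function(p, lambda = 0.5) {
  min(1, mean(p > lambda) / (1 - lambda))
}

storey_pi0(p)   # close to pi0_true; slightly conservative on average
```

The estimator is conservative on average because a few false-null p-values also exceed lambda and are counted as nulls.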
estim.pi0(p, pi0.method = "ALL", nbins = 20, pz = 0.05)
p 
Numeric vector of raw p-values. The p-values are assumed to contain no missing values and to lie between 0 and 1.
pi0.method 
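Name of an estimation method for the proportion of true null hypotheses among "abh", "st.spline", "st.boot", "langaas", "histo", "pounds", "jiang", "slim", or "ALL" (default) to apply all of them.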
nbins 
Number of bins. Parameter used for the "jiang" and "histo" methods.
pz 
P-value threshold such that p-values below it are associated with false null hypotheses. Used for the "slim" method.
This function estimates the proportion of true null hypotheses with one of the following methods:
"abh"  the least slope method proposed in Benjamini and Hochberg (2000). 
"st.spline"  the smoother method described in Storey and Tibshirani (2003).
The qvalue function of the R package qvalue is used with its default tuning (Storey (2015)).
"st.boot"  the bootstrap method described in Storey et al. (2004).
The qvalue function of the R package qvalue is used with its default tuning (Storey (2015)).
"langaas"  the method described in Langaas, Lindqvist and Ferkingstad (2005), which uses a convex decreasing density estimate for the p-values.
The convest function of the R package limma is used with its default tuning (Ritchie et al. (2015)).
"histo"  the histogram method described in Nettleton, Hwang, Caldo and Wise (2006). 
"pounds"  the conservative estimate described in Pounds and Cheng (2006). 
"jiang"  the average estimate method described in Jiang and Doerge (2008). 
"slim"  the method of Wang, Tuominen and Tsai (2011) using a sliding linear model. 
The default tuning suggested by Wang, Tuominen and Tsai (2011) is used: in their notation, lambda1 is fixed to 0.1, n to 10 and B to 100.
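Several of the estimators listed above have short closed-form or iterative definitions. The base-R sketch below implements four of them as we read the cited papers: the lowest-slope estimator ("abh"), a cubic smoothing spline over Storey's lambda grid in the spirit of qvalue's default ("st.spline"), the iterative histogram estimator ("histo") and the conservative estimate min(1, 2 * mean(p)) ("pounds"). The function names, the lambda grid and the smoothing degrees of freedom are hypothetical choices; this is not cp4p's internal code, and results may differ from estim.pi0, which delegates some methods to qvalue and limma.

```r
# Base-R sketches of four pi0 estimators (hypothetical helper names).

# "abh": lowest-slope estimator (Benjamini and Hochberg, 2000)
abh_pi0 <- function(p) {
  m <- length(p)
  ps <- sort(p)
  s <- (1 - ps) / (m + 1 - seq_len(m))   # slopes S_i
  j <- which(diff(s) < 0)[1] + 1         # first i with S_i < S_(i-1)
  if (is.na(j)) j <- m                   # slopes never decrease
  m0 <- min(ceiling(1 / s[j] + 1), m)    # round up, cap at m
  m0 / m
}

# "st.spline"-like: smooth pi0(lambda) over a grid, read it at the largest lambda
smoother_pi0 <- function(p, lambda = seq(0.05, 0.95, by = 0.05)) {
  pi0_lambda <- sapply(lambda, function(l) mean(p >= l) / (1 - l))
  fit <- smooth.spline(lambda, pi0_lambda, df = 3)
  min(1, predict(fit, x = max(lambda))$y)
}

# "histo": iterative histogram estimator (Nettleton et al., 2006)
histo_pi0 <- function(p, nbins = 20) {
  m <- length(p)
  breaks <- seq(0, 1, length.out = nbins + 1)
  bin <- findInterval(p, breaks, rightmost.closed = TRUE, all.inside = TRUE)
  n <- tabulate(bin, nbins)              # p-value counts per bin
  m0 <- m
  repeat {
    j <- which(n <= m0 / nbins)[1]       # first bin not above the null level
    m0_new <- nbins * mean(n[j:nbins])   # m0 implied by bins j..nbins
    if (m0_new == m0) break              # the same j gives the same value
    m0 <- m0_new
  }
  m0 / m
}

# "pounds": conservative estimate (Pounds and Cheng, 2006)
pounds_pi0 <- function(p) min(1, 2 * mean(p))

# Usage on simulated p-values with a known pi0 of 0.9
set.seed(3)
p <- c(runif(900), rbeta(100, shape1 = 0.1, shape2 = 1))
round(c(abh = abh_pi0(p), st.spline = smoother_pi0(p),
        histo = histo_pi0(p), pounds = pounds_pi0(p)), 3)
```

The histogram iteration stops as soon as the first bin at or below the estimated null level no longer changes, which happens after at most nbins updates. The "pounds" estimate is deliberately conservative and can sit well above the true pi0 when the false nulls are few.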
To account for right censoring of the vector of p-values, each p-value is divided by the maximum p-value present in p. Accordingly, the p-values of the true null hypotheses are assumed to be uniformly distributed between 0 and this maximum. This kind of censoring happens in proteomics when a first thresholding is performed on the fold changes.
If you want to assume that the p-values are uniformly distributed between 0 and 1, replace p by c(p,1) when calling estim.pi0.
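The rescaling step can be illustrated in base R. The sketch below uses a hypothetical helper `rescale_censored()` that performs only the division described above; it is not cp4p's code. The final lines use a Storey-type estimate with lambda = 0.5 (an arbitrary choice) to show why the rescaling matters when all hypotheses are true nulls but no p-value exceeds 0.6.

```r
# Hypothetical helper reproducing the rescaling described above:
# every p-value is divided by the largest p-value of the vector.
rescale_censored <- function(p) p / max(p)

# Small example: no p-value exceeds 0.6
p <- c(0.01, 0.05, 0.12, 0.3, 0.45, 0.6)
rescale_censored(p)                   # the largest value is mapped to 1

# Appending 1 makes the maximum equal to 1, so the division changes nothing
p1 <- c(p, 1)
identical(rescale_censored(p1), p1)   # TRUE

# All hypotheses are true nulls, but p-values are censored at 0.6
set.seed(2)
p_cens <- runif(1000, min = 0, max = 0.6)
mean(p_cens > 0.5) / (1 - 0.5)                    # about 0.33, far below 1
mean(rescale_censored(p_cens) > 0.5) / (1 - 0.5)  # about 1
```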
pi0 
Numeric value of the estimated proportion of true null hypotheses from the selected method; numeric vector if pi0.method = "ALL".
Quentin Giai Gianetto <quentin2g@yahoo.fr>
Y. Benjamini and Y. Hochberg. On the adaptive control of the false discovery rate in multiple testing with independent statistics. Journal of Educational and Behavioral Statistics, 25(1):60-83, 2000.
H. Jiang and R.W. Doerge. Estimating the proportion of true null hypotheses for multiple comparisons. Cancer Informatics, 6:25, 2008.
M. Langaas, B.H. Lindqvist, and E. Ferkingstad. Estimating the proportion of true null hypotheses, with application to DNA microarray data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(4):555-572, 2005.
D. Nettleton, J.T.G. Hwang, R.A. Caldo, and R.P. Wise. Estimating the number of true null hypotheses from a histogram of p values. Journal of Agricultural, Biological, and Environmental Statistics, 11(3):337-356, 2006.
S. Pounds and C. Cheng. Robust estimation of the false discovery rate. Bioinformatics, 22(16):1979-1987, 2006.
M.E. Ritchie, B. Phipson, D. Wu, Y. Hu, C.W. Law, W. Shi, and G.K. Smyth. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7):e47, 2015.
J.D. Storey, J.E. Taylor, and D. Siegmund. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66(1):187-205, 2004.
J.D. Storey and R. Tibshirani. Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences, 100(16):9440-9445, 2003.
J.D. Storey. qvalue: Q-value estimation for false discovery rate control. R package version 2.0.0, http://qvalue.princeton.edu/, http://github.com/jdstorey/qvalue, 2015.
H.Q. Wang, L.K. Tuominen, and C.J. Tsai. SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures. Bioinformatics, 27(2):225-231, 2011.
library(cp4p)
# get the raw p-values
data(LFQRatio2)
p <- LFQRatio2[, 7]
# estimate the proportion of true null hypotheses with all methods
r <- estim.pi0(p)
r$pi0
# estimate the proportion of true null hypotheses with the "abh" method
r <- estim.pi0(p, pi0.method = "abh")
r$pi0
# compare with one minus the proportion of human proteins
prop_human <- sum(LFQRatio2$Organism == "human") / length(LFQRatio2$Organism)
pi0_true <- 1 - prop_human
pi0_true