fitGP {SIBERG} | R Documentation |
Fit Generalized Poisson Mixture Model
Description
The function fits a two-component Generalized Poisson mixture model.
Usage
fitGP(y, d=NULL, inits=NULL, model='V', zeroPercentThr=0.2)
Arguments
y |
A vector of raw RNAseq counts. |
d |
A vector of the same length as y representing the normalization constant to be applied to the data. |
inits |
Initial values used to fit the mixture model: a vector with elements mu1, mu2, phi1, phi2 and pi1. |
model |
Character specifying the E or V model. The E model fits the mixture with equal dispersion phi in both components, while the V model places no constraint on the dispersions. |
zeroPercentThr |
A scalar specifying the minimum percentage of zero counts needed to fit a zero-inflated Generalized Poisson model. This parameter is used to deal with zero-inflation in RNAseq count data. When the percentage of zeros exceeds this threshold, a mixture of a point mass at 0 and a Generalized Poisson is fitted instead of a 2-component Generalized Poisson mixture. |
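The zero-count rule described for zeroPercentThr can be sketched as below. This is only an illustration: chooseModel is a hypothetical helper, not a SIBERG function, and it assumes the threshold is compared against the proportion of zeros in y (as the default of 0.2 suggests) with a strict inequality.

```r
# Hypothetical helper (not part of SIBERG): reports which model the rule
# would select for a count vector y.
# Assumption: zeroPercentThr is a proportion in [0, 1].
chooseModel <- function(y, model = "V", zeroPercentThr = 0.2) {
  zeroFrac <- mean(y == 0)        # observed proportion of zero counts
  if (zeroFrac > zeroPercentThr) {
    "0-inflated"                  # rule overrides the model argument
  } else {
    model                         # otherwise keep the requested E or V model
  }
}

# 4 of the 10 counts are zero, so the proportion of zeros is 0.4
chooseModel(c(0, 0, 0, 5, 12, 7, 0, 9, 3, 4), model = "E")
```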
Details
This function directly maximizes the log likelihood function through optimization. Three models can be fitted: (1) Generalized Poisson mixture with equal dispersion (E model); (2) Generalized Poisson mixture with unequal dispersion (V model); (3) 0-inflated Generalized Poisson model. The 0-inflated Generalized Poisson has the following density function:
P(Y=y)=\pi D(y) + (1-\pi)GP(\mu, \phi)
where D is the point mass at 0 while GP(\mu, \phi) is the density of the Generalized Poisson distribution with mean \mu and dispersion \phi. The variance is \phi \mu.
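The 0-inflated density can be written out in a few lines of R. The sketch below assumes the Consul form of the Generalized Poisson, reparameterized so that the mean is mu and the variance is phi * mu, and it requires phi >= 1. The names dgp and dzigp are hypothetical and are not SIBERG functions; the package's internal parameterization may differ.

```r
# Generalized Poisson density with mean mu and dispersion phi (phi >= 1).
# Assumption: Consul's form P(y) = theta * (theta + lambda*y)^(y-1) *
# exp(-theta - lambda*y) / y!, with lambda = 1 - 1/sqrt(phi) and
# theta = mu / sqrt(phi), which gives mean mu and variance phi * mu.
dgp <- function(y, mu, phi) {
  lambda <- 1 - 1 / sqrt(phi)
  theta <- mu / sqrt(phi)
  # computed on the log scale for numerical stability
  exp(log(theta) + (y - 1) * log(theta + lambda * y) -
        theta - lambda * y - lgamma(y + 1))
}

# 0-inflated density: point mass at 0 with weight pi0, GP with weight 1 - pi0
dzigp <- function(y, pi0, mu, phi) {
  pi0 * (y == 0) + (1 - pi0) * dgp(y, mu, phi)
}

# Probabilities of the first few counts under the 0-inflated model
dzigp(0:5, pi0 = 0.3, mu = 10, phi = 2)
```

With phi = 1 the dispersion term vanishes and dgp reduces to the ordinary Poisson density, which gives a quick sanity check against dpois.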
The 0-inflated model is fitted when the observed percentage of zero counts exceeds the user-specified threshold. When it does, this rule overrides the model argument.
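To illustrate what direct maximization of the log likelihood looks like for the 0-inflated model, here is a minimal sketch using optim. It is not the SIBERG implementation: the Consul parameterization of the Generalized Poisson, the restriction phi > 1, the unconstrained transformations, the starting values and the names ldgp and negLogLik are all assumptions made for this example.

```r
# Log density of the Generalized Poisson with mean mu and dispersion phi
# (assumed Consul form; variance is phi * mu)
ldgp <- function(y, mu, phi) {
  lambda <- 1 - 1 / sqrt(phi)
  theta <- mu / sqrt(phi)
  log(theta) + (y - 1) * log(theta + lambda * y) -
    theta - lambda * y - lgamma(y + 1)
}

# Negative log likelihood of the 0-inflated model.
# par holds logit(pi), log(mu) and log(phi - 1) so optim can search freely.
negLogLik <- function(par, y) {
  pi0 <- plogis(par[1])
  mu <- exp(par[2])
  phi <- 1 + exp(par[3])
  ll <- ifelse(y == 0,
               log(pi0 + (1 - pi0) * exp(ldgp(0, mu, phi))),  # zeros: both parts
               log1p(-pi0) + ldgp(y, mu, phi))                # positives: GP only
  -sum(ll)
}

# Artificial data: 40 structural zeros plus 60 negative binomial counts
set.seed(1)
y <- c(rep(0, 40), rnbinom(60, mu = 50, size = 5))

# Crude starting values taken from the data
start <- c(qlogis(mean(y == 0)), log(mean(y[y > 0])), 0)
fit <- optim(start, negLogLik, y = y)   # Nelder-Mead by default

# Back-transform to the natural scale
est <- c(pi = plogis(fit$par[1]), mu = exp(fit$par[2]),
         phi = 1 + exp(fit$par[3]), logLik = -fit$value)
est
```

Working on the transformed scale keeps pi in (0, 1) and mu positive without needing a constrained optimizer.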
Value
A vector consisting of the parameter estimates mu1, mu2, phi1, phi2 and pi1, together with logLik and BIC. For the 0-inflated model, mu1=phi1=0.
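For readers comparing models by the returned BIC, the usual textbook definition can be sketched as follows. This is an assumption about convention rather than a statement of what fitGP computes internally: bicFromLogLik is a hypothetical helper, the sign convention (smaller is better) may differ from the package's, and the parameter counts in the comments are inferred from the model descriptions.

```r
# Hypothetical helper (not part of SIBERG): BIC under the common
# definition -2 * logLik + k * log(n), where smaller values are preferred.
bicFromLogLik <- function(logLik, nPar, n) {
  -2 * logLik + nPar * log(n)
}

# Assumed free-parameter counts:
#   V model: mu1, mu2, phi1, phi2, pi1   -> 5
#   E model: mu1, mu2, shared phi, pi1   -> 4
# Illustrative log likelihood values, not output from fitGP
bicFromLogLik(logLik = -480.2, nPar = 5, n = 100)
bicFromLogLik(logLik = -481.0, nPar = 4, n = 100)
```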
Author(s)
Pan Tong (nickytong@gmail.com), Kevin R Coombes (krc@silicovore.com)
References
Tong, P., Chen, Y., Su, X. and Coombes, K. R. (2012). Systematic Identification of Bimodally Expressed Genes Using RNAseq Data. Bioinformatics, 2013 Mar 1;29(5):605-13.
See Also
Examples
library(SIBERG)
# artificial RNAseq data from negative binomial distribution
set.seed(1000)
dat <- rnbinom(100, mu=1000, size=1/0.2)
fitGP(y=dat)