BinaryEPPM {BinaryEPPM}    R Documentation

Fitting of EPPM models to binary data.

Description

Fits regression models to under- and over-dispersed binary data using extended Poisson process models.

Usage

BinaryEPPM(formula, data, subset = NULL, na.action = NULL, 
       weights = NULL, model.type = "p and scale-factor", = "generalized binomial", link = "cloglog", 
       initial = NULL, method = "Nelder-Mead", 
       pseudo.r.squared.type = "square of correlation", control = NULL)

Arguments

formula: Formulae for the probability of a success p and the scale-factor. The object used is from the package Formula of Zeileis and Croissant (2010), which allows multiple parts and multiple responses. "formula" should consist of a left hand side (lhs) of a single response variable and a right hand side (rhs) of one or two sets of variables for the linear predictors of p and (if two sets) the scale-factor. This is as used for the R function "glm" and also, for example, as for the package "betareg" (Cribari-Neto and Zeileis, 2010).

The function identifies from the argument data whether a data frame (as for use of "glm") or a list has been input. The list should be exactly the same as for a data frame except that the response variable is a list of vectors of frequency distributions rather than two vectors of paired counts, the number responding (r) out of the number tested (n), as for the data frame.

The subordinate functions fit models where the response variables are "p.obs" or "scalef.obs", according to the model type being fitted. The values for these response variables are not input as part of "data"; they are calculated within the function from the grouped binary data input. If "model.type" is "p only", "formula" consists of a lhs of the response variable and a rhs of the terms of the linear predictor for p. If "model.type" is "p and scale-factor", the rhs of "formula" has two sets of terms, separated by "|", for the linear predictors of p and the scale-factor, the corresponding response variables being "p.obs" and "scalef.obs".
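A two-part formula can be pulled apart with base R alone. The sketch below is illustrative only: it assumes the Formula-style "|" separator between the p part and the scale-factor part, and it is not how BinaryEPPM or the Formula package parse formulas internally. The variable names are borrowed from the example at the end of this page, and the "| 1" scale-factor part is a hypothetical addition.

```r
# Illustrative only: assumes a Formula-style "|" between the p and scale-factor parts.
f <- number.spores / number.tested ~ 1 + offset(logdilution) | 1

lhs <- f[[2]]  # the call number.spores / number.tested
rhs <- f[[3]]  # the call (1 + offset(logdilution)) | 1, since "|" binds looser than "+"

# Names of the successes (r) and trials (n) variables on the lhs
response_names <- c(r = deparse(lhs[[2]]), n = deparse(lhs[[3]]))

# Terms for the linear predictor of p and of the scale-factor
p_terms      <- deparse(rhs[[2]])
scalef_terms <- deparse(rhs[[3]])

print(response_names)
print(p_terms)
print(scalef_terms)
```

The formula is never evaluated here, so the variables do not need to exist; only the structure of the call is inspected.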


data: Either a data frame (as for use of "glm") or a list. The list should be exactly the same as for a data frame except that the response variable is a list of vectors of frequency distributions rather than the paired counts r and n of the data frame. Only one list is allowed within "data", as it is identified as the dependent variable. If other lists are in "data", for example for use as weights, they should be removed from "data" before calling this function; the extracted list can then be supplied through the "weights" argument. Within the function, a working list "listcounts" and data frames with components such as "p.obs", "scalef.obs", "covariates", "offset.mean" and "offset.variance" are set up. The component "covariates" is a data frame of vectors of covariates in the model. The component "listcounts" is a list of vectors of frequency distributions, or of the single pairs r/n in grouped form if "data" is a data frame.
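The list form of "data" holds, for each covariate pattern, a frequency distribution of the number of successes. The sketch below converts case data (one row per unit, r successes out of n) into that layout. It assumes every unit sharing a covariate pattern has the same n, and the helper name to_frequency_list is hypothetical, not part of BinaryEPPM.

```r
# Hypothetical helper: case data (r out of n per unit) -> list of frequency distributions.
# Assumes all units sharing a covariate pattern have the same number tested n.
to_frequency_list <- function(r, n, group) {
  lapply(split(seq_along(r), group), function(idx) {
    n_group <- unique(n[idx])
    stopifnot(length(n_group) == 1)
    # Element k + 1 counts the units with exactly k successes, k = 0, ..., n
    tabulate(r[idx] + 1, nbins = n_group + 1)
  })
}

# Made-up example: two covariate patterns, 3 trials per unit
cases <- data.frame(
  r     = c(0, 1, 1, 3, 2, 2),
  n     = c(3, 3, 3, 3, 3, 3),
  group = c("low", "low", "low", "low", "high", "high")
)
freqs <- to_frequency_list(cases$r, cases$n, cases$group)
print(freqs)
```

For the "low" pattern the result is the vector 1, 2, 0, 1: one unit with no successes, two with one success, none with two and one with three.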


subset: Subsetting commands.


na.action: Action taken for NAs in data.


weights: Vector, or list of lists, of weights.


model.type: Takes one of two values, "p only" or "p and scale-factor". The "p only" value fits a linear predictor function to the parameter a in equation (3) of Faddy and Smith (2012). If the model being fitted is binomial, modeling a is the same as modeling the mean. For the negative binomial the mean is b(exp(a) - 1), b also being as in equation (3) of Faddy and Smith (2012). The "p and scale-factor" value fits linear predictor functions to both the probability of a success p and the scale-factor.

If model.type is "p only", the model being fitted is one of the four "binomial", "Faddy distribution", "beta binomial" or "correlated binomial". If model.type is "p and scale-factor", the model being fitted is either "generalized binomial", i.e., the general model of equations (4) and (6) of Faddy and Smith (2012), or one of the two "beta binomial" or "correlated binomial".


link: Takes one of nine values, i.e., 'logit', 'probit', 'cloglog', 'cauchit', 'log', 'loglog', 'double exponential', 'double reciprocal' or 'power logit'. The default is 'cloglog'. The 'power logit' has an attribute 'power' whose default is 1, i.e., a logit link.
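Five of these links ('logit', 'probit', 'cloglog', 'cauchit', 'log') also exist in base R through stats::make.link, which returns the link function and its inverse. The sketch below uses those standard definitions to show how a probability maps to the linear predictor scale and back; it does not cover 'loglog', 'double exponential', 'double reciprocal' or 'power logit', and BinaryEPPM's own implementations may differ in detail.

```r
# Standard glm link functions from the stats package (not BinaryEPPM's own code)
links <- c("logit", "probit", "cloglog", "cauchit", "log")

p <- 0.3
for (name in links) {
  lk   <- make.link(name)
  eta  <- lk$linkfun(p)     # probability -> linear predictor scale
  back <- lk$linkinv(eta)   # linear predictor -> probability
  cat(sprintf("%-8s eta = %8.4f  back-transformed p = %.4f\n", name, eta, back))
}

# The default 'cloglog' link: eta = log(-log(1 - p))
cloglog  <- make.link("cloglog")
eta_half <- cloglog$linkfun(0.5)
print(eta_half)
```

At p = 0.5 the cloglog link gives log(log(2)), about -0.3665, so unlike the logit it is not symmetric about p = 0.5.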


initial: A vector of initial values for the parameters. If this vector is NULL, initial values based on fitting binomial models using "glm" are calculated within the function.
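When initial is NULL, starting values come from binomial fits with glm. The sketch below shows the general idea on made-up grouped data with the default cloglog link. The data are invented, and BinaryEPPM's own calculation, in particular of starting values for any scale-factor parameters, is not reproduced here.

```r
# Made-up grouped binary data: r successes out of n trials at each covariate value x
dat <- data.frame(
  x = c(0, 1, 2, 3, 4),
  r = c(2, 5, 9, 14, 17),
  n = c(20, 20, 20, 20, 20)
)

# Binomial GLM with the cloglog link (the BinaryEPPM default link)
fit <- glm(cbind(r, n - r) ~ x, family = binomial(link = "cloglog"), data = dat)

# Coefficients of the linear predictor for p, usable as starting values
initial <- coef(fit)
print(initial)
```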


method: Takes one of two values, "Nelder-Mead" or "BFGS", these being values of the "method" argument of "optim".
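Both methods maximize a log-likelihood through optim. As a hedged illustration, the sketch below fits a plain binomial model with a cloglog link (not an EPPM likelihood) to made-up data with "Nelder-Mead" and with "BFGS", and compares both with glm.

```r
# Made-up grouped binary data
x <- c(0, 1, 2, 3, 4)
r <- c(2, 5, 9, 14, 17)
n <- rep(20, 5)

# Binomial log-likelihood (without the constant binomial coefficients), cloglog link
loglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  p <- -expm1(-exp(eta))                   # inverse cloglog: 1 - exp(-exp(eta))
  p <- pmin(pmax(p, 1e-12), 1 - 1e-12)     # keep log() finite
  sum(r * log(p) + (n - r) * log1p(-p))
}

start <- c(0, 0)
ctrl  <- list(fnscale = -1, maxit = 1000)  # fnscale = -1 makes optim maximize
fit_nm   <- optim(start, loglik, method = "Nelder-Mead", control = ctrl)
fit_bfgs <- optim(start, loglik, method = "BFGS", control = ctrl)

# Reference values from glm
fit_glm <- glm(cbind(r, n - r) ~ x, family = binomial(link = "cloglog"))
print(rbind(nelder_mead = fit_nm$par, bfgs = fit_bfgs$par, glm = unname(coef(fit_glm))))
```

Nelder-Mead needs no derivatives; BFGS uses (here numerical) gradients and usually converges in fewer function evaluations on smooth likelihoods like this one.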


pseudo.r.squared.type: Takes one of three values, "square of correlation", "R square" or "max-rescaled R square". The default, "square of correlation", is as used in Cribari-Neto and Zeileis (2010) and is the square of the correlation between the observed and predicted values on the GLM linear predictor scale. The other two are as described in Cox and Snell (1989) and Nagelkerke (1991), respectively, and apply to logistic regression.
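The three measures can be written down directly. The sketch below computes them for a plain binomial cloglog fit on made-up data, using the standard Cox and Snell formula 1 - exp(2(l0 - l1)/N), its Nagelkerke rescaling by 1 - exp(2 l0/N), and a betareg-style squared correlation on the link scale. Taking N as the total number of binary trials and using the Bernoulli log-likelihood are assumptions; BinaryEPPM's own choices may differ.

```r
# Made-up grouped binary data (all 0 < r < n, so the observed link values are finite)
x <- c(0, 1, 2, 3, 4)
r <- c(2, 5, 9, 14, 17)
n <- rep(20, 5)

link  <- make.link("cloglog")
fit   <- glm(cbind(r, n - r) ~ x, family = binomial(link = "cloglog"))
p_hat <- fitted(fit)

# Bernoulli log-likelihood summed over all trials (no binomial coefficients)
bern_loglik <- function(p) sum(r * log(p) + (n - r) * log1p(-p))
ll_model <- bern_loglik(p_hat)
ll_null  <- bern_loglik(rep(sum(r) / sum(n), length(r)))
N <- sum(n)  # assumption: number of binary trials

# "square of correlation": observed vs predicted on the linear predictor scale
r2_cor <- cor(link$linkfun(r / n), predict(fit, type = "link"))^2

# "R square" (Cox and Snell, 1989)
r2_cs <- 1 - exp(2 * (ll_null - ll_model) / N)

# "max-rescaled R square" (Nagelkerke, 1991): divides by the largest attainable value
r2_max <- r2_cs / (1 - exp(2 * ll_null / N))

print(c(square_of_correlation = r2_cor, R_square = r2_cs, max_rescaled = r2_max))
```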


control: A list of control parameters as used in "optim". If this list is NULL, the defaults for "optim" are set as "control <- list(fnscale=-1, trace=0, maxit=1000)". The control parameters that can be changed by inputting a variable-length list are "fnscale", "trace", "maxit", "abstol", "reltol", "alpha", "beta" and "gamma". Details of "optim" and its control parameters are available in the online R help manuals.
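A variable-length control list can be combined with the defaults quoted above. The sketch below assumes that a partial list overrides only the elements it names, merged here with utils::modifyList; whether BinaryEPPM merges in exactly this way is an assumption. It also shows why fnscale = -1 matters: optim minimizes by default, and a negative fnscale turns the search into a maximization of the log-likelihood.

```r
# Defaults quoted in this help page for control = NULL
default_control <- list(fnscale = -1, trace = 0, maxit = 1000)

# Hypothetical user input changing only some parameters
user_control <- list(maxit = 5000, reltol = 1e-10)
control <- modifyList(default_control, user_control)
str(control)

# One-parameter binomial log-likelihood on the logit scale: 7 successes in 20 trials
loglik <- function(theta) {
  p <- plogis(theta)
  7 * log(p) + 13 * log1p(-p)
}

# With fnscale = -1, optim maximizes; BFGS is used because optim warns that
# Nelder-Mead is unreliable for one-dimensional problems
fit <- optim(0, loglik, method = "BFGS", control = control)
p_hat <- plogis(fit$par)
print(p_hat)  # close to the sample proportion 7/20
```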

Value

The type of the data, i.e., data frame or list

Data as a list of lists of frequency distributions


The call of the function


The formula argument


The type of model being fitted

The model being fitted


The link function


The design matrix for the probability of a success


The design matrix for the scale-factor


The offset vector for the probability of a success


The offset vector for the scale-factor


Estimates of model parameters




The variance/covariance matrix


The number of observations


The number of observations


The degrees of freedom of the null model


The degrees of freedom of the residual


Vector of maxima of the grouped count data vectors in list.counts


Vector or list of weights


Whether the iterative process converged, TRUE or FALSE


Number of iterations taken


Method used by optim, either "Nelder-Mead" or "BFGS"


Pseudo R-squared value


Starting values for the iterative process


Estimates of model parameters


Control parameters for optim


Fitted values for probability of success


Dependent variable


Terms in model fitted

Author(s)

David M. Smith <>

References

Cox DR, Snell EJ. (1989). Analysis of Binary Data. Second Edition. Chapman & Hall.

Cribari-Neto F, Zeileis A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1-24. doi: 10.18637/jss.v034.i02.

Grün B, Kosmidis I, Zeileis A. (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1-25. doi: 10.18637/jss.v048.i11.

Faddy M, Smith D. (2012). Extended Poisson Process Modeling and Analysis of Grouped Binary Data. Biometrical Journal, 54, 426-435. doi: 10.1002/bimj.201100214.

Nagelkerke NJD. (1991). A Note on a General Definition of the Coefficient of Determination. Biometrika, 78, 691-692.

Smith D, Faddy M. (2019). Mean and Variance Modeling of Under-Dispersed and Over-Dispersed Grouped Binary Data. Journal of Statistical Software, 90(8), 1-20. doi: 10.18637/jss.v090.i08.

Zeileis A, Croissant Y. (2010). Extended Model Formulas in R: Multiple Parts and Multiple Responses. Journal of Statistical Software, 34(1), 1-13. doi: 10.18637/jss.v034.i01.

See Also

CountsEPPM, betareg

Examples

output.fn <- BinaryEPPM(data =,
                  number.spores / number.tested ~ 1 + offset(logdilution),
                  model.type = "p only", = "binomial")   

[Package BinaryEPPM version 2.3 Index]