R: Fitting of EPPM models to binary data.

BinaryEPPM {BinaryEPPM}

R Documentation

Fitting of EPPM models to binary data.

Description

Fits regression models to under- and over-dispersed binary data using extended Poisson process models.

Usage

BinaryEPPM(formula, data, subset = NULL, na.action = NULL, 
       weights = NULL, model.type = "p only", 
       model.name = "EPPM extended binomial", link = "cloglog", 
       initial = NULL, method = "Nelder-Mead", 
       pseudo.r.squared.type = "square of correlation", control = NULL)

Arguments

`formula`	Formulae for the probability of a success p and the scale-factor. The object used is from the package `Formula` of Zeileis and Croissant (2010), which allows multiple parts and multiple responses. "formula" should consist of a left hand side (lhs) of a single response variable and a right hand side (rhs) of one or two sets of variables for the linear predictors for the mean and (if two sets) the variance. This is as used for the R function "glm" and also, for example, for the package "betareg" (Cribari-Neto and Zeileis, 2010). The function identifies from the argument "data" whether a data frame (as for use of "glm") or a list has been input. The list should be exactly the same as for a data frame except that the response variable is a list of vectors of frequency distributions rather than two vectors of paired counts of number responding (r) out of number tested (n), as for the data frame. The subordinate functions fit models where the response variables are "p.obs" or "scalef.obs" according to the model type being fitted. The values for these response variables are not input as part of "data"; they are calculated within the function from the grouped binary data input. If "model.type" is "p only", "formula" consists of a lhs of the response variable and a rhs of the terms of the linear predictor for the mean model. If "model.type" is "p and scale-factor", the rhs of "formula" has two sets of terms, one for the linear predictor of p (response "p.obs") and one for the linear predictor of the scale-factor (response "scalef.obs").
`data`	"data" should be either a data frame (as for use of "glm") or a list. The list should be exactly the same as for a data frame except that the response variable is a list of vectors of frequency distributions rather than paired counts of number responding (r) out of number tested (n), as for the data frame. Only one list is allowed within "data", as it is identified as the dependent variable. If other lists are in "data", for example for use as weights, they should be removed from "data" prior to calling this function. The extracted list can then be supplied through the "weights" argument of this function. Within the function a working list "listcounts" and data frames with components such as "p.obs", "scalef.obs", "covariates", "offset.mean" and "offset.variance" are set up. The component "covariates" is a data frame of vectors of covariates in the model. The component "listcounts" is a list of vectors of frequency distributions, or the single pairs of r/n in grouped form if "data" is a data frame.
`subset`	Subsetting commands.
`na.action`	Action taken for NAs in data.
`weights`	Vector or list of lists of weights.
`model.type`	Takes one of two values, i.e., "p only" or "p and scale-factor". The "p only" value fits a linear predictor function to the parameter a in equation (3) of Faddy and Smith (2012). If the model type being fitted is binomial, modeling a is the same as modeling the mean. For the negative binomial the mean is b(exp(a)-1), b also being as in equation (3) of Faddy and Smith (2012). The "p and scale-factor" value fits linear predictor functions to both the probability of a success p and the scale-factor.
`model.name`	If model.type is "p only" the model being fitted is one of the four "binomial", "EPPM extended binomial", "beta binomial", "correlated binomial". If model.type is "p and scale-factor" the model being fitted is either "EPPM extended binomial" i.e. as equations (4) and (6) of Faddy and Smith (2012) or one of the two "beta binomial", "correlated binomial".
`link`	Takes one of nine values i.e., 'logit', 'probit', 'cloglog', 'cauchit', 'log', 'loglog', 'double exponential', 'double reciprocal', 'power logit'. The default is 'cloglog'. The 'power logit' has an attribute of 'power' for which the default is 1 i.e., a logit link.
`initial`	A vector of initial values for the parameters. If this vector is NULL, initial values based on fitting binomial models using "glm" are calculated within the function.
`method`	Takes one of the two values "Nelder-Mead" or "BFGS", these being methods of `optim`.
`pseudo.r.squared.type`	Takes one of the three values "square of correlation", "R square" or "max-rescaled R square". The default, "square of correlation", is as used in Cribari-Neto and Zeileis (2010) and is the square of the correlation between the observed and predicted values on the GLM linear predictor scale. The other two are as described in Cox and Snell (1989) and Nagelkerke (1991), respectively, and apply to logistic regression.
`control`	"control" is a list of control parameters as used in "optim". If this list is NULL the defaults for "optim" are set as "control <- list(fnscale=-1, trace=0, maxit=1000)". The control parameters that can be changed by inputting a variable length list are "fnscale, trace, maxit, abstol, reltol, alpha, beta, gamma". Details of "optim" and its control parameters are available in the online R help manuals.
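
The two input layouts described for "formula" and "data" can be sketched in base R. This is only an illustration under stated assumptions: the data values, the helper `to_frequency` and the object names are hypothetical, and the layout (a vector with one position for each of 0, 1, ..., n successes) is inferred from the description above rather than taken from the package source.

```r
# Paired counts as they would appear in a data frame (hypothetical values):
# r responding out of n tested.
df <- data.frame(r = c(2, 0, 3), n = c(3, 3, 4))

# Hypothetical helper: one frequency vector per row. Positions correspond to
# 0, 1, ..., n successes, with a single 1 marking the observed count r.
to_frequency <- function(r, n) {
  freq <- rep(0, n + 1)
  freq[r + 1] <- 1
  freq
}
list.counts <- Map(to_frequency, df$r, df$n)

# Several groups sharing the same n can be pooled into one frequency
# distribution: here three groups of size 3 with 2, 0 and 2 successes.
pooled <- tabulate(c(2, 0, 2) + 1, nbins = 3 + 1)
```

Here `list.counts[[1]]` holds the first row in grouped form, and `pooled` counts how many groups showed each possible number of successes.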

Value

An object of class "BinaryEPPM" is returned. A list of object items follows.

`data.type`	The type of the data i.e., data frame or list
`list.data`	Data as a list of lists of frequency distributions
`call`	The call of the function
`formula`	The formula argument
`model.type`	The type of model being fitted
`model.name`	The model being fitted
`link`	The link function
`covariates.matrix.p`	The design matrix for the probability of a success
`covariates.matrix.scalef`	The design matrix for the scale-factor
`offset.p`	The offset vector for the probability of a success
`offset.scalef`	The offset vector for the scale-factor
`coefficients`	Estimates of model parameters
`loglikelihood`	Loglikelihood
`vcov`	The variance/covariance matrix
`n`	The number of observations
`nobs`	The number of observations
`df.null`	The degrees of freedom of the null model
`df.residual`	The degrees of freedom of the residual
`vnmax`	Vector of maximums of grouped count data vectors in list.counts
`weights`	Vector or list of weights
`converged`	Whether the iterative process converged, TRUE or FALSE
`iterations`	Number of iterations taken
`method`	Method for optim either Nelder-Mead or BFGS
`pseudo.r.squared`	Pseudo R**2 value
`start`	Starting values for iterative process
`optim`	Estimates of model parameters
`control`	Control parameters for `optim`
`fitted.values`	Fitted values for probability of success
`y`	Dependent variable
`terms`	Terms in model fitted
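
As a rough guide to the "R square" and "max-rescaled R square" options reported in `pseudo.r.squared`, the sketch below applies the standard Cox and Snell (1989) and Nagelkerke (1991) formulas to an ordinary logistic regression fitted with "glm". It uses simulated Bernoulli data and base R only; whether BinaryEPPM computes these quantities in exactly this way is an assumption, and all object names are hypothetical.

```r
# Simulated Bernoulli data (hypothetical, for illustration only).
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(-0.5 + x))

fit  <- glm(y ~ x, family = binomial)   # fitted model
null <- glm(y ~ 1, family = binomial)   # intercept-only (null) model

n   <- length(y)
ll1 <- as.numeric(logLik(fit))
ll0 <- as.numeric(logLik(null))

# Cox and Snell R square: 1 - (L0 / L1)^(2 / n)
r2.cs <- 1 - exp(2 * (ll0 - ll1) / n)

# Nagelkerke max-rescaled R square: divide by the maximum attainable value
r2.max <- r2.cs / (1 - exp(2 * ll0 / n))
```

The rescaling matters because the Cox and Snell value cannot reach 1 for binary data; dividing by its maximum puts the measure on a 0 to 1 scale.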

Author(s)

David M. Smith <dmccsmith@verizon.net>

References

Cox DR, Snell EJ. (1989). Analysis of Binary Data. Second Edition. Chapman & Hall.

Cribari-Neto F, Zeileis A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2), 1-24. doi:10.18637/jss.v034.i02.

Grün B, Kosmidis I, Zeileis A. (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, 48(11), 1-25. doi:10.18637/jss.v048.i11.

Faddy M, Smith D. (2012). Extended Poisson Process Modeling and Analysis of Grouped Binary Data. Biometrical Journal, 54, 426-435. doi:10.1002/bimj.201100214.

Nagelkerke NJD. (1991). A Note on a General Definition of the Coefficient of Determination. Biometrika, 78, 691-692.

Smith D, Faddy M. (2019). Mean and Variance Modeling of Under-Dispersed and Over-Dispersed Grouped Binary Data. Journal of Statistical Software, 90(8), 1-20. doi:10.18637/jss.v090.i08.

Zeileis A, Croissant Y. (2010). Extended Model Formulas in R: Multiple Parts and Multiple Responses. Journal of Statistical Software, 34(1), 1-13. doi:10.18637/jss.v034.i01.

Examples

data("ropespores.case")
# Binomial model for p with log dilution as an offset (default cloglog link)
output.fn <- BinaryEPPM(number.spores / number.tested ~ 1 + offset(logdilution),
                  data = ropespores.case,
                  model.type = "p only", model.name = "binomial")
summary(output.fn)
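
The items listed under Value can be read directly from the returned object. The following sketch repeats the call above and prints a few of the documented components; the object name `fit` is hypothetical, and the block runs only if the BinaryEPPM package is installed.

```r
if (requireNamespace("BinaryEPPM", quietly = TRUE)) {
  library(BinaryEPPM)
  data("ropespores.case")
  fit <- BinaryEPPM(number.spores / number.tested ~ 1 + offset(logdilution),
                    data = ropespores.case,
                    model.type = "p only", model.name = "binomial")
  # Components named in the Value section of this help page
  print(fit$converged)       # did the iterative process converge?
  print(fit$coefficients)    # estimates of model parameters
  print(fit$loglikelihood)   # maximized log-likelihood
}
```

Checking `converged` before interpreting `coefficients` is a sensible habit, since both "Nelder-Mead" and "BFGS" can stop at the iteration limit set in "control".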

[Package BinaryEPPM version 3.0]