zeroalt {glmtoolbox} | R Documentation |
Zero-Altered Regression Models to deal with Zero-Excess in Count Data
Description
Fits a zero-altered (Poisson or negative binomial) regression model to handle excess zeros in count data.
Usage
zeroalt(
formula,
data,
offset,
subset,
na.action = na.omit(),
weights,
family = "poi(log)",
zero.link = c("logit", "probit", "cloglog", "cauchit", "log"),
reltol = 1e-13,
start = list(counts = NULL, zeros = NULL),
...
)
Arguments
formula |
a |
data |
an (optional) |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be |
subset |
an (optional) vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data contain NAs. By default |
weights |
an (optional) vector of positive "prior weights" to be used in the fitting process. The length of
|
family |
an (optional) character string that allows you to specify the distribution
to describe the response variable, as well as the link function to be used in
the model for |
zero.link |
an (optional) character string that specifies the link function to be used in the model for |
reltol |
an (optional) positive value that sets the relative convergence tolerance for the BFGS method in optim.
By default, |
start |
an (optional) list with two components named "counts" and "zeros", which specifies the starting values to be used in the
iterative process to obtain the estimates of the parameters in the linear predictors of the models for |
... |
further arguments passed to or from other methods. |
Details
The zero-altered count distributions, also called hurdle models, may be obtained as the mixture between
a zero-truncated count distribution and the Bernoulli distribution. Indeed, if Y is a count random variable
such that Y|\nu=1 is 0 with probability 1 and Y|\nu=0 ~ ZTP(\mu), where \nu ~ Bernoulli(\pi), then Y
is distributed according to the Zero-Altered Poisson distribution, denoted here as ZAP(\mu,\pi).
Similarly, if Y is a count random variable such that Y|\nu=1 is 0 with probability 1 and
Y|\nu=0 ~ ZTNB(\mu,\phi,\tau), where \nu ~ Bernoulli(\pi), then Y is distributed according to the
Zero-Altered Negative Binomial distribution, denoted here as ZANB(\mu,\phi,\tau,\pi).
The Zero-Altered Negative Binomial I (\mu,\phi,\pi) and Zero-Altered Negative Binomial II (\mu,\phi,\pi)
distributions are special cases of ZANB when \tau=0 and \tau=-1, respectively.
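The mixture definition of ZAP(\mu,\pi) above can be checked numerically. The following is a minimal base-R sketch, not code from glmtoolbox: the function name dzap and the argument name prob0 are hypothetical, and it assumes the zero-truncated Poisson mass is the Poisson mass rescaled by 1 - exp(-\mu).

```r
# Hypothetical helper (not part of glmtoolbox): probability mass function of
# ZAP(mu, prob0), where prob0 plays the role of \pi = P(Y = 0).
dzap <- function(y, mu, prob0) {
  # Zero-truncated Poisson mass for y >= 1; -expm1(-mu) equals 1 - exp(-mu)
  ztp <- dpois(y, mu) / (-expm1(-mu))
  ifelse(y == 0, prob0, (1 - prob0) * ztp)
}

# The masses over a wide support should add up to (numerically) one
probs <- dzap(0:200, mu = 2.5, prob0 = 0.4)
total <- sum(probs)
print(total)

# The mean implied by the mixture is (1 - prob0) * mu / (1 - exp(-mu))
mean_numeric <- sum(0:200 * probs)
mean_formula <- (1 - 0.4) * 2.5 / (-expm1(-2.5))
print(c(mean_numeric, mean_formula))
```

Under this parameterization the zero probability is set directly by \pi, which is why the "zeros" and "counts" parts can be estimated separately.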
The "counts" model may be expressed as g(\mu_i)=x_i^{\top}\beta for i=1,\ldots,n, where g(\cdot)
is the link function specified in the argument family. Similarly, the "zeros" model may
be expressed as h(\pi_i)=z_i^{\top}\gamma for i=1,\ldots,n, where h(\cdot) is the
link function specified in the argument zero.link. Parameter estimation is
performed using the maximum likelihood method. The parameter vector \gamma is
estimated by applying the routine glm.fit, where a binary-response model
(1 or "success" if response=0, and 0 or "fail" if response>0)
is fitted. Then, the rest of the model parameters are estimated by maximizing the
log-likelihood function based on the zero-truncated count distribution through the
BFGS method available in the routine optim. The accuracy and speed of the BFGS
method are increased because the call to the routine optim is performed using
the analytical instead of the numerical derivatives. The variance-covariance matrix
estimate is obtained as minus the inverse of the (analytical) Hessian matrix
evaluated at the parameter estimates and the observed data.
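The two-stage estimation described above can be sketched in base R for the Poisson case with a log link. This is an illustration under stated assumptions, not the internal code of zeroalt: the data are simulated, the coefficient values and all object names are hypothetical, both parts share one covariate, and the Hessian is taken from optim (numerical differentiation of the analytical gradient) rather than computed analytically.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
X <- cbind(1, x)                 # same design matrix for both parts (assumption)
gamma_true <- c(-0.5, 0.8)       # hypothetical "zeros" coefficients (logit link)
beta_true  <- c(0.5, 0.3)        # hypothetical "counts" coefficients (log link)

# Simulate: a Bernoulli draw decides whether the response is zero; positive
# responses come from a zero-truncated Poisson via the inverse-CDF method
is_zero <- rbinom(n, 1, plogis(drop(X %*% gamma_true)))
mu_true <- exp(drop(X %*% beta_true))
y_pos   <- qpois(runif(n, min = exp(-mu_true), max = 1), mu_true)
y       <- ifelse(is_zero == 1, 0, y_pos)

# Stage 1: binary-response model with "success" meaning response = 0
fit_zeros <- glm.fit(x = X, y = as.numeric(y == 0),
                     family = binomial(link = "logit"))
gamma_hat <- fit_zeros$coefficients

# Stage 2: zero-truncated Poisson log-likelihood on the positive responses,
# maximized by BFGS with the analytical gradient
pos <- y > 0
Xp  <- X[pos, , drop = FALSE]
yp  <- y[pos]
negloglik <- function(beta) {
  mu <- exp(drop(Xp %*% beta))
  -sum(dpois(yp, mu, log = TRUE) - log(-expm1(-mu)))
}
neggrad <- function(beta) {
  mu <- exp(drop(Xp %*% beta))
  # Score: sum over i of (y_i - E[Y_i | Y_i > 0]) * x_i, with log link
  -drop(crossprod(Xp, yp - mu / (-expm1(-mu))))
}
fit_counts <- optim(c(0, 0), negloglik, neggrad, method = "BFGS",
                    control = list(reltol = 1e-13), hessian = TRUE)
beta_hat  <- fit_counts$par
# Inverse Hessian of the negative log-likelihood, i.e. minus the inverse
# Hessian of the log-likelihood
vcov_beta <- solve(fit_counts$hessian)

print(rbind(gamma_hat, gamma_true))
print(rbind(beta_hat, beta_true))
```

Because the likelihood factorizes into a part that depends only on \gamma and a part that depends only on the count parameters, fitting the two stages separately gives the same estimates as a joint maximization.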
A set of standard extractor functions is available for fitted model objects
of class zeroinflation, including methods for generic functions such as
print, summary, model.matrix, estequa,
coef, vcov, logLik, fitted, confint, AIC, BIC and
predict. In addition, the model fitted to the data may be assessed using functions such as
anova.zeroinflation, residuals.zeroinflation, dfbeta.zeroinflation,
cooks.distance.zeroinflation and envelope.zeroinflation.
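As a brief sketch of how some of these extractors might be applied, the code below refits the first model from the Examples section and calls a few of the generics listed above with their default arguments. It assumes glmtoolbox is installed and is skipped otherwise; the exact printed layout of each result is not shown here because it depends on the package version.

```r
# Runs only when glmtoolbox is available (assumption: the package is installed)
if (requireNamespace("glmtoolbox", quietly = TRUE)) {
  library(glmtoolbox)
  data(Trajan)
  fit <- zeroalt(roots ~ photoperiod, family = "nbf(log)",
                 zero.link = "logit", data = Trajan)

  print(coef(fit))     # parameter estimates
  print(vcov(fit))     # estimated variance-covariance matrix
  print(logLik(fit))   # log-likelihood at the estimates
  print(AIC(fit))      # information criteria for model comparison
  print(BIC(fit))
}
```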
Value
An object of class zeroinflation in which the main results of the model fitted to the data are stored, i.e., a list with components including
coefficients | a list with elements "counts" and "zeros" containing the parameter estimates from the respective models, |
fitted.values | a list with elements "counts" and "zeros" containing the estimates of \mu_1,\ldots,\mu_n and \pi_1,\ldots,\pi_n, respectively, |
start | a vector containing the starting values for all parameters in the model, |
prior.weights | a vector containing the case weights used, |
offset | a list with elements "counts" and "zeros" containing the offset vectors, if any, from the respective models, |
terms | a list with elements "counts", "zeros" and "full" containing the terms objects for the respective models, |
loglik | the value of the log-likelihood function evaluated at the parameter estimates and the observed data, |
estfun | a list with elements "counts" and "zeros" containing the estimating functions evaluated at the parameter estimates and the observed data for the respective models, |
formula | the formula, |
levels | the levels of the categorical regressors, |
contrasts | a list with elements "counts" and "zeros" containing the contrasts corresponding to levels from the respective models, |
converged | a logical indicating successful convergence, |
model | the full model frame, |
y | the response count vector, |
family | a list with elements "counts" and "zeros" containing the family objects used in the respective models, |
linear.predictors | a list with elements "counts" and "zeros" containing the estimates of g(\mu_1),\ldots,g(\mu_n) and h(\pi_1),\ldots,h(\pi_n), respectively, |
R | a matrix with the Cholesky decomposition of the inverse of the variance-covariance matrix of all parameters in the model, |
call | the original function call. |
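The relationship between the component R and the variance-covariance matrix can be illustrated in base R. The sketch below uses a small hypothetical matrix V rather than a fitted model, and it assumes that R follows the upper-triangular convention of chol, so that t(R) %*% R equals the inverse of the variance-covariance matrix; whether the package stores R in exactly this form is an assumption.

```r
# Hypothetical 2 x 2 variance-covariance matrix (not taken from a fitted model)
V <- matrix(c(0.04, 0.01,
              0.01, 0.09), nrow = 2, byrow = TRUE)

# Cholesky factor of the inverse: t(R) %*% R reproduces solve(V)
R <- chol(solve(V))

# chol2inv() inverts t(R) %*% R directly from the factor, recovering V
V_back <- chol2inv(R)
print(all.equal(V_back, V))
```

Storing the factor rather than the matrix itself makes it cheap to recover the variance-covariance matrix without refactorizing.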
References
Cameron A.C., Trivedi P.K. (1998) Regression Analysis of Count Data. New York: Cambridge University Press.
Mullahy J. (1986) Specification and Testing of Some Modified Count Data Models. Journal of Econometrics 33, 341–365.
See Also
Examples
####### Example 1: Roots Produced by the Columnar Apple Cultivar Trajan
data(Trajan)
fit1 <- zeroalt(roots ~ photoperiod, family="nbf(log)", zero.link="logit", data=Trajan)
summary(fit1)
####### Example 2: Self-diagnosed ear infections in swimmers
data(swimmers)
fit2 <- zeroalt(infections ~ frequency | location, family="nb1(log)", data=swimmers)
summary(fit2)
####### Example 3: Article production by graduate students in biochemistry PhD programs
bioChemists <- pscl::bioChemists
fit3 <- zeroalt(art ~ fem + kid5 + ment, family="nb1(log)", data = bioChemists)
summary(fit3)