mixpoissonregML {mixpoissonreg}R Documentation

Maximum Likelihood Mixed Poisson Regression Models for Overdispersed Count Data

Description

Uses maximum likelihood estimators to fit mixed Poisson regression models (Poisson-Inverse Gaussian or Negative-Binomial) on data sets with response variables being count data. The models can have varying precision parameter, where a linear regression structure (through a link function) is assumed to hold on the precision parameter.

Usage

mixpoissonregML(
  formula,
  data,
  link.mean = c("log", "sqrt"),
  link.precision = c("identity", "log", "inverse.sqrt"),
  model = c("NB", "PIG"),
  residual = c("pearson", "score"),
  y = TRUE,
  x = TRUE,
  w = TRUE,
  envelope = 0,
  prob = 0.95,
  model.frame = TRUE,
  em_controls = list(maxit = 5000, em_tol = 10^(-5), em_tolgrad = 10^(-2)),
  optim_method = "L-BFGS-B",
  optim_controls = list()
)

mixpoissonregML.fit(
  x,
  y,
  w = NULL,
  link.mean = c("log", "sqrt"),
  link.precision = c("identity", "log", "inverse.sqrt"),
  model = c("NB", "PIG"),
  residual = c("pearson", "score"),
  envelope = 0,
  prob = 0.95,
  em_controls = list(maxit = 5000, em_tol = 10^(-5), em_tolgrad = 10^(-2)),
  optim_method = "L-BFGS-B",
  optim_controls = list()
)

Arguments

formula

symbolic description of the model (examples: y ~ x1 + ... + xnbeta and y ~ x1 + ... + xnbeta | w1 + ... + wnalpha); see details below.

data

elements expressed in formula. This is usually a data frame composed by: (i) the observations formed by count data z, with z_i being non-negative integers, (ii) covariates for the mean submodel (columns x1, ..., xnbeta) and (iii) covariates for the precision submodel (columns w1, ..., wnalphla).

link.mean

optionally, a string containing the link function for the mean. If omitted, the 'log' link function will be used. The possible link functions for the mean are "log" and "sqrt".

link.precision

optionally, a string containing the link function the precision parameter. If omitted and the only precision covariate is the intercept, the 'identity' link function will be used, if omitted and there is a precision covariate other than the intercept, the 'log' link function will be used. The possible link functions for the precision parameter are "identity" and "inverse.sqrt" (which is \phi^{-1/2} = w_i^T alpha).

model

character ("NB" or "PIG") indicating the type of model to be fitted, with "NB" standing for Negative-Binomial and "PIG" standing for Poisson Inverse Gaussian. The default is "NB".

residual

character indicating the type of residual to be evaluated ("pearson" or "score"). The default is "pearson". Notice that they coincide for Negative-Binomial models.

y

For mixpoissonregML: logical values indicating if the response vector should be returned as component.

For mixpoissonregML.fit: a numerical vector of response variables with length n. Each coordinate must be a nonnegative-integer.

x

For mixpoissonregML: logical values indicating if the model matrix x should be returned as component.

For mixpoissonregML.fit: a matrix of covariates with respect to the mean with dimension (n,nbeta).

w

For mixpoissonregML: logical values indicating if the model matrix w should be returned as component.

For mixpoissonregML.fit a matrix of covariates with respect to the precision parameter. The default is NULL. If not NULL must be of dimension (n,nalpha).

envelope

number of simulations (synthetic data sets) to build envelopes for residuals (with 100*prob% confidence level). The default envelope = 0 dismisses the envelope analysis.

prob

probability indicating the confidence level for the envelopes (default: prob = 0.95). If envelope = 0, prob is ignored.

model.frame

logical indicating whether the model frame should be returned as component of the returned value.

em_controls

only used with the 'EM' method. A list containing two elements: maxit that contains the maximum number of iterations of the EM algorithm, the default is set to 5000; em_tol that defines the tolerance value to control the convergence criterion in the EM-algorithm, the default is set to 10^(-5). em_tolgrad that defines the tolerance value of the maximum-norm of the the gradient of the Q-function, the default is set to 10^(-2).

optim_method

main optimization algorithm to be used. The available methods are the same as those of optim function. The default is set to "L-BFGS-B".

optim_controls

a list of control arguments to be passed to the optim function in the optimization of the model. For the control options, see the 'Details' in the help of optim for the possible arguments.

Details

Among the regression models with discrete response variables, Poisson regression is the most popular for modeling count data. See, for instance Sellers and Shmueli (2010). It is well-known that this model is equidispersed (that is, the mean is equal to the variance), which in practice may be an unrealistic assumption. Several models have been introduced in the literature to overcome this problem such as negative binomial (NB) and Poisson inverse gaussian (PIG) distributions (see Lawless, 1987). The most common way to do this is to consider a mixed Poisson distribution, which is defined as follows. Let Z be a positive random variable (generally being continuous) with distribution function G_{\tau}(\cdot), where \tau denotes the parameter vector associated to the G distribution. Let Y|Z=z\simPoisson(\mu z), for some constant \mu>0. Therefore Y follows a mixed Poisson (MP) distribution with probability function given by

P(Y=y)=\int_0^\infty\frac{e^{-\mu z}(\mu z)^y}{y!}dG_{\tau}(z),

for y=0,1,\ldots. With this, Y has an overdispersed distribution and hence it is a natural alternative to the Poisson distribution. The most common choices for Z are gamma and inverse-gaussian distributions, which yields Y following, respectively, NB and PIG distributions. General properties of the MP distributions can be found in Karlis and Xekalaki (2005) and in the references therein.

In mixpoissonreg two regression models are implemented, namely, the NB and PIG regression models. We follow the definitions and notations given in Barreto-Souza and Simas (2016). The mixed Poisson regression model is defined by assuming Y_1,\ldots,Y_n is a random sample where Y_i\sim NB(\mu_i,\phi_i) or Y_i\sim PIG(\mu_i,\phi_i) for i = 1,\ldots,n. Under this parameterization we have E(Y_i) = \mu_i and Var(Y_i) = \mu_i(1+\mu_i\phi_i^{-1}b''(\xi_0)), where b(\theta) = -\log(-\theta) and \xi_0 = -1 for the NB case, and b(\theta) = -(-2\theta)^{1/2} and \xi_0 = -1/2 for the PIG case, with b''(\cdot) being the second derivative of the function b(\cdot). The following linear relations are assumed

\Lambda_1(\mu_i) = x_i^T \beta

and

\Lambda_2(\phi_i) = w_i^T \alpha,

where \beta = (\beta_1,...,\beta_p) and \alpha = (\alpha_1,...,\alpha_q) are real valued vectors. The terms x_i^T and v_i^T represent, respectively, the i-th row of the matrices "x" (n\times p) and "w" (n\times q) containing covariates in their columns (x_{i,1} and v_{i,1} may be 1 to handle intercepts).

Therefore, the mixpoissonreg package handles up to two regression structures at the same time: one for the mean parameter, one for the precision parameter. The regression structure for the mean is determined through a formula y ~ x1 + ... + xn, whereas the regression structure for the precision parameter is determined through the right-hand side of the formula using the separator "|". So, for example, a regression with x1,...,xn as covariates for the mean and z1,...,zm as covariates for the precision parameter corresponds to the formula y ~ x1 + ... + xn | z1 + ... + zm. If only there is only formula for the regression structure for the mean, the regression structure for the precision parameter will only have the intercept, that is, y ~ x1 + ... + xn is the same as y ~ x1 + ... + xn | 1.

In general, in this package, the EM-algorithm estimation method obtains estimates closer to the maximum likelihood estimate than the maximum likelihood estimation method, in the sense that the likelihood function evaluated at the EM-algorithm estimate is greater or equal (usually strictly greater) than the likelihood function evaluated at the maximum likelihood estimate. So, unless the processing time is an issue, we strongly recommend the EM-algorithm as the estimation method.

In Barreto-Souza and Simas (2016) two residuals were studied: the pearson residuals and the score residuals. Both these residuals are implemented in the mixpoissonreg package. They coincide for NB regression models. They can be accessed via the residuals method.

It is also noteworthy that all the global and local influence analysis tools developed in Barreto-Souza and Simas (2016) are implemented in this package. See influence.mixpoissonreg, local_influence.mixpoissonreg, local_influence_plot.mixpoissonreg and local_influence_autoplot.mixpoissonreg.

Value

mixpoissonregML returns an object of class "mixpoissonreg" whereas mixpoissonregML.fit returns an object of class "mixpoissonreg_fit". Both objects are given by lists containing the outputs from the model fit (Negative-Binomial or Poisson Inverse Gaussian regression).

An object of the class "mixpoissonreg" is a list containing the following elements:

References

DOI:10.1007/s11222-015-9601-6 doi: 10.1007/s11222-015-9601-6(Barreto-Souza and Simas; 2016)

URL:https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1751-5823.2005.tb00250.x (Karlis and Xekalaki; 2005)

DOI:10.2307/3314912 doi: 10.2307/3314912(Lawless; 1987)

Sellers, K.F. and Shmueli, G. (2010) A flexible regression model for count data. Ann. Appl. Stat., 4, 943-961

See Also

summary.mixpoissonreg, plot.mixpoissonreg, autoplot.mixpoissonreg, residuals.mixpoissonreg, predict.mixpoissonreg,influence.mixpoissonreg, cooks.distance.mixpoissonreg, local_influence.mixpoissonreg, local_influence_plot.mixpoissonreg, local_influence_autoplot.mixpoissonreg

Examples

# Examples using the Attendance dataset:

daysabs_progML <- mixpoissonregML(daysabs ~ prog, data = Attendance)
summary(daysabs_progML)


[Package mixpoissonreg version 1.0.0 Index]