bizicount {bizicount}R Documentation

Bizicount: Maximum likelihood estimation of copula-based bivariate zero-inflated (and non-inflated) count models

Description

The main bivariate regression function of the bizicount-package Estimates copula-based bivariate zero-inflated (and non-inflated) count models via maximum likelihood. Supports the Frank and Gaussian copulas, as well as zero-inflated Poisson and negative binomial margins (and their non-inflated counterparts). It's class has associated simulate methods for post-estimation diagnostics using the DHARMa package, as well as an extract method for printing professional tables using texreg, and a test for zero-modification using zi_test. See the 'See Also' section for links to these methods.

Usage

bizicount(
  fmla1,
  fmla2,
  data,
  cop = "gaus",
  margins = c("pois", "pois"),
  link.ct = c("log", "log"),
  link.zi = c("logit", "logit"),
  scaling = "none",
  starts = NULL,
  keep = TRUE,
  subset,
  na.action,
  weights,
  frech.min = 1e-07,
  pmf.min = 1e-07,
  ...
)

Arguments

fmla1, fmla2

formulas for the first margin and second margins, respectively. If non-inflated, of the form y ~ x_1 + x_2 + ... + x_k; if inflated, of the form y ~ x1 + x2 + ... + x_k| z1 + z2 + ... + z_p, where y is the outcome for the first margin, x are covariates for count parameters, and z are covariates for zero-inflated parameters in each margin. All covariates can be the same.

data

A data.frame containing the response variables, covariates, and offsets for the model. If NULL, these quantities are searched for in the parent environment.

cop

Character string specifying the copula to be used. One of c("gaus", "frank"). Partial matching supported.

margins

Length 2 character vector specifying the marginal distributions for each outcome. Each of the two elements must be one of c("pois", "nbinom", "zip", "zinb"), and must be consistent with its corresponding formula (i.e., zero-inflated margins with zero-inflated formulas).

link.ct

Length 2 character string specifying the link function used for the count portion of each margin. One of c("log", "identity", "sqrt").

link.zi

Length 2 character string specifying the link function used for the zero-inflation portion of each margin. One of c("logit", "probit", "cauchit", "log", "cloglog"). Ignored if corresponding margins entry is not zero-inflated.

scaling

Character string (partial matching supported) indicating what type of scaling to apply to covariates (if any). One of c("none", "1sd", "gelman", "mm").

  • "none" will apply no alterations to covariates.

  • "1sd" will subtract off the mean, and divide by one standard deviation.

  • "gelman" will subtract off the mean, and divide by two standard deviations, which makes binary and continuous variables have a similar interpretation. See Gelman (2008), "Scaling Regression Inputs by Dividing by Two Standard Deviations."

  • "mm" will apply min-max normalization so that continuous covariates lie within a unit hypercube.

Factor variables, offsets, and the intercept are not scaled. The names of variables that have been scaled are returned as part of the bizicount object, in the list-element called scaled. Scaling is highly recommended to improve model convergence.

starts

Numeric vector of starting values for parameter estimates. See 'Details' section regarding the correct order for the values in this vector. If NULL, starting values are obtained automatically by a univariate regression fit.

keep

Logical indicating whether to keep the model matrix in the returned model object. Defaults to TRUE, but can be set to FALSE to conserve memory. NOTE: Must be TRUE to use any post-estimation functions in this package, including zi_test.

subset

A vector indicating the subset of observations to use in estimation.

na.action

A function which indicates what should happen when the data contain NAs. Default is na.omit.

weights

An optional numeric vector of weights for each observation.

frech.min

Lower boundary for Frechet-Hoeffding bounds on copula CDF. Used for computational purposes to prevent over/underflow in likelihood search. Must be in [0, 1e-5], with 0 imposing the original FH bounds without computational consideration. See 'Details.'

pmf.min

Lower boundary on copula PMF evaluations. Used for computational purposes to prevent over/underflow in likelihood search. Must be in [0, 1e-5], with 0 imposing no bound. See ‘Details.’

...

Additional arguments to be passed on to the quasi-newton fitting function, nlm. See 'Details' for some parameters that may be useful to alter.

Details

Value

An S3 bizicount-class object, which is a list containing:

Author(s)

John Niehaus

References

Genest C, Nešlehová J (2007). “A primer on copulas for count data.” ASTIN Bulletin: The Journal of the IAA, 37(2), 475–515.

Inouye DI, Yang E, Allen GI, Ravikumar P (2017). “A review of multivariate distributions for count data derived from the Poisson distribution.” Wiley Interdisciplinary Reviews: Computational Statistics, 9(3).

Joe H (1997). Multivariate models and multivariate dependence concepts. CRC Press.

Nikoloulopoulos A (2013). “Copula-Based Models for Multivariate Discrete Response Data.” In P Jaworski, F Durante, WK Härdle (eds.), Copulae in Mathematical and Quantitative Finance, chapter 11, pp. 231–250. Springer.

Nelsen RB (2007). An Introduction to Copulas. Springer Science & Business Media.

Trivedi P, Zimmer D (2017). “A note on identification of bivariate copulas for discrete countdata.” Econometrics, 5(1), 10.

Trivedi PK, Zimmer DM (2007). Copula modeling: an introduction for practitioners. NowPublishers Inc.

See Also

extract.bizicount, make_DHARMa, zi_test

Examples

### bizicount example

## SETUP
set.seed(123)
n = 300

# define a function to simulate from a gaussian copula
# first margin is zero-inflated negative binomial (zinb)
# second margin is zero-inflated poisson (zip)
# Note: marginal distributions are hard-coded in function, including
# inverse dispersion parameter for zinb.
gen = function(n,
               b1,
               b2,
               g1,
               g2,
               dep) {

     k1 = length(b1)
     k2 = length(b2)

     X1 = cbind(1, matrix(rbinom(n * (k1 - 1), 1, .5), ncol = k1 - 1))
     X2 = cbind(1, matrix(rexp(n * (k2 - 1), 3), ncol = k2 - 1))

     lam1 = exp(X1 %*% b1)
     lam2 = exp(X2 %*% b2)

     Z1 = cbind(1, matrix(runif(n * (k1 - 1), -1, 1), ncol = k1 - 1))
     Z2 = cbind(1, matrix(rnorm(n * (k2 - 1)), ncol = k2 - 1))

     psi1 = plogis(Z1 %*% g1)
     psi2 = plogis(Z2 %*% g2)

     norm_vars = MASS::mvrnorm(
          n,
          mu = c(0, 0),
          Sigma = matrix(c(1, dep, dep, 1), ncol =2)
          )

     U = pnorm(norm_vars)

     y1 =  qzinb(U[, 1],
                 mu = lam1,
                 psi = psi1,
                 size = .3)
     y2 =  qzip(U[, 2],
                lambda = lam2,
                psi = psi2)

     dat = data.frame(
          X1 = X1[, -1],
          X2 = X2[, -1],
          Z1 = Z1[, -1],
          Z2 = Z2[, -1],
          y1,
          y2,
          lam1,
          lam2,
          psi1,
          psi2
     )
     return(dat)
}


# define parameters
b1 = c(1, -2, 3)
b2 = c(-1, 3, 1)
g1 = c(2, -1.5, 2)
g2 = c(-1, -3.75, 1.25)
rho = .5


# generate data
dat = gen(n, b1, b2, g1, g2, rho)
f1 = y1 ~ X1.1 + X1.2 | Z1.1 + Z1.2
f2 = y2 ~ X2.1 + X2.2 | Z2.1 + Z2.2

## END SETUP

# estimate model

mod = bizicount(f1, f2, dat, cop = "g", margins = c("zip", "zip"), keep = TRUE)

print(mod)
summary(mod)

[Package bizicount version 1.2.0 Index]