bizicount {bizicount} R Documentation

## Bizicount: Maximum likelihood estimation of copula-based bivariate zero-inflated (and non-inflated) count models

### Description

The main bivariate regression function of the bizicount-package Estimates copula-based bivariate zero-inflated (and non-inflated) count models via maximum likelihood. Supports the Frank and Gaussian copulas, as well as zero-inflated Poisson and negative binomial margins (and their non-inflated counterparts). It's class has associated simulate methods for post-estimation diagnostics using the DHARMa package, as well as an extract method for printing professional tables using texreg, and a test for zero-modification using zi_test. See the 'See Also' section for links to these methods.

### Usage

bizicount(
fmla1,
fmla2,
data,
cop = "gaus",
margins = c("pois", "pois"),
scaling = "none",
starts = NULL,
keep = TRUE,
subset,
na.action,
weights,
frech.min = 1e-07,
pmf.min = 1e-07,
...
)


### Arguments

 fmla1, fmla2 formulas for the first margin and second margins, respectively. If non-inflated, of the form y ~ x_1 + x_2 + ... + x_k; if inflated, of the form y ~ x1 + x2 + ... + x_k| z1 + z2 + ... + z_p, where y is the outcome for the first margin, x are covariates for count parameters, and z are covariates for zero-inflated parameters in each margin. All covariates can be the same. data A data.frame containing the response variables, covariates, and offsets for the model. If NULL, these quantities are searched for in the parent environment. cop Character string specifying the copula to be used. One of c("gaus", "frank"). Partial matching supported. margins Length 2 character vector specifying the marginal distributions for each outcome. Each of the two elements must be one of c("pois", "nbinom", "zip", "zinb"), and must be consistent with its corresponding formula (i.e., zero-inflated margins with zero-inflated formulas). link.ct Length 2 character string specifying the link function used for the count portion of each margin. One of c("log", "identity", "sqrt"). link.zi Length 2 character string specifying the link function used for the zero-inflation portion of each margin. One of c("logit", "probit", "cauchit", "log", "cloglog"). Ignored if corresponding margins entry is not zero-inflated. scaling Character string (partial matching supported) indicating what type of scaling to apply to covariates (if any). One of c("none", "1sd", "gelman", "mm"). "none" will apply no alterations to covariates. "1sd" will subtract off the mean, and divide by one standard deviation. "gelman" will subtract off the mean, and divide by two standard deviations, which makes binary and continuous variables have a similar interpretation. See Gelman (2008), "Scaling Regression Inputs by Dividing by Two Standard Deviations." "mm" will apply min-max normalization so that continuous covariates lie within a unit hypercube. Factor variables, offsets, and the intercept are not scaled. The names of variables that have been scaled are returned as part of the bizicount object, in the list-element called scaled. Scaling is highly recommended to improve model convergence. starts Numeric vector of starting values for parameter estimates. See 'Details' section regarding the correct order for the values in this vector. If NULL, starting values are obtained automatically by a univariate regression fit. keep Logical indicating whether to keep the model matrix in the returned model object. Defaults to TRUE, but can be set to FALSE to conserve memory. NOTE: Must be TRUE to use any post-estimation functions in this package, including zi_test. subset A vector indicating the subset of observations to use in estimation. na.action A function which indicates what should happen when the data contain NAs. Default is na.omit. weights An optional numeric vector of weights for each observation. frech.min Lower boundary for Frechet-Hoeffding bounds on copula CDF. Used for computational purposes to prevent over/underflow in likelihood search. Must be in [0, 1e-5], with 0 imposing the original FH bounds without computational consideration. See 'Details.' pmf.min Lower boundary on copula PMF evaluations. Used for computational purposes to prevent over/underflow in likelihood search. Must be in [0, 1e-5], with 0 imposing no bound. See ‘Details.’ ... Additional arguments to be passed on to the quasi-newton fitting function, nlm. See 'Details' for some parameters that may be useful to alter.

### Details

• starts – Starting values should be organized as follows:

1. count parameters for margin 1

2. count parameters for margin 2

3. zero-inflated parameters for margin 1 (if applicable),

4. zero-inflated parameters for margin 2 (if applicable),

5. inverse dispersion parameter for margin 1 (if applicable),

6. inverse dispersion parameter for margin 2 (if applicable)

Thus, in general count parameters should come first, followed by zero-inflation parameters, and finally inverse dispersion parameters.

• frech.min – Changing this argument should almost never be necessary. Frechet (1951) and Hoeffding (1940) showed that copula CDFs have bounds of the form max\{u + v - 1, 0\} \le C(u, v) \le min\{u, v\}, where u and v are uniform realizations derived from the probability integral transform. Due to numerical underflow, very small values of u and v can be rounded to zero. Particularly when evaluating the Gaussian copula CDF this is problematic, ultimately leading to infinite-valued likelihood evaluations. Therefore, we impose Frechet-Hoeffding bounds numerically as max\{u + v - 1, frech.min\} \le C(u, v) \le min\{u, v, 1 - frech.min\}. NOTE: Setting this to 0 imposes the original Frechet bounds mentioned above.

• pmf.min – Changing this argument should almost never be necessary. Observations can have likelihoods that are extremely close to 0. Numerically, these get rounded to 0 due to underflow. Then, taking logarithms results in an infinite likelihood. To avoid this, we bound PMF evaluations from below at pmf.min.

• ... – Sometimes it may be useful to alter nlm's default parameters. This can be done by simply passing those arguments into bizicount(). The two that seem to benefit the fitting process the most are stepmax and steptol. Readers are referred to the documentation on nlm for more details on these parameters. It can be useful to lower stepmax particularly when the Hessian is not negative definite at convergence, sometimes to a value between 0 and 1. It can also be beneficial to increase steptol.

### Value

An S3 bizicount-class object, which is a list containing:

• coef – Coefficients of the model

• coef.nid – Coefficients without margin IDs

• coef.orig – Coefficients prior to transformations, for Gaussian dependence and negative binomial dispersion.

• coef.orig.nid – Coefficients prior to transforms, no margin IDs.

• se – Asymptotic normal-theory standard errors based on observed Fisher Information

• se.nid – Standard errors without margin IDs

• z – z-scores for parameter estimates

• z.nid – z-scores without margin IDs

• p – p-values for parameter estimates

• p.nid – p-values without margin IDs

• coefmats – A list containing coeficient matrices for each margin

• loglik – Scalar log-likelihood at convergence

• grad – Numerical gradient vector at convergence

• n.iter – Number of quasi-newton fitting iterations.

• covmat – Covariance matrix of parameter estimates based on observed Fisher Information

• aic – Model's Akaike information

• bic – Model's Bayesian information criterion

• nobs – Number of observations

• margins – Marginal distributions used in fitting

• ⁠link.zi, link.ct⁠ – Names of link functions used in fitting

• ⁠invlink.ct, invlink.zi⁠ – Inverse link functions used in fitting (the actual function, not their names)

• outcomes – Name of the response vector

• conv – Integer telling convergence status in nlm. See ?nlm.

• cop – The copula used in fitting

• starts – list of starting values used

• call – The model's call

• model – List containing model matrices, or NULL if keep = F.

• scaled – Vector indicating which covariates in each margin were scaled according to the scaling parameter.

John Niehaus

### References

Genest C, Nešlehová J (2007). “A primer on copulas for count data.” ASTIN Bulletin: The Journal of the IAA, 37(2), 475–515.

Inouye DI, Yang E, Allen GI, Ravikumar P (2017). “A review of multivariate distributions for count data derived from the Poisson distribution.” Wiley Interdisciplinary Reviews: Computational Statistics, 9(3).

Joe H (1997). Multivariate models and multivariate dependence concepts. CRC Press.

Nikoloulopoulos A (2013). “Copula-Based Models for Multivariate Discrete Response Data.” In P Jaworski, F Durante, WK Härdle (eds.), Copulae in Mathematical and Quantitative Finance, chapter 11, pp. 231–250. Springer.

Nelsen RB (2007). An Introduction to Copulas. Springer Science & Business Media.

Trivedi P, Zimmer D (2017). “A note on identification of bivariate copulas for discrete countdata.” Econometrics, 5(1), 10.

Trivedi PK, Zimmer DM (2007). Copula modeling: an introduction for practitioners. NowPublishers Inc.

extract.bizicount, make_DHARMa, zi_test

### Examples

### bizicount example

## SETUP
set.seed(123)
n = 300

# define a function to simulate from a gaussian copula
# first margin is zero-inflated negative binomial (zinb)
# second margin is zero-inflated poisson (zip)
# Note: marginal distributions are hard-coded in function, including
# inverse dispersion parameter for zinb.
gen = function(n,
b1,
b2,
g1,
g2,
dep) {

k1 = length(b1)
k2 = length(b2)

X1 = cbind(1, matrix(rbinom(n * (k1 - 1), 1, .5), ncol = k1 - 1))
X2 = cbind(1, matrix(rexp(n * (k2 - 1), 3), ncol = k2 - 1))

lam1 = exp(X1 %*% b1)
lam2 = exp(X2 %*% b2)

Z1 = cbind(1, matrix(runif(n * (k1 - 1), -1, 1), ncol = k1 - 1))
Z2 = cbind(1, matrix(rnorm(n * (k2 - 1)), ncol = k2 - 1))

psi1 = plogis(Z1 %*% g1)
psi2 = plogis(Z2 %*% g2)

norm_vars = MASS::mvrnorm(
n,
mu = c(0, 0),
Sigma = matrix(c(1, dep, dep, 1), ncol =2)
)

U = pnorm(norm_vars)

y1 =  qzinb(U[, 1],
mu = lam1,
psi = psi1,
size = .3)
y2 =  qzip(U[, 2],
lambda = lam2,
psi = psi2)

dat = data.frame(
X1 = X1[, -1],
X2 = X2[, -1],
Z1 = Z1[, -1],
Z2 = Z2[, -1],
y1,
y2,
lam1,
lam2,
psi1,
psi2
)
return(dat)
}

# define parameters
b1 = c(1, -2, 3)
b2 = c(-1, 3, 1)
g1 = c(2, -1.5, 2)
g2 = c(-1, -3.75, 1.25)
rho = .5

# generate data
dat = gen(n, b1, b2, g1, g2, rho)
f1 = y1 ~ X1.1 + X1.2 | Z1.1 + Z1.2
f2 = y2 ~ X2.1 + X2.2 | Z2.1 + Z2.2

## END SETUP

# estimate model

mod = bizicount(f1, f2, dat, cop = "g", margins = c("zip", "zip"), keep = TRUE)

print(mod)
summary(mod)


[Package bizicount version 1.2.0 Index]