bizicount {bizicount} | R Documentation |
Bizicount: Maximum likelihood estimation of copula-based bivariate zero-inflated (and non-inflated) count models
Description
The main bivariate regression function of the bizicount-package
Estimates copula-based bivariate zero-inflated (and non-inflated)
count models via maximum likelihood. Supports the Frank and Gaussian
copulas, as well as zero-inflated Poisson and negative binomial margins
(and their non-inflated counterparts). It's class has associated
simulate
methods for post-estimation diagnostics using
the DHARMa
package, as well as an
extract
method for printing professional tables using
texreg
, and a test for zero-modification using zi_test
.
See the 'See Also' section for links to these methods.
Usage
bizicount(
fmla1,
fmla2,
data,
cop = "gaus",
margins = c("pois", "pois"),
link.ct = c("log", "log"),
link.zi = c("logit", "logit"),
scaling = "none",
starts = NULL,
keep = TRUE,
subset,
na.action,
weights,
frech.min = 1e-07,
pmf.min = 1e-07,
...
)
Arguments
fmla1 , fmla2 |
|
data |
A |
cop |
Character string specifying the copula to be used. One of
|
margins |
Length 2 character vector specifying the marginal
distributions for each outcome. Each of the two elements must be one of
|
link.ct |
Length 2 character string specifying the link function used
for the count portion of each margin. One of |
link.zi |
Length 2 character string specifying the link function used
for the zero-inflation portion of each margin. One of |
scaling |
Deprecated. It is recommended that users scale their covariates
if they encounter convergence issues, which can be accomplished using the
|
starts |
Numeric vector of starting values for parameter estimates. See
'Details' section regarding the correct order for the values in this vector.
If |
keep |
Logical indicating whether to keep the model matrix in the
returned model object. Defaults to |
subset |
A vector indicating the subset of observations to use in estimation. |
na.action |
Deprecated. |
weights |
An optional numeric vector of weights for each observation. |
frech.min |
Lower boundary for Frechet-Hoeffding bounds on copula CDF.
Used for computational purposes to prevent over/underflow in likelihood
search. Must be in |
pmf.min |
Lower boundary on copula PMF evaluations. Used for
computational purposes to prevent over/underflow in likelihood search. Must
be in |
... |
Additional arguments to be passed on to the quasi-newton fitting
function, |
Details
-
starts
– Starting values should be organized as follows:count parameters for margin 1
count parameters for margin 2
zero-inflated parameters for margin 1 (if applicable),
zero-inflated parameters for margin 2 (if applicable),
inverse dispersion parameter for margin 1 (if applicable),
inverse dispersion parameter for margin 2 (if applicable)
Thus, in general count parameters should come first, followed by zero-inflation parameters, and finally inverse dispersion parameters.
-
frech.min
– Changing this argument should almost never be necessary. Frechet (1951) and Hoeffding (1940) showed that copula CDFs have bounds of the formmax\{u + v - 1, 0\} \le C(u, v) \le min\{u, v\}
, whereu
andv
are uniform realizations derived from the probability integral transform. Due to numerical underflow, very small values ofu
andv
can be rounded to zero. Particularly when evaluating the Gaussian copula CDF this is problematic, ultimately leading to infinite-valued likelihood evaluations. Therefore, we impose Frechet-Hoeffding bounds numerically asmax\{u + v - 1, frech.min\} \le C(u, v) \le min\{u, v, 1 - frech.min\}
. NOTE: Setting this to 0 imposes the original Frechet bounds mentioned above. -
pmf.min
– Changing this argument should almost never be necessary. Observations can have likelihoods that are extremely close to 0. Numerically, these get rounded to 0 due to underflow. Then, taking logarithms results in an infinite likelihood. To avoid this, we bound PMF evaluations from below atpmf.min
. -
...
– Sometimes it may be useful to alternlm
's default parameters. This can be done by simply passing those arguments intobizicount()
. The two that seem to benefit the fitting process the most arestepmax
andsteptol
. Readers are referred to the documentation onnlm
for more details on these parameters. It can be useful to lowerstepmax
particularly when the Hessian is not negative definite at convergence, sometimes to a value between 0 and 1. It can also be beneficial to increasesteptol
.
Value
An S3 bizicount-class
object, which is a list containing:
-
coef
– Coefficients of the model -
coef.nid
– Coefficients without margin IDs -
coef.orig
– Coefficients prior to transformations, for Gaussian dependence and negative binomial dispersion. -
coef.orig.nid
– Coefficients prior to transforms, no margin IDs. -
se
– Asymptotic normal-theory standard errors based on observed Fisher Information -
se.nid
– Standard errors without margin IDs -
z
– z-scores for parameter estimates -
z.nid
– z-scores without margin IDs -
p
– p-values for parameter estimates -
p.nid
– p-values without margin IDs -
coefmats
– A list containing coeficient matrices for each margin -
loglik
– Scalar log-likelihood at convergence -
grad
– Numerical gradient vector at convergence -
n.iter
– Number of quasi-newton fitting iterations. -
covmat
– Covariance matrix of parameter estimates based on observed Fisher Information -
aic
– Model's Akaike information -
bic
– Model's Bayesian information criterion -
nobs
– Number of observations -
margins
– Marginal distributions used in fitting -
link.zi, link.ct
– Names of link functions used in fitting -
invlink.ct, invlink.zi
– Inverse link functions used in fitting (the actual function, not their names) -
outcomes
– Name of the response vector -
conv
– Integer telling convergence status in nlm. See ?nlm. -
cop
– The copula used in fitting -
starts
– list of starting values used -
call
– The model's call -
model
– List containing model matrices, orNULL
ifkeep = F
.
Author(s)
John Niehaus
References
Genest C, Nešlehová J (2007). “A primer on copulas for count data.” ASTIN Bulletin: The Journal of the IAA, 37(2), 475–515.
Inouye DI, Yang E, Allen GI, Ravikumar P (2017). “A review of multivariate distributions for count data derived from the Poisson distribution.” Wiley Interdisciplinary Reviews: Computational Statistics, 9(3).
Joe H (1997). Multivariate models and multivariate dependence concepts. CRC Press.
Nikoloulopoulos A (2013). “Copula-Based Models for Multivariate Discrete Response Data.” In P Jaworski, F Durante, WK Härdle (eds.), Copulae in Mathematical and Quantitative Finance, chapter 11, pp. 231–250. Springer.
Nelsen RB (2007). An Introduction to Copulas. Springer Science & Business Media.
Trivedi P, Zimmer D (2017). “A note on identification of bivariate copulas for discrete countdata.” Econometrics, 5(1), 10.
Trivedi PK, Zimmer DM (2007). Copula modeling: an introduction for practitioners. NowPublishers Inc.
See Also
extract.bizicount
, make_DHARMa
, zi_test
Examples
### bizicount example
## SETUP
set.seed(123)
n = 300
# define a function to simulate from a gaussian copula
# first margin is zero-inflated negative binomial (zinb)
# second margin is zero-inflated poisson (zip)
# Note: marginal distributions are hard-coded in function, including
# inverse dispersion parameter for zinb.
gen = function(n,
b1,
b2,
g1,
g2,
dep) {
k1 = length(b1)
k2 = length(b2)
X1 = cbind(1, matrix(rbinom(n * (k1 - 1), 1, .5), ncol = k1 - 1))
X2 = cbind(1, matrix(rexp(n * (k2 - 1), 3), ncol = k2 - 1))
lam1 = exp(X1 %*% b1)
lam2 = exp(X2 %*% b2)
Z1 = cbind(1, matrix(runif(n * (k1 - 1), -1, 1), ncol = k1 - 1))
Z2 = cbind(1, matrix(rnorm(n * (k2 - 1)), ncol = k2 - 1))
psi1 = plogis(Z1 %*% g1)
psi2 = plogis(Z2 %*% g2)
norm_vars = MASS::mvrnorm(
n,
mu = c(0, 0),
Sigma = matrix(c(1, dep, dep, 1), ncol =2)
)
U = pnorm(norm_vars)
y1 = qzinb(U[, 1],
mu = lam1,
psi = psi1,
size = .3)
y2 = qzip(U[, 2],
lambda = lam2,
psi = psi2)
dat = data.frame(
X1 = X1[, -1],
X2 = X2[, -1],
Z1 = Z1[, -1],
Z2 = Z2[, -1],
y1,
y2,
lam1,
lam2,
psi1,
psi2
)
return(dat)
}
# define parameters
b1 = c(1, -2, 3)
b2 = c(-1, 3, 1)
g1 = c(2, -1.5, 2)
g2 = c(-1, -3.75, 1.25)
rho = .5
# generate data
dat = gen(n, b1, b2, g1, g2, rho)
f1 = y1 ~ X1.1 + X1.2 | Z1.1 + Z1.2
f2 = y2 ~ X2.1 + X2.2 | Z2.1 + Z2.2
## END SETUP
# estimate model
mod = bizicount(f1, f2, dat, cop = "g", margins = c("zinb", "zip"), keep = TRUE)
print(mod)
summary(mod)