aodml {aods3} | R Documentation |
ML Estimation of Generalized Linear Models for Overdispersed Count Data
Description
The function fits a beta-binomial (BB) or a negative binomial (NB) generalized linear model from clustered data.
For the BB model, data have the form {(n_1, m_1), (n_2, m_2), ..., (n_N, m_N)
}, where n_i
is the size of cluster i
, m_i
the number of “successes”, and N
the number of clusters. The response is the proportion y = m/n.
For the NB model, data can be of two types. When modeling simple counts, data have the form {m_1, m_2, ..., m_N
}, where m_i
is the number of occurences of the event under study. When modeling rates (e.g. hazard rates), data have the same form as for the BB model, where n_i
is the denominator of the rate for cluster i
(considered as an “offset”, i.e. a constant known value) and m_i
the number of occurences of the event. For both types of data, the response is the count y = m
.
Usage
aodml(formula,
data,
family = c("bb", "nb"),
link = c("logit", "cloglog", "probit"),
phi.formula = ~ 1,
phi.scale = c("identity", "log", "inverse"),
phi.start = NULL,
fixpar = list(),
hessian = TRUE,
method = c("BFGS", "Nelder-Mead", "CG", "SANN"),
control = list(maxit = 3000, trace = 0), ...)
## S3 method for class 'aodml'
print(x, ...)
## S3 method for class 'aodml'
summary(object, ...)
## S3 method for class 'aodml'
anova(object, ...)
## S3 method for class 'anova.aodml'
print(x, digits, ...)
## S3 method for class 'aodml'
fitted(object, ..., what = c("mu", "nu", "eta", "phi"))
## S3 method for class 'aodml'
residuals(object, ..., type = c("deviance", "pearson", "response"))
## S3 method for class 'aodml'
coef(object, ...)
## S3 method for class 'aodml'
df.residual(object, ...)
## S3 method for class 'aodml'
logLik(object, ...)
## S3 method for class 'aodml'
deviance(object, ...)
## S3 method for class 'aodml'
AIC(object, ..., k = 2)
## S3 method for class 'aodml'
vcov(object, ...)
## S3 method for class 'aodml'
predict(object, ..., type = c("link", "response"), se.fit = FALSE, newdata = NULL)
Arguments
formula |
A formula for the mean For the BB model, the left-hand side of the formula must be of the form
where the fitted proportion is For the NB model, the left-hand side of the formula must be of the form
where the fitted count is |
data |
A data frame containing the response ( |
family |
Define the model which is fitted: “bb” for the BB model and “nb” for the NB model. |
link |
For the BB model only. Define the link function |
phi.formula |
A right-hand side formula to model optional heterogeneity for the over-dispersion parameter Default to |
phi.scale |
Scale on which |
phi.start |
Optional starting values for |
fixpar |
An optional list of 2 vectors of same length ( The first vector indicates which parameters are set as constant (i.e., not optimized) in the global parameter vector The second vector indicates the corresponding values. For instance,
means that the 4th and 5th components of vector |
hessian |
A logical (default to |
method |
Define the method used for the optimization (see |
control |
A list to control the optimization parameters. See |
object |
An object of class |
x |
An object of class |
digits |
Number of digits to print in print.summary.aodml and print.anova.aodml.
Default to |
... |
Further arguments passed to |
what |
For function |
type |
For function |
k |
Numeric scalar for the penalty parameter used to compute the information criterion. The default value ( |
se.fit |
Logical scalar indicating whether standard errors should be computed for the predicted values. Default to |
newdata |
A |
Details
Beta-binomial model (BB)
For a given cluster (n, m)
, the model is
m | \lambda,n ~ Binomial(n, \lambda)
where \lambda
follows a Beta distribution Beta(a_1, a_2)
. Noting B
the beta function, the marginal (beta-binomial) distribution of m
is
P(m | n) = C(n, m) * B(a_1 + m, a_2 + n - m) / B(a_1, a_2)
Using the re-parameterization \mu = a_1 / (a_1 + a_2)
and \Phi = 1 / (a_1 + a_2 + 1)
, the marginal mean and variance are
E[m] = n * \mu
Var[m] = n * \mu * (1 - \mu) * (1 + (n - 1) * \Phi)
The response in aodml
is y = m/n
. The mean is E[y] = \mu
, defined such as \mu = g^{-1}(X * b) = g^{-1}(\nu)
, where g
is the link function, X
is a design-matrix, b
a vector of fixed effects and \nu = X * b
is the corresponding linear predictor. The variance is Var[y] = (1 / n) * \mu * (1 - \mu) * (1 + (n - 1) * \Phi)
.
Negative binomial model (NB)
—— Simple counts (model with no offset)
For a given cluster (m)
, the model is
y | \lambda ~ Poisson(\lambda)
where \lambda
follows a Gamma distribution of mean \mu
and shape k
(Var[\lambda] = \mu^2 / k
). Noting G
the gamma function, the marginal (negative binomial) distribution of m
is
P(m) = {G(m+k) / (m! * G(k))} * (k / (k + \mu))^k * (\mu / (k + \mu))^m
Using the re-parameterization \Phi = 1 / k
, the marginal mean and variance are
E[m] = \mu
Var[m] = \mu + \Phi * \mu^2
The response in aodml
is y = m
. The mean is E[y] = \mu
, defined such as \mu = exp(X * b) = exp(\nu)
. The variance is Var[y] = \mu + \Phi * \mu^2
.
—— Rates (model with offset)
For a given cluster (n, m)
, the model is
m | \lambda,n ~ Poisson(\lambda)
The marginal (negative binomial) distribution P(m|n)
is the same as for the case with no offset (= P(m)
). The response in aodml
is y = m
. The mean is E[y] = \mu
, defined such as \mu = exp(X * b + log(n)) = exp(\nu + log(n)) = exp(\eta)
, where log(n)
is the offset. The variance is Var[y] = \mu + \Phi * \mu^2
.
Other details
Argument phi.scale
of function aodml
enables to estimate the over-dispersion parameter under different scales.
If phi.scale = "identity"
(Default), the function estimates \Phi
.
If phi.scale = "log"
, the function estimates log(\Phi)
.
If phi.scale = "inverse"
, the function estimates 1/\Phi
.
The full parameter vector returned by aodml
, say param
, is equal to (b, \Phi)
. This vector is estimated by maximizing the log-likelihood of the marginal model using function optim
. The estimated variances-covariances matrix of param
is calculated by the inverse of the observed hessian matrix returned by optim
, and is referred to as varparam
.
Value
An object of class aodml
, printed and summarized by various functions. Function deviance.aodml
returns the value -2 * (logL - logL_max)
. The “deviance” used in function AIC.aodml
to calculate AIC and AICc is -2 * logL
.
References
Crowder, M.J., 1978. Beta-binomial anova for proportions. Appl. Statist. 27, 34-37.
Griffiths, D.A., 1973. Maximum likelihood estimation for the beta-binomial distribution and an application
to the household distribution of the total number of cases of disease. Biometrics 29, 637-648.
Lawless, J.F., 1987. Negative binomial and mixed Poisson regression. The Canadian Journal of Statistics, 15(3): 209-225.
McCullagh, P., Nelder, J. A., 1989, 2nd ed. Generalized linear models. New York, USA: Chapman and Hall.
Prentice, R.L., 1986. Binary regression using an extended beta-binomial distribution, with discussion of
correlation induced by covariate measurement errors. J.A.S.A. 81, 321-327.
Williams, D.A., 1975. The analysis of binary responses from toxicological experiments involving
reproduction and teratogenicity. Biometrics 31, 949-952.
See Also
Examples
#------ Beta-binomial model
data(orob2)
fm1 <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb")
# summaries
fm1
summary(fm1)
coef(fm1)
vcov(fm1)
logLik(fm1)
deviance(fm1)
AIC(fm1)
gof(fm1)
# predictions
cbind(
fitted(fm1),
fitted(fm1, what = "nu"),
fitted(fm1, what = "eta"),
fitted(fm1, what = "phi")
)
predict(fm1, type = "response", se.fit = TRUE)
newdat <- data.frame(seed = c("O73", "O75"))
predict(fm1, type = "response", se.fit = TRUE, newdata = newdat)
# model with heterogeneity in phi
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2,
family = "bb", phi.formula = ~ seed)
summary(fm)
AIC(fm1, fm)
# various phi scales
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb")
fm$phi
fm$phi.scale
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb",
phi.scale = "log")
fm$phi
fm$phi.scale
fm <- aodml(cbind(m, n - m) ~ seed, data = orob2, family = "bb",
phi.scale = "inverse")
fm$phi
fm$phi.scale
### Models with coefficient(s) set as constant
# model with 1 phi coefficient, set as constant "0.02"
fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2,
family = "bb", fixpar = list(5, 0.02))
fm$param
fm$varparam
# model with 2 phi coefficients, with the first set as constant ~ "0"
fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2,
family = "bb", phi.formula = ~ seed, fixpar = list(5, 1e-15))
fm$param
fm$varparam
# model with 2 phi coefficients, with the first set as constant ~ "0",
# and the mu intercept (1rst coef of vector b) set as as constant "-0.5"
fm <- aodml(formula = cbind(m, n - m) ~ seed * root, data = orob2,
family = "bb", phi.formula = ~ seed,
fixpar = list(c(1, 5), c(-0.5, 1e-15)))
fm$param
fm$varparam
### Model tests
# LR tests - non-constant phi
fm0 <- aodml(cbind(m, n - m) ~ 1, data = orob2, family = "bb")
fm2 <- aodml(cbind(m, n - m) ~ seed + root, data = orob2, family = "bb")
fm3 <- aodml(cbind(m, n - m) ~ seed * root, data = orob2, family = "bb")
anova(fm0, fm1, fm2, fm3)
# LR tests - constant phi
# phi is assumed to be estimated from fm3
fm2.bis <- aodml(cbind(m, n - m) ~ seed + root, data = orob2,
family = "bb", fixpar = list(4, fm3$phi))
LRstat <- 2 * (logLik(fm3) - logLik(fm2.bis))
pchisq(LRstat, df = 1, lower.tail = FALSE)
# Wald test of the seed factor in fm1
wald.test(b = coef(fm3), varb = vcov(fm3), Terms = 4)
#------ Negative binomial model
### Modelling counts
data(salmonella)
fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb")
## fm1 <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb",
## method = "Nelder-Mead")
# summaries
fm1
summary(fm1)
coef(fm1)
vcov(fm1)
logLik(fm1)
deviance(fm1)
AIC(fm1)
gof(fm1)
# predictions
cbind(
fitted(fm1),
fitted(fm1, what = "nu"),
fitted(fm1, what = "eta"),
fitted(fm1, what = "phi")
)
predict(fm1, type = "response", se.fit = TRUE)
newdat <- data.frame(dose = c(20, 40))
predict(fm1, type = "response", se.fit = TRUE, newdata = newdat)
# various phi scales
fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella, family = "nb")
fm$phi
fm$phi.scale
fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella,
family = "nb", phi.scale = "log")
fm$phi
fm$phi.scale
fm <- aodml(m ~ log(dose + 10) + dose, data = salmonella,
family = "nb", phi.scale = "inverse")
fm$phi
fm$phi.scale
# LR and Wald tests of the "log(dose + 10) + dose" factors
fm0 <- aodml(m ~ 1, data = salmonella, family = "nb")
anova(fm0, fm1)
fm0.bis <- aodml(m ~ 1, data = salmonella, family = "nb",
fixpar = list(2, fm1$phi))
LRstat <- 2 * (logLik(fm1) - logLik(fm0.bis))
pchisq(LRstat, df = 2, lower.tail = FALSE)
wald.test(b = coef(fm1), varb = vcov(fm1), Terms = 2:3)
### Modelling a rate
data(dja)
# rate "m / trisk"
fm <- aodml(formula = m ~ group + offset(log(trisk)),
data = dja, family = "nb")
summary(fm)
fm <- aodml(formula = m ~ group + offset(log(trisk)),
phi.formula = ~ group, data = dja, family = "nb",
phi.scale = "log")
summary(fm)
# model with 1 phi coefficient, set as constant "0.8"
fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja,
family = "nb", phi.formula = ~1, fixpar = list(3, 0.8))
fm$param
fm$varparam
# model with 2 phi coefficients, with the first set as constant ~ "0" in the identity scale
fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja,
family = "nb", phi.formula = ~ group, phi.scale = "log",
fixpar = list(4, -15))
fm$param
fm$varparam
# model with 2 phi coefficients, with the first set as constant "0" in the identity scale,
# and the mu intercept coefficient (1rst coef of vector b) set as as constant "-0.5"
fm <- aodml(formula = m ~ group + offset(log(trisk)), data = dja,
family = "nb", phi.formula = ~ group, phi.scale = "log",
fixpar = list(c(1, 4), c(-0.5, -15)))
fm$param
fm$varparam