hhh4 {surveillance} | R Documentation |
Fitting HHH Models with Random Effects and Neighbourhood Structure
Description
Fits an autoregressive Poisson or negative binomial model
to a univariate or multivariate time series of counts.
The characteristic feature of hhh4
models is the additive
decomposition of the conditional mean into epidemic and
endemic components (Held et al, 2005).
Log-linear predictors of covariates and random intercepts are allowed
in all components; see the Details below.
A general introduction to the hhh4
modelling approach and its
implementation is given in the vignette("hhh4")
. Meyer et al
(2017, Section 5, available as vignette("hhh4_spacetime")
)
describe hhh4
models for areal time series of infectious
disease counts.
Usage
hhh4(stsObj,
control = list(
ar = list(f = ~ -1, offset = 1, lag = 1),
ne = list(f = ~ -1, offset = 1, lag = 1,
weights = neighbourhood(stsObj) == 1,
scale = NULL, normalize = FALSE),
end = list(f = ~ 1, offset = 1),
family = c("Poisson", "NegBin1", "NegBinM"),
subset = 2:nrow(stsObj),
optimizer = list(stop = list(tol=1e-5, niter=100),
regression = list(method="nlminb"),
variance = list(method="nlminb")),
verbose = FALSE,
start = list(fixed=NULL, random=NULL, sd.corr=NULL),
data = list(t = stsObj@epoch - min(stsObj@epoch)),
keep.terms = FALSE
),
check.analyticals = FALSE)
Arguments
stsObj |
object of class |
control |
a list containing the model specification and control arguments:
The auxiliary function |
check.analyticals |
logical (or a subset of
|
Details
An endemic-epidemic multivariate time-series model for infectious
disease counts Y_{it}
from units i=1,\dots,I
during
periods t=1,\dots,T
was proposed by Held et al (2005) and was
later extended in a series of papers (Paul et al, 2008; Paul and Held,
2011; Held and Paul, 2012; Meyer and Held, 2014).
In its most general formulation, this so-called hhh4
(or HHH or
H^3
or triple-H) model assumes that, conditional on past
observations, Y_{it}
has a Poisson or negative binomial
distribution with mean
\mu_{it} = \lambda_{it} y_{i,t-1} +
\phi_{it} \sum_{j\neq i} w_{ji} y_{j,t-1} +
e_{it} \nu_{it}
In the case of a negative binomial model, the conditional
variance is \mu_{it}(1+\psi_i\mu_{it})
with overdispersion parameters \psi_i > 0
(possibly shared
across different units, e.g., \psi_i\equiv\psi
).
Univariate time series of counts Y_t
are supported as well, in
which case hhh4
can be regarded as an extension of
glm.nb
to account for autoregression.
See the Examples below for a comparison of an endemic-only
hhh4
model with a corresponding glm.nb
.
The three unknown quantities of the mean \mu_{it}
,
-
\lambda_{it}
in the autoregressive (ar
) component, -
\phi_{it}
in the neighbour-driven (ne
) component, and -
\nu_{it}
in the endemic (end
) component,
are log-linear predictors incorporating time-/unit-specific
covariates. They may also contain unit-specific random intercepts
as proposed by Paul and Held (2011). The endemic mean is usually
modelled proportional to a unit-specific offset e_{it}
(e.g., population numbers or fractions); it is possible to include
such multiplicative offsets in the epidemic components as well.
The w_{ji}
are transmission weights reflecting the flow of
infections from unit j
to unit i
. If weights vary over time
(prespecified as a 3-dimensional array (w_{jit})
), the
ne
sum in the mean uses w_{jit} y_{j,t-1}
.
In spatial hhh4
applications, the “units” refer to
geographical regions and the weights could be derived from movement
network data. Alternatively, the weights w_{ji}
can be
estimated parametrically as a function of adjacency order (Meyer and
Held, 2014), see W_powerlaw
.
(Penalized) Likelihood inference for such hhh4
models has been
established by Paul and Held (2011) with extensions for parametric
neighbourhood weights by Meyer and Held (2014).
Supplied with the analytical score function and Fisher information,
the function hhh4
by default uses the quasi-Newton algorithm
available through nlminb
to maximize the log-likelihood.
Convergence is usually fast even for a large number of parameters.
If the model contains random effects, the penalized and marginal
log-likelihoods are maximized alternately until convergence.
Value
hhh4
returns an object of class "hhh4"
,
which is a list containing the following components:
coefficients |
named vector with estimated (regression) parameters of the model |
se |
estimated standard errors (for regression parameters) |
cov |
covariance matrix (for regression parameters) |
Sigma |
estimated variance-covariance matrix of random effects |
Sigma.orig |
estimated variance parameters on internal scale used for optimization |
Sigma.cov |
inverse of marginal Fisher information (on internal
scale), i.e., the asymptotic covariance matrix of |
call |
the matched call |
dim |
vector with number of fixed and random effects in the model |
loglikelihood |
(penalized) loglikelihood evaluated at the MLE |
margll |
(approximate) log marginal likelihood should the model contain random effects |
convergence |
logical. Did optimizer converge? |
fitted.values |
fitted mean values |
control |
control object of the fit |
terms |
the terms object used in the fit if |
stsObj |
the supplied |
lags |
named integer vector of length two containing the lags
used for the epidemic components |
nObs |
number of observations used for fitting the model |
nTime |
number of time points used for fitting the model |
nUnit |
number of units (e.g. areas) used for fitting the model |
runtime |
the |
Author(s)
Michaela Paul, Sebastian Meyer, Leonhard Held
References
Held, L., Höhle, M. and Hofmann, M. (2005): A statistical framework for the analysis of multivariate infectious disease surveillance counts. Statistical Modelling, 5 (3), 187-199. doi:10.1191/1471082X05st098oa
Paul, M., Held, L. and Toschke, A. M. (2008): Multivariate modelling of infectious disease surveillance data. Statistics in Medicine, 27 (29), 6250-6267. doi:10.1002/sim.4177
Paul, M. and Held, L. (2011): Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine, 30 (10), 1118-1136. doi:10.1002/sim.4177
Held, L. and Paul, M. (2012): Modeling seasonality in space-time infectious disease surveillance data. Biometrical Journal, 54 (6), 824-843. doi:10.1002/bimj.201200037
Meyer, S. and Held, L. (2014): Power-law models for infectious disease spread. The Annals of Applied Statistics, 8 (3), 1612-1639. doi:10.1214/14-AOAS743
Meyer, S., Held, L. and Höhle, M. (2017): Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software, 77 (11), 1-55. doi:10.18637/jss.v077.i11
See Also
See the special functions fe
, ri
and the
examples below for how to specify unit-specific effects.
Further details on the modelling approach and illustrations of its
implementation can be found in vignette("hhh4")
and
vignette("hhh4_spacetime")
.
Examples
######################
## Univariate examples
######################
### weekly counts of salmonella agona cases, UK, 1990-1995
data("salmonella.agona")
## convert old "disProg" to new "sts" data class
salmonella <- disProg2sts(salmonella.agona)
salmonella
plot(salmonella)
## generate formula for an (endemic) time trend and seasonality
f.end <- addSeason2formula(f = ~1 + t, S = 1, period = 52)
f.end
## specify a simple autoregressive negative binomial model
model1 <- list(ar = list(f = ~1), end = list(f = f.end), family = "NegBin1")
## fit this model to the data
res <- hhh4(salmonella, model1)
## summarize the model fit
summary(res, idx2Exp=1, amplitudeShift=TRUE, maxEV=TRUE)
plot(res)
plot(res, type = "season", components = "end")
### weekly counts of meningococcal infections, Germany, 2001-2006
data("influMen")
fluMen <- disProg2sts(influMen)
meningo <- fluMen[, "meningococcus"]
meningo
plot(meningo)
## again a simple autoregressive NegBin model with endemic seasonality
meningoFit <- hhh4(stsObj = meningo, control = list(
ar = list(f = ~1),
end = list(f = addSeason2formula(f = ~1, S = 1, period = 52)),
family = "NegBin1"
))
summary(meningoFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)
plot(meningoFit)
plot(meningoFit, type = "season", components = "end")
########################
## Multivariate examples
########################
### bivariate analysis of influenza and meningococcal infections
### (see Paul et al, 2008)
plot(fluMen, same.scale = FALSE)
## Fit a negative binomial model with
## - autoregressive component: disease-specific intercepts
## - neighbour-driven component: only transmission from flu to men
## - endemic component: S=3 and S=1 sine/cosine pairs for flu and men, respectively
## - disease-specific overdispersion
WfluMen <- neighbourhood(fluMen)
WfluMen["meningococcus","influenza"] <- 0
WfluMen
f.end_fluMen <- addSeason2formula(f = ~ -1 + fe(1, which = c(TRUE, TRUE)),
S = c(3, 1), period = 52)
f.end_fluMen
fluMenFit <- hhh4(fluMen, control = list(
ar = list(f = ~ -1 + fe(1, unitSpecific = TRUE)),
ne = list(f = ~ 1, weights = WfluMen),
end = list(f = f.end_fluMen),
family = "NegBinM"))
summary(fluMenFit, idx2Exp=1:3)
plot(fluMenFit, type = "season", components = "end", unit = 1)
plot(fluMenFit, type = "season", components = "end", unit = 2)
### weekly counts of measles, Weser-Ems region of Lower Saxony, Germany
data("measlesWeserEms")
measlesWeserEms
plot(measlesWeserEms) # note the two districts with zero cases
## we could fit the same simple model as for the salmonella cases above
model1 <- list(
ar = list(f = ~1),
end = list(f = addSeason2formula(~1 + t, period = 52)),
family = "NegBin1"
)
measlesFit <- hhh4(measlesWeserEms, model1)
summary(measlesFit, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)
## but we should probably at least use a population offset in the endemic
## component to reflect heterogeneous incidence levels of the districts,
## and account for spatial dependence (here just using first-order adjacency)
measlesFit2 <- update(measlesFit,
end = list(offset = population(measlesWeserEms)),
ne = list(f = ~1, weights = neighbourhood(measlesWeserEms) == 1))
summary(measlesFit2, idx2Exp=TRUE, amplitudeShift=TRUE, maxEV=TRUE)
plot(measlesFit2, units = NULL, hide0s = TRUE)
## 'measlesFit2' corresponds to the 'measlesFit_basic' model in
## vignette("hhh4_spacetime"). See there for further analyses,
## including vaccination coverage as a covariate,
## spatial power-law weights, and random intercepts.
## Not run:
### last but not least, a more sophisticated (and time-consuming)
### analysis of weekly counts of influenza from 140 districts in
### Southern Germany (originally analysed by Paul and Held, 2011,
### and revisited by Held and Paul, 2012, and Meyer and Held, 2014)
data("fluBYBW")
plot(fluBYBW, type = observed ~ time)
plot(fluBYBW, type = observed ~ unit,
## mean yearly incidence per 100.000 inhabitants (8 years)
population = fluBYBW@map$X31_12_01 / 100000 * 8)
## For the full set of models for data("fluBYBW") as analysed by
## Paul and Held (2011), including predictive model assessement
## using proper scoring rules, see the (computer-intensive)
## demo("fluBYBW") script:
demoscript <- system.file("demo", "fluBYBW.R", package = "surveillance")
demoscript
#file.show(demoscript)
## Here we fit the improved power-law model of Meyer and Held (2014)
## - autoregressive component: random intercepts + S = 1 sine/cosine pair
## - neighbour-driven component: random intercepts + S = 1 sine/cosine pair
## + population gravity with normalized power-law weights
## - endemic component: random intercepts + trend + S = 3 sine/cosine pairs
## - random intercepts are iid but correlated between components
f.S1 <- addSeason2formula(
~-1 + ri(type="iid", corr="all"),
S = 1, period = 52)
f.end.S3 <- addSeason2formula(
~-1 + ri(type="iid", corr="all") + I((t-208)/100),
S = 3, period = 52)
## for power-law weights, we need adjaceny orders, which can be
## computed from the binary adjacency indicator matrix
nbOrder1 <- neighbourhood(fluBYBW)
neighbourhood(fluBYBW) <- nbOrder(nbOrder1)
## full model specification
fluModel <- list(
ar = list(f = f.S1),
ne = list(f = update.formula(f.S1, ~ . + log(pop)),
weights = W_powerlaw(maxlag=max(neighbourhood(fluBYBW)),
normalize = TRUE, log = TRUE)),
end = list(f = f.end.S3, offset = population(fluBYBW)),
family = "NegBin1", data = list(pop = population(fluBYBW)),
optimizer = list(variance = list(method = "Nelder-Mead")),
verbose = TRUE)
## CAVE: random effects considerably increase the runtime of model estimation
## (It is usually advantageous to first fit a model with simple intercepts
## to obtain reasonable start values for the other parameters.)
set.seed(1) # because random intercepts are initialized randomly
fluFit <- hhh4(fluBYBW, fluModel)
summary(fluFit, idx2Exp = TRUE, amplitudeShift = TRUE)
plot(fluFit, type = "fitted", total = TRUE)
plot(fluFit, type = "season")
range(plot(fluFit, type = "maxEV"))
plot(fluFit, type = "maps", prop = TRUE)
gridExtra::grid.arrange(
grobs = lapply(c("ar", "ne", "end"), function (comp)
plot(fluFit, type = "ri", component = comp, main = comp,
exp = TRUE, sub = "multiplicative effect")),
nrow = 1, ncol = 3)
plot(fluFit, type = "neweights", xlab = "adjacency order")
## End(Not run)
########################################################################
## An endemic-only "hhh4" model can also be estimated using MASS::glm.nb
########################################################################
## weekly counts of measles, Weser-Ems region of Lower Saxony, Germany
data("measlesWeserEms")
## fit an endemic-only "hhh4" model
## with time covariates and a district-specific offset
hhh4fit <- hhh4(measlesWeserEms, control = list(
end = list(f = addSeason2formula(~1 + t, period = measlesWeserEms@freq),
offset = population(measlesWeserEms)),
ar = list(f = ~-1), ne = list(f = ~-1), family = "NegBin1",
subset = 1:nrow(measlesWeserEms)
))
summary(hhh4fit)
## fit the same model using MASS::glm.nb
measlesWeserEmsData <- as.data.frame(measlesWeserEms, tidy = TRUE)
measlesWeserEmsData$t <- c(hhh4fit$control$data$t)
glmnbfit <- MASS::glm.nb(
update(formula(hhh4fit)$end, observed ~ . + offset(log(population))),
data = measlesWeserEmsData
)
summary(glmnbfit)
## Note that the overdispersion parameter is parametrized inversely.
## The likelihood and point estimates are all the same.
## However, the variance estimates are different: in glm.nb, the parameters
## are estimated conditional on the overdispersion theta.