stan_rw {surveil} | R Documentation |
Time series models for mortality and disease incidence
Description
Model time-varying incidence rates given a time series of case (or death) counts and population at risk.
Usage
stan_rw(
data,
group,
time,
cor = FALSE,
family = poisson(),
prior = list(),
chains = 4,
cores = 1,
iter = 3000,
refresh = 1500,
control = list(adapt_delta = 0.98),
...
)
Arguments
data |
A
|
group |
If |
time |
Specify the (unquoted) name of the time variable in |
cor |
For correlated random walks use |
family |
The default specification is a Poisson model with log link function ( |
prior |
Optionally provide a named
|
chains |
Number of independent MCMC chains to initiate (passed to |
cores |
The number of cores to use when executing the Markov chains in parallel (passed to |
iter |
Total number of MCMC iterations. Warmup draws are automatically half of |
refresh |
How often to print the MCMC sampling progress to the console. |
control |
A named list of parameters to control Stan's sampling behavior. The most common parameters to control are |
... |
Other arguments passed to |
Details
By default, the models have Poisson likelihoods for the case counts, with log link function. Alternatively, a Binomial model with logit link function can be specified using the family
argument (family = binomial()
).
For time t = 1,...n, the models assign Poisson probability distribution to the case counts, given log-risk eta
and population at tirks P; the log-risk is modeled using the first-difference (or random-walk) prior:
y_t ~ Poisson(p_t * exp(eta_t)) eta_t ~ Normal(eta_{t-1}, sigma) eta_1 ~ Normal(-6, 5) (-Inf, 0) sigma ~ Normal(0, 1) (0, Inf)
This style of model has been discussed in Bayesian (bio)statistics for quite some time. See Clayton (1996).
The above model can be used for multiple distinct groups; in that case, each group will have its own independent time series model.
It is also possible to add a correlation structure to that set of models. Let Y_t
be a k-length vector of observations for each of k groups at time t (the capital letter now indicates a vector), then:
Y_t ~ Poisson(P_t * exp(Eta_t)) Eta_t ~ MVNormal(Eta_{t-1}, Sigma) Eta_1 ~ Normal(-6, 5) (-Inf, 0) Sigma = diag(sigma) * Omega * diag(sigma) Omega ~ LKJ(2) sigma ~ Normal(0, 1) (0, Inf)
where Omega
is a correlation matrix and diag(sigma)
is a diagonal matrix with scale parameters on the diagonal. This was adopted from Brandt and Williams (2007); for the LKJ prior, see the Stan Users Guide and Reference Manual.
If the binomial model is used instead of the Poisson, then the first line of the model specifications will be:
y_t ~ binomial(P_t, inverse_logit(eta_t))
All else is remains the same. The logit function is log(r/(1-r))
, where r
is a rate between zero and one; the inverse-logit function is exp(x)/(1 + exp(x))
.
Value
The function returns a list, also of class surveil
, containing the following elements:
- summary
A
data.frame
with posterior means and 95 percent credible intervals, as well as the raw data (Count, Population, time period, grouping variable if any, and crude rates).- samples
A
stanfit
object returned bysampling
. This contains the MCMC samples from the posterior distribution of the fitted model.- cor
Logical value indicating if the model included a correlation structure.
- time
A list containing the name of the time-period column in the user-provided data and a
data.frame
of observed time periods and their index.- group
If a grouping variable was used, this will be a list containing the name of the grouping variable and a
data.frame
with group labels and index values.- family
The user-provided
family
argument.
Author(s)
Connor Donegan (Connor.Donegan@UTSouthwestern.edu)
Source
Brandt P and Williams JT. Multiple time series models. Thousand Oaks, CA: SAGE Publications, 2007.
Clayton, DG. Generalized linear mixed models. In: Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov Chain Monte Carlo in Practice: Interdisciplinary Statistics. Boca Raton, FL: CRC Press, 1996. p. 275-302.
Donegan C, Hughes AE, and Lee SC (2022). Colorectal Cancer Incidence, Inequalities, and Prevention Priorities in Urban Texas: Surveillance Study With the "surveil" Software Package. JMIR Public Health & Surveillance 8(8):e34589. doi:10.2196/34589
Stan Development Team. Stan Modeling Language Users Guide and Reference Manual, 2.28. 2021. https://mc-stan.org
See Also
vignette("demonstration", package = "surveil")
vignette("age-standardization", package = "surveil")
apc
standardize
Examples
data(msa)
dat <- aggregate(cbind(Count, Population) ~ Year, data = msa, FUN = sum)
fit <- stan_rw(dat, time = Year)
## print summary of results
print(fit)
print(fit$summary)
## plot time trends (rates per 10,000)
plot(fit, scale = 10e3)
plot(fit, style = 'lines', scale = 10e3)
## Summary with MCMC diagnostics (n_eff, Rhat; from Rstan)
print(fit$samples)
## cumulative percent change
fit_pc <- apc(fit)
print(fit_pc$cpc)
plot(fit_pc, cumulative = TRUE)
## age-specific rates
data(cancer)
cancer2 <- subset(cancer, grepl("55-59|60-64|65-69", Age))
fit <- stan_rw(cancer2, time = Year, group = Age,
chains = 3, iter = 1e3) # for speed only
## plot trends
plot(fit, scale = 10e3)
## age-standardized rates
data(standard)
fit_stands <- standardize(fit,
label = standard$age,
standard_pop = standard$standard_pop)
print(fit_stands)
plot(fit_stands)
## percent change for age-standardized rates
fit_stands_apc <- apc(fit_stands)
plot(fit_stands_apc)
print(fit_stands_apc)