stan_rw {surveil}

R Documentation

Time series models for mortality and disease incidence

Description

Model time-varying incidence rates given a time series of case (or death) counts and population at risk.

Usage

stan_rw(
  data,
  group,
  time,
  cor = FALSE,
  family = poisson(),
  prior = list(),
  chains = 4,
  cores = 1,
  iter = 3000,
  refresh = 1500,
  control = list(adapt_delta = 0.98),
  ...
)

Arguments

`data`	A `data.frame` containing the following columns:
    Count: Number of cases or deaths; this column must be named 'Count'.
    Population: Size of the population at risk; this column must be named 'Population'.
    time: Time period indicator. Provide the unquoted column name using the `time` argument.
    group: Optional grouping variable. Provide the unquoted column name using the `group` argument.
`group`	If `data` is aggregated by demographic group, provide the (unquoted) name of the column in `data` that contains the grouping structure, such as age brackets or race-ethnicity. For example, if `data` has column names `Year`, `Race`, `Count`, and `Population`, then you would provide `group = Race`.
`time`	Specify the (unquoted) name of the time variable in `data`, as in `time = Year`. This variable must be numeric-like, meaning that `as.numeric()` can convert it without failing.
`cor`	For correlated random walks use `cor = TRUE`; default value is `FALSE`. Note this only applies when the `group` argument is used.
`family`	The default specification is a Poisson model with log link function (`family = poisson()`). For a Binomial model with logit link function, use `family = binomial()`.
`prior`	Optionally provide a named `list` with prior parameters. If any of the following items are missing, default priors will be assigned and printed to the console.
    eta_1: The first value of log-risk in each series must be assigned a Gaussian prior probability distribution. Provide the location and scale parameters for each demographic group in a list, where each parameter is a `k`-length vector. For example, with `k = 2` demographic groups, the following code will assign priors of `normal(-6.5, 5)` to the starting values of both series: `prior = list(eta_1 = normal(location = -6.5, scale = 5, k = 2))`. Note that `eta` is the log-rate, so centering the prior for `eta_1` on `-6.5` is similar to centering the prior rate on `exp(-6.5) * 100,000 ≈ 150` cases per 100,000 person-years at risk. Because the translation from log-rate to rate is non-linear, however, a prior that is symmetric on the log scale is not symmetric on the rate scale.
    sigma: Each demographic group has a scale parameter assigned to its log-rate. This is the scale of the annual deviations from the previous year's log-rate. The scale parameters are assigned independent half-normal prior distributions (normal distributions restricted to positive values).
    omega: If `cor = TRUE`, an LKJ prior is assigned to the correlation matrix, Omega.
`chains`	Number of independent MCMC chains to initiate (passed to `sampling`).
`cores`	The number of cores to use when executing the Markov chains in parallel (passed to `sampling`).
`iter`	Total number of MCMC iterations. Warmup draws are automatically half of `iter`.
`refresh`	How often to print the MCMC sampling progress to the console.
`control`	A named list of parameters to control Stan's sampling behavior. The most commonly adjusted parameters are `adapt_delta`, which may be raised to address divergent transitions, and `max_treedepth`. For example, `control = list(adapt_delta = 0.99, max_treedepth = 13)` may be a reasonable specification to address a divergent-transitions or maximum-treedepth warning from Stan.
`...`	Other arguments passed to `sampling`.
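As a quick check of the prior arithmetic in the `prior` entry, the base-R sketch below converts the example location for `eta_1` (-6.5) to a rate per 100,000 and shows how a symmetric interval on the log scale becomes lopsided on the rate scale: plus or minus one scale unit (5) around -6.5 spans from about 1 to about 22,000 cases per 100,000. The numbers are the ones from the example; nothing here calls surveil.

```r
# Prior location and scale for eta_1 (log-rate), taken from the example above
location <- -6.5
scale    <- 5

# Implied rate per 100,000 person-years at the prior location
rate_at_location <- exp(location) * 1e5
round(rate_at_location, 1)

# The log-to-rate map is non-linear: a symmetric interval of +/- 1 scale
# unit on the log scale becomes a very lopsided interval on the rate scale
log_interval  <- location + c(-1, 1) * scale
rate_interval <- exp(log_interval) * 1e5
round(rate_interval, 1)
```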

Details

By default, the models have Poisson likelihoods for the case counts, with log link function. Alternatively, a Binomial model with logit link function can be specified using the family argument (family = binomial()).

For time t = 1, ..., n, the model assigns a Poisson probability distribution to the case counts y_t, given log-risk eta_t and population at risk p_t; the log-risk is modeled using a first-difference (or random-walk) prior:

 y_t ~ Poisson(p_t * exp(eta_t))
 eta_t ~ Normal(eta_{t-1}, sigma)
 eta_1 ~ Normal(-6, 5) (-Inf, 0)
 sigma ~ Normal(0, 1) (0, Inf)

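The intervals in parentheses after the priors for eta_1 and sigma give each parameter's allowed range: eta_1 is restricted to non-positive values (a rate of at most one case per person at risk) and sigma to positive values, which makes its prior half-normal. This style of model has a long history in Bayesian (bio)statistics; see Clayton (1996).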
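To make the Poisson random-walk model concrete, here is a minimal base-R simulation of one series from the model equations. It is a sketch, not how `stan_rw` works internally (which fits the model to observed data with Stan): `eta_1` and `sigma` are fixed at hypothetical values instead of being drawn from their priors, and the population at risk is held constant.

```r
set.seed(101)

n     <- 20           # number of time periods (hypothetical)
p     <- rep(1e5, n)  # population at risk, held constant for simplicity
eta_1 <- -6.5         # hypothetical starting log-rate
sigma <- 0.05         # hypothetical scale of period-to-period changes

# First-difference (random-walk) prior: eta_t = eta_{t-1} + Normal(0, sigma)
eta <- eta_1 + c(0, cumsum(rnorm(n - 1, mean = 0, sd = sigma)))

# Poisson likelihood: y_t ~ Poisson(p_t * exp(eta_t))
y <- rpois(n, lambda = p * exp(eta))

# Noisy crude rates next to the smooth underlying rate, both per 100,000
data.frame(time      = seq_len(n),
           count     = y,
           crude     = 1e5 * y / p,
           true_rate = 1e5 * exp(eta))
```

The crude rates scatter around the underlying rate because of Poisson noise; the random-walk prior is what lets the model borrow strength from neighboring periods when estimating eta_t.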

When the `group` argument is used, the above model is applied to each of the distinct groups; with `cor = FALSE` (the default), each group has its own independent time series model.

It is also possible to add a correlation structure to that set of models by setting `cor = TRUE`. Let Y_t be a k-length vector of observations for each of the k groups at time t (the capital letter indicates a vector); then:

 Y_t ~ Poisson(P_t * exp(Eta_t))
 Eta_t ~ MVNormal(Eta_{t-1}, Sigma)
 Eta_1 ~ Normal(-6, 5)  (-Inf, 0)
 Sigma = diag(sigma) * Omega * diag(sigma)
 Omega ~ LKJ(2)
 sigma ~ Normal(0, 1) (0, Inf)

where Omega is a correlation matrix and diag(sigma) is a diagonal matrix with the scale parameters on the diagonal. This specification is adapted from Brandt and Williams (2007); for the LKJ prior, see the Stan Users Guide and Reference Manual.
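The covariance construction Sigma = diag(sigma) * Omega * diag(sigma) and the multivariate random walk can be sketched in base R. This is an illustration under stated assumptions, not package code: `sigma`, `Omega`, and the starting log-rates are hypothetical fixed values (in the model they get half-normal and LKJ(2) priors), and the multivariate normal steps are drawn with a Cholesky factor rather than any particular library.

```r
set.seed(202)

k <- 3    # number of groups (hypothetical)
n <- 15   # number of time periods (hypothetical)

# Hypothetical scale parameters and an exchangeable correlation matrix
sigma <- c(0.05, 0.08, 0.06)
Omega <- matrix(0.6, nrow = k, ncol = k)
diag(Omega) <- 1

# Covariance of the period-to-period changes
Sigma <- diag(sigma) %*% Omega %*% diag(sigma)

# chol() returns upper-triangular R with t(R) %*% R == Sigma, so z %*% R
# has covariance Sigma when z is a vector of independent standard normals
R <- chol(Sigma)

# Correlated random walk: Eta_t ~ MVNormal(Eta_{t-1}, Sigma)
Eta <- matrix(NA_real_, nrow = n, ncol = k)
Eta[1, ] <- c(-7, -6.5, -6)   # hypothetical starting log-rates
for (i in 2:n) {
  Eta[i, ] <- Eta[i - 1, ] + drop(rnorm(k) %*% R)
}

# Poisson counts for each period (rows) and group (columns)
P <- matrix(1e5, nrow = n, ncol = k)
Y <- matrix(rpois(n * k, lambda = P * exp(Eta)), nrow = n, ncol = k)

# Empirical correlation of the simulated changes (noisy with only n - 1 steps)
cor(diff(Eta))
```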

If the binomial model is used instead of the Poisson, then the first line of the model specification becomes:

 y_t ~ Binomial(p_t, inverse_logit(eta_t))

All else remains the same. The logit function is log(r/(1-r)), where r is a rate between zero and one; the inverse-logit function is exp(x)/(1 + exp(x)).
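The logit and inverse-logit functions above can be written directly in R. The minimal sketch below uses hypothetical rates, checks the two functions against base R's `qlogis()` and `plogis()` (the logistic quantile and distribution functions in the stats package), and draws one count per rate from the binomial version of the likelihood.

```r
set.seed(303)

logit     <- function(r) log(r / (1 - r))
inv_logit <- function(x) exp(x) / (1 + exp(x))

r <- c(0.0015, 0.01, 0.2)   # hypothetical rates between zero and one
x <- logit(r)

# The two functions undo each other
all.equal(inv_logit(x), r)

# Base R provides the same maps: qlogis() is the logit, plogis() its inverse
all.equal(x, qlogis(r))

# For small rates the logit is close to the log, so eta has nearly the
# same interpretation under the binomial and Poisson models for rare outcomes
round(cbind(rate = r, logit = x, log = log(r)), 3)

# One draw from y ~ Binomial(p, inverse_logit(eta)) with p = 100,000
rbinom(length(r), size = 1e5, prob = inv_logit(x))
```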

Value

The function returns a list of class `surveil` containing the following elements:

summary: A data.frame with posterior means and 95 percent credible intervals, as well as the raw data (Count, Population, time period, grouping variable if any, and crude rates).
samples: A stanfit object returned by sampling. This contains the MCMC samples from the posterior distribution of the fitted model.
cor: Logical value indicating if the model included a correlation structure.
time: A list containing the name of the time-period column in the user-provided data and a data.frame of observed time periods and their index.
group: If a grouping variable was used, this will be a list containing the name of the grouping variable and a data.frame with group labels and index values.
family: The user-provided family argument.

Author(s)

Connor Donegan (Connor.Donegan@UTSouthwestern.edu)

Source

Brandt P and Williams JT. Multiple time series models. Thousand Oaks, CA: SAGE Publications, 2007.

Clayton DG. Generalized linear mixed models. In: Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov Chain Monte Carlo in Practice: Interdisciplinary Statistics. Boca Raton, FL: CRC Press, 1996. pp. 275-302.

Donegan C, Hughes AE, and Lee SC (2022). Colorectal Cancer Incidence, Inequalities, and Prevention Priorities in Urban Texas: Surveillance Study With the "surveil" Software Package. JMIR Public Health & Surveillance 8(8):e34589. doi:10.2196/34589

Stan Development Team. Stan Modeling Language Users Guide and Reference Manual, 2.28. 2021. https://mc-stan.org

Examples

data(msa)
dat <- aggregate(cbind(Count, Population) ~ Year, data = msa, FUN = sum)

fit <- stan_rw(dat, time = Year)

## print summary of results
print(fit)
print(fit$summary)

## plot time trends (rates per 10,000)
plot(fit, scale = 10e3)
plot(fit, style = 'lines', scale = 10e3)

## Summary with MCMC diagnostics (n_eff, Rhat; from rstan)
print(fit$samples)

## cumulative percent change
fit_pc <- apc(fit)
print(fit_pc$cpc)
plot(fit_pc, cumulative = TRUE)


## age-specific rates 
data(cancer)
cancer2 <- subset(cancer, grepl("55-59|60-64|65-69", Age))
fit <- stan_rw(cancer2, time = Year, group = Age,
               chains = 3, iter = 1e3) # for speed only

## plot trends 
plot(fit, scale = 10e3)

## age-standardized rates
data(standard)
fit_stands <- standardize(fit,
                          label = standard$age,
                          standard_pop = standard$standard_pop)
print(fit_stands)
plot(fit_stands)

## percent change for age-standardized rates
fit_stands_apc <- apc(fit_stands)
plot(fit_stands_apc)
print(fit_stands_apc)
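The `standardize` call in the examples produces age-standardized rates. As a rough guide to what that means, here is a minimal base-R sketch of direct age standardization, the usual technique for this purpose. The rates and standard population are made up, and this is not the package's own code; in a Bayesian analysis the same weighted sum can be applied to each posterior draw of the age-specific rates.

```r
# Hypothetical age-specific rates (per person-year) for three age groups
rates <- c("55-59" = 0.0010, "60-64" = 0.0014, "65-69" = 0.0019)

# Hypothetical standard population counts for the same age groups
standard_pop <- c("55-59" = 15000, "60-64" = 12000, "65-69" = 10000)

# Direct standardization: weight each age-specific rate by that group's
# share of the standard population, matching groups by label
weights  <- standard_pop / sum(standard_pop)
std_rate <- sum(weights[names(rates)] * rates)

# Age-standardized rate per 100,000
std_rate * 1e5
```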

[Package surveil version 0.3.0 Index]