rts2-package {rts2}    R Documentation
Real-Time Disease Surveillance
Description
Supports modelling of real-time case data to facilitate the real-time
surveillance of infectious diseases and other point phenomena. The package provides automated computational grid generation over
an area of interest, methods to map covariates between geographies, model fitting (including for spatially aggregated case counts),
and prediction and visualisation. Both Bayesian and maximum likelihood methods are provided. Log-Gaussian Cox Processes are described by
Diggle et al. (2013) <doi:10.1214/13-STS441>, and we provide both the low-rank approximation for Gaussian processes
described by Solin and Särkkä (2020) <doi:10.1007/s11222-019-09886-w> and Riutort-Mayol et al. (2023) <doi:10.1007/s11222-022-10167-2>, and the
nearest neighbour Gaussian process described by Datta et al. (2016) <doi:10.1080/01621459.2015.1044091>. 'cmdstanr' can be downloaded at <https://mc-stan.org/cmdstanr/>.
rts2 provides several estimators for the Log-Gaussian Cox Process (LGCP). The LGCP is a doubly stochastic Poisson process model used for modelling case counts of a phenomenon of interest; it is particularly useful for predicting risk across an area of interest, such as in disease surveillance applications.
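As a point of reference, a common specification of the LGCP for counts aggregated to a computational grid is sketched below; this is the generic form of the model rather than the package's exact parameterisation:

\[
y_{it} \sim \text{Poisson}(\lambda_{it}), \qquad
\lambda_{it} = \int_{A_i} \exp\left( x(s)^\top \beta + Z(s, t) \right) \, ds
\]

where y_{it} is the case count in grid cell A_i during period t, x(s) are spatially varying covariates with coefficients \beta, and Z(s, t) is a latent spatio-temporal Gaussian process whose covariance parameters are estimated alongside \beta.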
Workflow
Most of the functionality of the rts2 package is provided by the grid
class. The computational strategy for the LGCP is to divide up the area of interest into a regular grid and aggregate case counts within cells. For models with count data aggregated to an irregular set of polygons, such as census tracts, the latent surface is also modelled as a regular grid. A typical workflow using this package would be:
1. Create a new grid object, e.g. g1 <- grid$new(poly, cellsize = 0.1). The class is initialized with either a single polygon describing the area of interest, or a collection of polygons if spatially aggregated data are used. The sf package is used for all spatial data.
2. If the locations (and times) of cases are available (i.e. the data are not spatially aggregated), map the points to the computational grid. The function create_points() can generate point data in the correct sf format, and the member function points_to_grid() will then map these data to the grid. Counts can also be added to the grid data manually. For region data, since the counts are assumed to be already aggregated, they must be provided by the user. The case counts must appear in columns with specific names: if there is only a single time period, the counts must be in a column named y; if there are multiple time periods, the counts must be in columns named t1, t2, t3, and so on. Associated columns labelled date1, date2, etc. permit the use of some functionality relating to specific time intervals.
3. If any covariates are to be used in the model, they can be mapped to the computational grid using the function add_covariates(). The functions add_time_indicators() and get_dow() will also generate relevant temporal indicators where required. At a minimum we would recommend including a measure of population density.
4. Fit a model. There are multiple methods for model fitting, available through the member functions lgcp_ml() and lgcp_bayes() for maximum likelihood and Bayesian approaches, respectively. The results are stored internally and optionally returned as an rtsFit object.
5. Summarise the output. The main functions for summarising the output are extract_preds(), which generates predictions of relative risk, incidence rate ratios, and predicted incidence, and hotspots(), which estimates the probabilities that these statistics exceed given thresholds. For spatially aggregated data models, the relative risk applies to the grid, whereas the rate ratios and predicted incidence apply to the areas.
6. Visualise or aggregate predictions to relevant geographies with the plot() and aggregate() functions. A code sketch of this workflow is given below.
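The following is a minimal sketch of this workflow for point-referenced case data. The boundary file name, the case and covariate data objects, and most argument names are illustrative assumptions; check each function's documentation for the exact interface.

  library(rts2)
  library(sf)

  # Area of interest: a single polygon read with sf (illustrative file name)
  boundary <- sf::st_read("boundary.gpkg")

  # 1. Create the computational grid over the area of interest
  g1 <- grid$new(boundary, cellsize = 0.1)

  # 2. Map case locations (an sf point data frame, e.g. from create_points())
  #    to the computational grid
  g1$points_to_grid(case_points)

  # 3. Map covariates, such as population density, to the grid
  #    (argument names here are assumptions)
  g1$add_covariates(popdens_data, zcols = "popdens")

  # 4. Fit the LGCP by stochastic maximum likelihood, or use g1$lgcp_bayes()
  g1$lgcp_ml(popdens = "popdens")

  # 5. Extract predictions of relative risk and hotspot (exceedance) probabilities
  g1$extract_preds(type = "rr")
  g1$hotspots(rr.threshold = 2)

  # 6. Plot the predicted relative risk surface
  g1$plot("rr")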
Estimation methods and model specification
The rts2 package provides several methods for estimating the model and drawing samples from the latent surface.

Maximum likelihood. We include stochastic maximum likelihood estimation methods, comprising both Markov Chain Monte Carlo (MCMC) maximum likelihood and Stochastic Approximation Expectation Maximisation (SAEM). MCMC-ML can use Newton-Raphson, quasi-Newton, or derivative-free methods to estimate the model parameters. Both algorithms have three steps: (1) sample the random effects using MCMC; (2) estimate the fixed effect parameters conditional on the sampled random effects; (3) estimate the covariance parameters. The process is iterated until convergence. Stochastic maximum likelihood estimators are provided by the member function grid$lgcp_ml().

Bayesian. We also include Bayesian estimation of the model using Stan via either rstan or cmdstanr, with both MCMC and variational Bayes methods, provided by the member function grid$lgcp_bayes().
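As an illustration of the choice between estimators, the calls below fit the same model with each approach; the argument names shown (popdens, iter_warmup, iter_sampling) are assumptions about the interface rather than documented defaults.

  # Stochastic maximum likelihood (MCMC-ML or SAEM; argument names are assumptions)
  fit_ml <- g1$lgcp_ml(popdens = "popdens")

  # Full Bayesian estimation via Stan (MCMC or variational Bayes);
  # the sampler control arguments follow cmdstanr naming and are assumptions
  fit_bayes <- g1$lgcp_bayes(popdens = "popdens",
                             iter_warmup = 500, iter_sampling = 1000)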
The LGCP can be computationally complex and scales poorly with sample size (the number of grid cells and time periods), owing to the large covariance matrix that must be inverted to estimate the covariance parameters. We offer several strategies and approximations for efficient model fitting:

Gaussian process approximations. The package includes both the Hilbert space Gaussian process approximation (see Solin and Särkkä (2020) <doi:10.1007/s11222-019-09886-w> and Riutort-Mayol et al. (2023) <doi:10.1007/s11222-022-10167-2>) and the nearest neighbour Gaussian process (Datta et al. (2016) <doi:10.1080/01621459.2015.1044091>).

Spatio-temporal formulation. For spatio-temporal models we use a "spatial innovation" formulation of the spatio-temporal Gaussian process, for which the computational complexity is linear in the number of time periods.
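For example, the approximation might be selected when fitting the model; the argument names approx and m below are hypothetical placeholders used for illustration and may differ from the actual interface.

  # Nearest neighbour Gaussian process with (assumed) 10 nearest neighbours
  g1$lgcp_ml(popdens = "popdens", approx = "nngp", m = 10)

  # Hilbert space (low-rank) approximation with (assumed) 10 basis functions
  g1$lgcp_bayes(popdens = "popdens", approx = "hsgp", m = 10)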
Package development
The package is still in development and there may still be bugs and errors. While we do not expect the general user interface to change, there may be changes to the underlying library as well as new additions and functionality.
Author(s)
Sam Watson [aut, cre] (<https://orcid.org/0000-0002-8972-769X>)
Maintainer: Sam Watson <s.i.watson@bham.ac.uk>
References
Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org