rts2-package {rts2}R Documentation

Real-Time Disease Surveillance

Description

Supports modelling real-time case data to facilitate the real-time surveillance of infectious diseases and other point phenomena. The package provides automated computational grid generation over an area of interest with methods to map covariates between geographies, model fitting including spatially aggregated case counts, and predictions and visualisation. Both Bayesian and maximum likelihood methods are provided. Log-Gaussian Cox Processes are described by Diggle et al. (2013) <doi:10.1214/13-STS441> and we provide both the low-rank approximation for Gaussian processes described by Solin and Särkkä (2020) <doi:10.1007/s11222-019-09886-w> and Riutort-Mayol et al (2023) <doi:10.1007/s11222-022-10167-2> and the nearest neighbour Gaussian process described by Datta et al (2016) <doi:10.1080/01621459.2015.1044091>. 'cmdstanr' can be downloaded at <https://mc-stan.org/cmdstanr/>. rts2 provides several estimators for the Log Gaussian Cox Process Model (LGCP). The LGCP is a stochastic Poisson model used for modelling case counts of phenomena of interest, and is particularly useful for predicting risk across an area of interest, such as in disease surveillance applications.

Workflow

Most of the functionality of the rts2 package is provided by the grid class. The computational strategy for the LGCP is to divide up the area of interest into a regular grid and aggregate case counts within cells. For models with count data aggregated to an irregular set of polygons, such as census tracts, the latent surface is also modelled as a regular grid. A typical workflow using this package would be:

  1. Create a new grid object, e.g. g1 <- grid$new(poly, cellsize = 0.1). The class is initialized with either a single polygon describing the area of interest or a collection of polygons if spatially aggregated data are used. The sf package is used for all spatial data.

  2. If the location (and times) of cases are available (i.e. the data are not spatially aggregated), then we map the points to the computational grid. The function create_points can generate point data in the correct sf format. The member function points_to_grid will then map these data to the grid. Counts can also be manually added to grid data. For region data, since the counts are assumed to be already aggregated, these must be manually provided by the user. The case counts must appear in columns with specific names. If there is only a single time period then the counts must be in a column named y. If there are multiple time periods then the counts must be in columns names t1, t2, t3,... Associated columns labelled date1, date2, etc. will permit use of some functionality regarding specific time intervals.

  3. If any covariates are to be used for the modelling, then these can be mapped to the compuational grid using the function add_covariates(). Other functions, add_time_indicators() and get_dow() will also generate relevant temporal indicators where required. At a minimum we would recommend including a measure of population density.

  4. Fit a model. There are multiple methods for model fitting, which are available through the member functions lgcp_ml() and lgcp_bayes() for maximum likelihood and Bayesian approaches, respectively. The results are stored internally and optionally returned as a rtsFit object.

  5. Summarise the output. The main functions for summarising the output are extract_preds(), which will generate predictions of relative risk, incidence rate ratios, and predicted incidence, and hotspots(), which will estimate probabilities that these statistics exceed given thresholds. For spatially-aggregated data models, the relative risk applies to the grid, whereas rate ratios and predicted incidence applies to the areas.

  6. Predictions can be visualised or aggregated to relevant geographies with the plot() and aggregate() functions.

Estimation methods and model specification

The rts2 package provide several methods for estimating the model and drawing samples from the latent surface.

The LGCP can be computationally complex and scales poorly with sample size (number of grid cells and time periods), due to the large covariance matrix that must be inverted to estimate the covariance parameters. We offer several strategies and approximations for efficient model fitting:

Package development

The package is still in development and there may still be bugs and errors. While we do not expect the general user interface to change there may be changes to the underlying library as well as new additions and functionality.

Author(s)

Sam Watson [aut, cre] (<https://orcid.org/0000-0002-8972-769X>)

Maintainer: Sam Watson <s.i.watson@bham.ac.uk>

References

Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org


[Package rts2 version 0.7.5 Index]