R: Calculate the negative binomial Bayesian scan statistic.

scan_bayes_negbin {scanstatistics}

R Documentation

Calculate the negative binomial Bayesian scan statistic.

Description

Calculate the "Bayesian Spatial Scan Statistic" by Neill et al. (2006), adapted to a spatio-temporal setting. The scan statistic assumes that, given the relative risk, the data follows a Poisson distribution. The relative risk is in turn assigned a Gamma distribution prior, yielding a negative binomial marginal distribution for the counts under the null hypothesis. Under the alternative hypothesis, the
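The negative binomial marginal that arises from a Poisson count with a gamma-distributed relative risk can be checked by simulation. The sketch below uses only base R and is not taken from the package. It assumes a shape/rate parametrisation of the gamma prior (the package's own convention for `beta_null` may differ), and the values of `alpha`, `beta` and `b` are made up for illustration.

```r
set.seed(42)

alpha <- 2    # gamma shape (illustrative value)
beta  <- 4    # gamma rate (ASSUMPTION: rate, not scale, parametrisation)
b     <- 10   # baseline: expected count when the relative risk equals 1
n     <- 1e5  # number of simulated counts

# Draw relative risks from the gamma prior, then counts from the Poisson model
q <- rgamma(n, shape = alpha, rate = beta)
y <- rpois(n, lambda = q * b)

# Under these assumptions the marginal of y is negative binomial with
# size = alpha and mean = b * alpha / beta
mu   <- b * alpha / beta
emp  <- as.vector(table(factor(y, levels = 0:5))) / n
theo <- dnbinom(0:5, size = alpha, mu = mu)

# Empirical and theoretical probabilities for counts 0..5, side by side
comparison <- round(cbind(count = 0:5, empirical = emp, theoretical = theo), 3)
print(comparison)
```

With a large `n`, the two probability columns should agree closely, which is the sense in which the gamma prior "yields" a negative binomial marginal.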

Usage

scan_bayes_negbin(
  counts,
  zones,
  baselines = NULL,
  population = NULL,
  outbreak_prob = 0.05,
  alpha_null = 1,
  beta_null = 1,
  alpha_alt = alpha_null,
  beta_alt = beta_null,
  inc_values = seq(1, 3, by = 0.1),
  inc_probs = 1
)

Arguments

`counts`	Either: (1) a matrix of observed counts, in which rows indicate time and are ordered from least recent (row 1) to most recent (row `nrow(counts)`), and columns indicate locations, numbered from 1 and up; if `counts` is a matrix, the optional matrix argument `baselines` should also be specified. Or (2) a data frame with columns "time", "location", "count", "baseline"; alternatively, the column "baseline" can be replaced by a column "population". The baselines are the expected values of the counts.
`zones`	A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone.
`baselines`	Optional. A matrix of the same dimensions as `counts`. Not needed if `counts` is a data frame. Holds the Poisson mean parameter for each observed count. Will be estimated if not supplied (requires the `population` argument). These parameters are typically estimated from past data using e.g. Poisson (GLM) regression.
`population`	Optional. A matrix or vector of populations for each location. Not needed if `counts` is a data frame. If `counts` is a matrix, `population` is only needed if `baselines` are to be estimated and you want to account for the different populations in each location (and time). If a matrix, should be of the same dimensions as `counts`. If a vector, should be of the same length as the number of columns in `counts`.
`outbreak_prob`	A scalar; the probability of an outbreak (at any time, any place). Defaults to 0.05.
`alpha_null`	A scalar; the shape parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1.
`beta_null`	A scalar; the scale parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1.
`alpha_alt`	A scalar; the shape parameter for the gamma distribution under the alternative hypothesis of an anomaly. Defaults to the same value as `alpha_null`.
`beta_alt`	A scalar; the scale parameter for the gamma distribution under the alternative hypothesis of an anomaly. Defaults to the same value as `beta_null`.
`inc_values`	A vector of possible values for the increase in the mean (and variance) of an anomalous count. Defaults to evenly spaced values between 1 and 3, with a difference of 0.1 between consecutive values.
`inc_probs`	A vector of the prior probabilities of each value in `inc_values`. Defaults to 1, implying a discrete uniform distribution.
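As an illustration of the input shapes described above, the following base R sketch builds the data frame form of `counts` from a pair of matrices, a hand-written `zones` list, and an explicit uniform `inc_probs` vector. All numbers are made up, and the sketch assumes that the data frame's "time" column follows the same ordering as the matrix rows (larger values are more recent); it does not call the package.

```r
# Small illustrative inputs (made-up numbers): 3 time points, 2 locations
counts_mat    <- matrix(c(2, 0, 1, 4, 3, 5), nrow = 3, ncol = 2)
baselines_mat <- matrix(c(1.5, 1.0, 1.2, 3.0, 2.5, 3.5), nrow = 3, ncol = 2)

# Long format: one row per (time, location) pair. as.vector() reads a matrix
# column by column, so time varies fastest and location slowest.
counts_df <- data.frame(
  time     = rep(seq_len(nrow(counts_mat)), times = ncol(counts_mat)),
  location = rep(seq_len(ncol(counts_mat)), each = nrow(counts_mat)),
  count    = as.vector(counts_mat),
  baseline = as.vector(baselines_mat)
)

# Zones as a list of integer vectors of location numbers:
# each location alone, plus both locations together
zones <- list(1L, 2L, c(1L, 2L))

# An explicit discrete uniform prior over the default inc_values
inc_values <- seq(1, 3, by = 0.1)
inc_probs  <- rep(1 / length(inc_values), length(inc_values))
```

Either `counts_mat` together with `baselines_mat`, or `counts_df` on its own, matches the two input forms described for `counts`.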

Value

A list which, in addition to the information about the type of scan statistic, has the following components: priors (list), posteriors (list), MLC (list) and marginal_data_prob (scalar). The list MLC has elements:

zone: The number of the spatial zone of the most likely cluster (MLC).
duration: The most likely event duration.
log_posterior: The posterior log probability that an event is ongoing in the MLC.
log_bayes_factor: The logarithm of the Bayes factor for the MLC.
posterior: The posterior probability that an event is ongoing in the MLC.
locations: The locations involved in the MLC.

The list priors has elements:

null_prior: The prior probability of no anomaly.
alt_prior: The prior probability of an anomaly.
inc_prior: A vector of prior probabilities of each value in the argument inc_values.
window_prior: The prior probability of an outbreak in any of the space-time windows.

The list posteriors has elements:

null_posterior: The posterior probability of no anomaly.
alt_posterior: The posterior probability of an anomaly.
inc_posterior: A data frame with columns inc_values and inc_posterior.
window_posteriors: A data frame with columns zone, duration, log_posterior and log_bayes_factor, each row corresponding to a space-time window.
space_time_posteriors: A matrix with the posterior anomaly probability of each location-time combination.
location_posteriors: A vector with the posterior probability of an anomaly at each location.
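The components listed above can be pulled out of the returned list with ordinary list indexing. The two helpers below are hypothetical (they are not part of scanstatistics) and rely only on the component names documented in this section; they assume that the scalar MLC fields each have length one.

```r
# Hypothetical helper: collect the scalar MLC fields into a one-row data frame
summarise_mlc <- function(res) {
  mlc <- res$MLC
  data.frame(
    zone             = mlc$zone,
    duration         = mlc$duration,
    posterior        = mlc$posterior,
    log_bayes_factor = mlc$log_bayes_factor,
    n_locations      = length(mlc$locations)
  )
}

# Hypothetical helper: the n space-time windows with the highest log posterior
top_windows <- function(res, n = 5) {
  wp <- res$posteriors$window_posteriors
  wp <- wp[order(wp$log_posterior, decreasing = TRUE), , drop = FALSE]
  head(wp, n)
}
```

Given a result `res` from `scan_bayes_negbin`, calling `summarise_mlc(res)` and `top_windows(res, 3)` would give a compact view of the most likely cluster and its closest competitors.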

References

Neill, D. B., Moore, A. W., Cooper, G. F. (2006). A Bayesian Spatial Scan Statistic. Advances in Neural Information Processing Systems 18.

Examples

library(scanstatistics)
set.seed(1)
# Create location coordinates, calculate nearest neighbors, and create zones
n_locs <- 50
max_duration <- 5
n_total <- n_locs * max_duration
geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
knn_mat <- coords_to_knn(geo, 15)
zones <- knn_zones(knn_mat)

# Simulate data: exponential baselines (mean 5) and Poisson counts around them
baselines <- matrix(rexp(n_total, 1/5), max_duration, n_locs)
counts <- matrix(rpois(n_total, as.vector(baselines)), max_duration, n_locs)

# Inject outbreak/event/anomaly: double the expected counts in zone 10
# during the 3 most recent time periods
ob_dur <- 3
ob_cols <- zones[[10]]
ob_rows <- max_duration + 1 - seq_len(ob_dur)
counts[ob_rows, ob_cols] <- matrix(
  rpois(ob_dur * length(ob_cols), 2 * baselines[ob_rows, ob_cols]), 
  length(ob_rows), length(ob_cols))

# Run the scan statistic with the default priors
res <- scan_bayes_negbin(counts = counts,
                         zones = zones,
                         baselines = baselines)

[Package scanstatistics version 1.1.1 Index]