scan_bayes_negbin {scanstatistics} | R Documentation |
Calculate the negative binomial Bayesian scan statistic.
Description
Calculate the "Bayesian Spatial Scan Statistic" by Neill et al. (2006), adapted to a spatio-temporal setting. The scan statistic assumes that, given the relative risk, the data follows a Poisson distribution. The relative risk is in turn assigned a Gamma distribution prior, yielding a negative binomial marginal distribution for the counts under the null hypothesis. Under the alternative hypothesis, the
Usage
scan_bayes_negbin(
  counts,
  zones,
  baselines = NULL,
  population = NULL,
  outbreak_prob = 0.05,
  alpha_null = 1,
  beta_null = 1,
  alpha_alt = alpha_null,
  beta_alt = beta_null,
  inc_values = seq(1, 3, by = 0.1),
  inc_probs = 1
)
Arguments
counts |
Either:
|
zones |
A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone. |
baselines |
Optional. A matrix of the same dimensions as counts. |
population |
Optional. A matrix or vector of populations for each
location. Not needed if |
outbreak_prob |
A scalar; the probability of an outbreak (at any time, any place). Defaults to 0.05. |
alpha_null |
A scalar; the shape parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1. |
beta_null |
A scalar; the scale parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1. |
alpha_alt |
A scalar; the shape parameter for the gamma distribution under the alternative hypothesis of an anomaly. Defaults to the same value as alpha_null. |
beta_alt |
A scalar; the scale parameter for the gamma distribution under the alternative hypothesis of an anomaly. Defaults to the same value as beta_null. |
inc_values |
A vector of possible values for the increase in the mean (and variance) of an anomalous count. Defaults to evenly spaced values between 1 and 3, with a difference of 0.1 between consecutive values. |
inc_probs |
A vector of the prior probabilities of each value in inc_values. |
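The Description states that a Poisson count with a gamma-distributed relative risk has a negative binomial marginal distribution. The following sketch checks this numerically in base R. The values of alpha, beta and baseline are made up for illustration, and the second gamma parameter is treated as a rate; that is an assumption of this sketch, not a statement about how the package parameterizes beta_null.

```r
# Hypothetical values, chosen only for illustration
alpha    <- 2    # gamma shape
beta     <- 1.5  # gamma rate (assumed parameterization)
baseline <- 5    # expected count when the relative risk equals 1
y        <- 0:10 # counts at which to evaluate the marginal

# Marginal probability of each count: integrate the Poisson likelihood
# over the gamma prior on the relative risk q
marginal_numeric <- vapply(y, function(k) {
  integrate(
    function(q) dpois(k, q * baseline) * dgamma(q, shape = alpha, rate = beta),
    lower = 0, upper = Inf
  )$value
}, numeric(1))

# Closed form: negative binomial with size = alpha and
# success probability beta / (beta + baseline)
marginal_negbin <- dnbinom(y, size = alpha, prob = beta / (beta + baseline))

# The two should agree up to numerical integration error
max_abs_diff <- max(abs(marginal_numeric - marginal_negbin))
```

A small max_abs_diff indicates that the numerically integrated mixture and the negative binomial probabilities coincide for these values.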
Value
A list which, in addition to the information about the type of scan statistic, has the following components: priors (list), posteriors (list), MLC (list) and marginal_data_prob (scalar). The list MLC has elements
- zone
The number of the spatial zone of the most likely cluster (MLC).
- duration
The most likely event duration.
- log_posterior
The posterior log probability that an event is ongoing in the MLC.
- log_bayes_factor
The logarithm of the Bayes factor for the MLC.
- posterior
The posterior probability that an event is ongoing in the MLC.
- locations
The locations involved in the MLC.
The list priors has elements
- null_prior
The prior probability of no anomaly.
- alt_prior
The prior probability of an anomaly.
- inc_prior
A vector of prior probabilities of each value in the argument inc_values.
- window_prior
The prior probability of an outbreak in any of the space-time windows.
The list posteriors has elements
- null_posterior
The posterior probability of no anomaly.
- alt_posterior
The posterior probability of an anomaly.
- inc_posterior
A data frame with columns inc_values and inc_posterior.
- window_posteriors
A data frame with columns zone, duration, log_posterior and log_bayes_factor, each row corresponding to a space-time window.
- space_time_posteriors
A matrix with the posterior anomaly probability of each location-time combination.
- location_posteriors
A vector with the posterior probability of an anomaly at each location.
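The relation between the priors, the log Bayes factors and the posteriors listed above can be sketched with Bayes' rule. This is a minimal illustration in base R, not the package's implementation: the log Bayes factors are made-up numbers, the Bayes factor is assumed to be the ratio of the data likelihood under a window's alternative to the likelihood under the null, and the outbreak prior is assumed to be spread uniformly over the windows.

```r
# Hypothetical inputs: three space-time windows with made-up log Bayes factors
outbreak_prob <- 0.05
log_bf        <- c(-1.2, 0.4, 3.1)
n_windows     <- length(log_bf)

null_prior   <- 1 - outbreak_prob
window_prior <- outbreak_prob / n_windows  # assumed uniform over windows

# Unnormalized log posterior weights; the null has Bayes factor 1 (log of 0)
log_w <- c(log(null_prior), log(window_prior) + log_bf)

# Normalize on the log scale (log-sum-exp) for numerical stability
m        <- max(log_w)
log_norm <- m + log(sum(exp(log_w - m)))
log_post <- log_w - log_norm

null_posterior   <- exp(log_post[1])
window_posterior <- exp(log_post[-1])
alt_posterior    <- sum(window_posterior)

# Index of the window with the highest posterior probability
most_likely_window <- which.max(window_posterior)
```

Working on the log scale avoids underflow when the Bayes factors are very large or very small, which is presumably why the documented output reports log_posterior and log_bayes_factor.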
References
Neill, D. B., Moore, A. W., Cooper, G. F. (2006). A Bayesian Spatial Scan Statistic. Advances in Neural Information Processing Systems 18.
Examples
set.seed(1)
# Create location coordinates, calculate nearest neighbors, and create zones
n_locs <- 50
max_duration <- 5
n_total <- n_locs * max_duration
geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
knn_mat <- coords_to_knn(geo, 15)
zones <- knn_zones(knn_mat)
# Simulate data
baselines <- matrix(rexp(n_total, 1/5), max_duration, n_locs)
counts <- matrix(rpois(n_total, as.vector(baselines)), max_duration, n_locs)
# Inject outbreak/event/anomaly
ob_dur <- 3
ob_cols <- zones[[10]]
ob_rows <- max_duration + 1 - seq_len(ob_dur)
counts[ob_rows, ob_cols] <- matrix(
  rpois(ob_dur * length(ob_cols), 2 * baselines[ob_rows, ob_cols]),
  length(ob_rows), length(ob_cols))
res <- scan_bayes_negbin(counts = counts,
                         zones = zones,
                         baselines = baselines)
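The MLC component described in the Value section can be summarized with a small helper. The sketch below is an illustration only: describe_mlc is a hypothetical function that is not part of scanstatistics, and mock_res is a hand-built list with made-up values that mimics the documented structure so that the code runs without the package.

```r
# Hypothetical helper (not part of scanstatistics): one-line summary of the
# most likely cluster, using the element names documented under Value
describe_mlc <- function(res) {
  mlc <- res$MLC
  sprintf("Zone %d, duration %d, posterior %.3f, %d locations",
          mlc$zone, mlc$duration, mlc$posterior, length(mlc$locations))
}

# Mock result with made-up values, shaped like the documented MLC list
mock_res <- list(
  MLC = list(
    zone             = 10L,
    duration         = 3L,
    log_posterior    = log(0.9),
    log_bayes_factor = 12.5,
    posterior        = 0.9,
    locations        = c(2L, 7L, 11L)
  )
)

mlc_summary <- describe_mlc(mock_res)
print(mlc_summary)
```

With the package installed, describe_mlc(res) could be applied in the same way to the result of the example above.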