NRepeat {stppSim} | R Documentation |
Near Repeat calculator using the Knox test
Description
This function uses the Knox test for space-time clustering to quantify the spatio-temporal association between events (Credit: Wouter Steenbeek).
Usage
NRepeat(x, y, time, sds, tds, s_include.lowest = FALSE,
s_right = FALSE, t_include.lowest = FALSE, t_right = FALSE,
method = "manhattan", nrep = 999, saveSimulations = FALSE,
future.seed = TRUE,...)
Arguments
x |
a vector of x coordinates |
y |
a vector of y coordinates |
time |
a vector of time. This can be of type integer, numeric, or date |
sds |
A vector of break points of the spatial intervals. For example c(0,50,120,300) to specify spatial intervals from 0-50, 50-120, 120-300 meters. Or c(0,50,100,Inf) to specify spatial intervals from 0-50, 50-100, and 100-Inf meters. (More accurately, on the scale of the provided x and y coordinates. For example, data may be projected in feet and thus the distances refer to feet instead of meters). |
tds |
A vector of break points of the temporal intervals. For example c(0,2,4,Inf) to specify temporal intervals from 0-2, 2-4, 4-Inf days. |
s_include.lowest |
the descriptions above are ambiguous on how exactly the spatial break points are handled. For example, does c(0,100,200) refer to 0-100, 101-200? Or to 0-99 and 100-199? s_include.lowest follows the arguments of cut (see ?cut). Logical, indicating if a spatial distance equal to the lowest (or highest, for right = FALSE) 'breaks' value should be included. Default = FALSE. See vignette("NearRepeat_breaks") for details. |
s_right |
logical, indicating if the spatial intervals should be closed on the right (and open on the left) or vice versa. Default = FALSE. See vignette("NearRepeat_breaks") for details. |
t_include.lowest |
t_include.lowest follows the arguments of cut (see ?cut). Logical, indicating if a temporal distance equal to the lowest (or highest, for right = FALSE) 'breaks' value should be included. Default = FALSE. |
t_right |
logical, indicating if the temporal intervals should be closed on the right (and open on the left) or vice versa. Default = FALSE. See vignette("NearRepeat_breaks") for details. |
method |
The method to calculate the spatial distances between crime events. Methods possible as in the 'dist' function (see ?dist). Default is 'manhattan', which seems to be a fair approximation of the distance travelled by a road network. Alternatively, the user can specify 'euclidean' to get the 'as the crow flies' distance. |
nrep |
The number of replications of the Monte Carlo simulation (default = 999). |
saveSimulations |
Should all simulated contingency tables be saved as a 3-dimensional array? Default = FALSE |
future.seed |
A logical or an integer (of length one or seven), or a list of length(X) with pre-generated random seeds. Default = TRUE. See R package future.apply for details. |
... |
(optional) Additional arguments passed to future_lapply() |
Details
Further details available at: https://github.com/wsteenbeek/NearRepeat.
Value
An object of type "knox", i.e. a list with four tables. For each spatial and temporal distance combination,(1) The counts of observed crime pairs, (2) The Knox ratios based on the mean of the simulations, (3) The Knox ratios based on the median of the simulations, (4) p-values.
References
Steenbeek W. Near Repeat. R package version 0.1.1. 2018. URL: https://github.com/wsteenbeek/NearRepeat
Examples
## Not run:
# Generate example data. Suppose x and y refer to meters distance.
set.seed(10)
(mydata <- data.frame(x = sample(x = 20, size = 20, replace = TRUE) * 20,
y = sample(x = 20, size = 20, replace = TRUE) * 20,
date = as.Date(sort(sample(20, size = 20, replace = TRUE)),
origin = "2018-01-01")
))
# Near Repeat calculation using 0-100 meters and 100-Inf meters, and three
# temporal intervals of 2 days
set.seed(38673)
NRepeat(x = mydata$x, y = mydata$y, time = mydata$date,
sds = c(0,100,Inf), tds = c(0,2,4))
# Add a 'same repeat' spatial interval of 0.001 meters, and use Euclidean
# distance
set.seed(38673)
NRepeat(x = mydata$x, y = mydata$y, time = mydata$date,
sds = c(0,0.001,100,Inf), tds = c(0,2,4),
method = "euclidean")
# Only do 99 replications
set.seed(38673)
NRepeat(x = mydata$x, y = mydata$y, time = mydata$date,
sds = c(0,0.001,100,Inf), tds = c(0,2,4),
method = "euclidean", nrep = 99)
# The plot() function can be used to plot a Heat Map of Near Repeat results
# based on p-values
set.seed(4622)
myoutput <- NRepeat(x = mydata$x, y = mydata$y, time = mydata$date,
sds = c(0,100,200,300,400), td = c(0,1,2,3,4,5))
# The default range of p-values that will be highlighted (0-.05) can be
# adjusted using the 'pvalue_range' parameter. By default the Knox ratios
# are printed in the cells, but this can be adjusted using the 'text'
# parameter. The default is "knox_ratio". Possible values are "observed",
# "knox_ratio", "knox_ratio_median", "pvalues", or NA.
## End(Not run)