NRepeat {stppSim}R Documentation

Near Repeat calculator using the Knox test

Description

This function uses the Knox test for space-time clustering to quantify the spatio-temporal association between events (Credit: Wouter Steenbeek).

Usage

NRepeat(x, y, time, sds, tds, s_include.lowest = FALSE,
s_right = FALSE, t_include.lowest = FALSE, t_right = FALSE,
method = "manhattan", nrep = 999, saveSimulations = FALSE,
future.seed = TRUE,...)

Arguments

x

a vector of x coordinates

y

a vector of y coordinates

time

a vector of time. This can be of type integer, numeric, or date

sds

A vector of break points of the spatial intervals. For example c(0,50,120,300) to specify spatial intervals from 0-50, 50-120, 120-300 meters. Or c(0,50,100,Inf) to specify spatial intervals from 0-50, 50-100, and 100-Inf meters. (More accurately, on the scale of the provided x and y coordinates. For example, data may be projected in feet and thus the distances refer to feet instead of meters).

tds

A vector of break points of the temporal intervals. For example c(0,2,4,Inf) to specify temporal intervals from 0-2, 2-4, 4-Inf days.

s_include.lowest

the descriptions above are ambiguous on how exactly the spatial break points are handled. For example, does c(0,100,200) refer to 0-100, 101-200? Or to 0-99 and 100-199? s_include.lowest follows the arguments of cut (see ?cut). Logical, indicating if a spatial distance equal to the lowest (or highest, for right = FALSE) 'breaks' value should be included. Default = FALSE. See vignette("NearRepeat_breaks") for details.

s_right

logical, indicating if the spatial intervals should be closed on the right (and open on the left) or vice versa. Default = FALSE. See vignette("NearRepeat_breaks") for details.

t_include.lowest

t_include.lowest follows the arguments of cut (see ?cut). Logical, indicating if a temporal distance equal to the lowest (or highest, for right = FALSE) 'breaks' value should be included. Default = FALSE.

t_right

logical, indicating if the temporal intervals should be closed on the right (and open on the left) or vice versa. Default = FALSE. See vignette("NearRepeat_breaks") for details.

method

The method to calculate the spatial distances between crime events. Methods possible as in the 'dist' function (see ?dist). Default is 'manhattan', which seems to be a fair approximation of the distance travelled by a road network. Alternatively, the user can specify 'euclidean' to get the 'as the crow flies' distance.

nrep

The number of replications of the Monte Carlo simulation (default = 999).

saveSimulations

Should all simulated contingency tables be saved as a 3-dimensional array? Default = FALSE

future.seed

A logical or an integer (of length one or seven), or a list of length(X) with pre-generated random seeds. Default = TRUE. See R package future.apply for details.

...

(optional) Additional arguments passed to future_lapply()

Details

Further details available at: https://github.com/wsteenbeek/NearRepeat.

Value

An object of type "knox", i.e. a list with four tables. For each spatial and temporal distance combination,(1) The counts of observed crime pairs, (2) The Knox ratios based on the mean of the simulations, (3) The Knox ratios based on the median of the simulations, (4) p-values.

References

Steenbeek W. Near Repeat. R package version 0.1.1. 2018. URL: https://github.com/wsteenbeek/NearRepeat

Examples

## Not run: 
# Generate example data. Suppose x and y refer to meters distance.
set.seed(10)
(mydata <- data.frame(x = sample(x = 20, size = 20, replace = TRUE) * 20,
                     y = sample(x = 20, size = 20, replace = TRUE) * 20,
                     date = as.Date(sort(sample(20, size = 20, replace = TRUE)),
                     origin = "2018-01-01")
                     ))

# Near Repeat calculation using 0-100 meters and 100-Inf meters, and three
# temporal intervals of 2 days
set.seed(38673)
NRepeat(x = mydata$x, y = mydata$y, time = mydata$date,
           sds = c(0,100,Inf), tds = c(0,2,4))

# Add a 'same repeat' spatial interval of 0.001 meters, and use Euclidean
# distance
set.seed(38673)
NRepeat(x = mydata$x, y = mydata$y, time = mydata$date,
           sds = c(0,0.001,100,Inf), tds = c(0,2,4),
           method = "euclidean")

# Only do 99 replications
set.seed(38673)
NRepeat(x = mydata$x, y = mydata$y, time = mydata$date,
           sds = c(0,0.001,100,Inf), tds = c(0,2,4),
           method = "euclidean", nrep = 99)


# The plot() function can be used to plot a Heat Map of Near Repeat results
# based on p-values
set.seed(4622)
myoutput <- NRepeat(x = mydata$x, y = mydata$y, time = mydata$date,
                       sds = c(0,100,200,300,400), td = c(0,1,2,3,4,5))

# The default range of p-values that will be highlighted (0-.05) can be
# adjusted using the 'pvalue_range' parameter. By default the Knox ratios
# are printed in the cells, but this can be adjusted using the 'text'
# parameter. The default is "knox_ratio". Possible values are "observed",
# "knox_ratio", "knox_ratio_median", "pvalues", or NA.

## End(Not run)

[Package stppSim version 1.3.4 Index]