R: Strings of Near Repeats

near_strings1 {ptools}

R Documentation

Strings of Near Repeats

Description

Identifies cases that are near each other in space and time.

Usage

near_strings1(dat, id, x, y, tim, DistThresh, TimeThresh)

Arguments

`dat`	data frame
`id`	string naming the id variable in the data frame (values should be unique)
`x`	string naming the variable that holds the x coordinates
`y`	string naming the variable that holds the y coordinates
`tim`	string naming the variable that holds the time stamp (should be numeric or datetime)
`DistThresh`	scalar distance threshold (in the same units as x/y)
`TimeThresh`	scalar time threshold (in the same units as tim)
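Because both thresholds are expressed in the units of the underlying columns, it helps to know what those units are for R's date types. The snippet below is a small sketch that assumes near_strings1 compares times as plain numeric differences (this page does not confirm how it handles datetimes internally). Date values differ in days, while POSIXct values are stored as seconds, so a 3-hour window on POSIXct data would correspond to TimeThresh = 10800.

```r
# Date columns: differences come out in days
d <- as.Date(c("2020-01-01", "2020-01-04"))
day_gap <- as.numeric(d[2] - d[1])
print(day_gap)  # 3

# POSIXct columns: the underlying numbers are seconds since 1970-01-01 UTC
p <- as.POSIXct(c("2020-01-01 00:00:00", "2020-01-01 03:00:00"), tz = "UTC")
sec_gap <- as.numeric(p[2]) - as.numeric(p[1])
print(sec_gap)  # 10800

# Assumption: with POSIXct data, a 3-hour window means TimeThresh = 3 * 3600
three_hours <- 3 * 60 * 60
```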

Details

This function returns strings of cases that are near each other in space and time. It is useful for near-repeat analysis, or for identifying potential duplicate cases. This particular function is memory safe, but it uses loops and runs in approximately O(n^2) time (more specifically, on the order of choose(n,2) pairwise comparisons). In tests on my machine, 5k rows take only ~10 seconds, but ~100k rows take around 12 minutes with this code.
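The general idea, linking every pair of cases that falls within both thresholds and then grouping the links into connected components, can be sketched in base R. The function near_strings_sketch() below is hypothetical and is not the ptools source: it assumes Euclidean distance, inclusive (<=) comparisons, and a time column that converts cleanly with as.numeric(). A union-find structure keeps extra memory at O(n) while the double loop makes choose(n,2) comparisons.

```r
# Hypothetical helper, NOT the ptools implementation of near_strings1.
near_strings_sketch <- function(dat, id, x, y, tim, DistThresh, TimeThresh) {
  n <- nrow(dat)
  xv <- dat[[x]]
  yv <- dat[[y]]
  tv <- as.numeric(dat[[tim]])  # assumes times compare as plain numbers
  parent <- seq_len(n)          # union-find parent pointers (one per case)

  # Follow parent pointers up to the root label of a case's component
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # choose(n, 2) pairwise comparisons: O(n^2) time, O(n) extra memory
  if (n > 1) {
    for (i in 1:(n - 1)) {
      for (j in (i + 1):n) {
        near_space <- sqrt((xv[i] - xv[j])^2 + (yv[i] - yv[j])^2) <= DistThresh
        near_time  <- abs(tv[i] - tv[j]) <= TimeThresh
        if (near_space && near_time) {
          ri <- find_root(i)
          rj <- find_root(j)
          if (ri != rj) parent[rj] <- ri  # merge the two components
        }
      }
    }
  }

  roots <- vapply(seq_len(n), find_root, integer(1))
  comp_id <- match(roots, unique(roots))  # relabel components as 1, 2, ...
  comp_num <- tabulate(comp_id)[comp_id]  # size of each case's component
  data.frame(CompId = comp_id, CompNum = comp_num, row.names = dat[[id]])
}

# Same toy data as the first example: two clusters
dat <- data.frame(x = 1:5, y = 0, ti = c(0, 0, 0, 4, 4), id = 1:5)
res <- near_strings_sketch(dat, "id", "x", "y", "ti", 2, 1)
print(res)
```

On this toy data the sketch puts cases 1 to 3 in one component and cases 4 and 5 in another, the same grouping as the ccheck vector in the Examples. A common refinement is to sort by time first and break out of the inner loop once the time gap exceeds TimeThresh, since every later case is then also too far apart in time.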

Value

A data frame that contains the ids as row.names, and two columns:

CompId, a unique identifier that lets you collapse original cases together
CompNum, the number of linked cases in the component
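Since the ids come back as row.names, one way to use the output is to merge it onto the original data and keep a single case per component. The sketch below types in a result with the documented shape (the values mirror the first example; they are not computed here), and keeping the earliest case in each component is an illustrative choice, not something the function prescribes.

```r
# Original cases, plus a result shaped like the documented return value
dat <- data.frame(x = 1:5, y = 0, ti = c(0, 0, 0, 4, 4), id = 1:5)
res <- data.frame(CompId = c(1, 1, 1, 2, 2), CompNum = c(3, 3, 3, 2, 2),
                  row.names = 1:5)

# Attach component info to the original cases via the id row.names
res$id <- as.integer(row.names(res))
linked <- merge(dat, res, by = "id")

# Collapse: one row per component, keeping the earliest case (ties by id)
first_case <- linked[order(linked$CompId, linked$ti, linked$id), ]
collapsed <- first_case[!duplicated(first_case$CompId), ]
print(collapsed[, c("id", "CompId", "CompNum")])

# Ids of cases that belong to a string of two or more events
in_string <- linked$id[linked$CompNum > 1]
```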

References

Wheeler, A. P., Riddell, J. R., & Haberman, C. P. (2021). Breaking the chain: How arrests reduce the probability of near repeat crimes. Criminal Justice Review, 46(2), 236-258.

Examples

# Simplified example showing two clusters
s <- c(0,0,0,4,4)
ccheck <- c(1,1,1,2,2)  # expected component for each case
dat <- data.frame(x=1:5, y=0,
                  ti=s,
                  id=1:5)
res1 <- near_strings1(dat,'id','x','y','ti',2,1)
print(res1)

# The full nyc_shoot data takes ~40 seconds with this function
library(sp)
data(nyc_shoot)
nyc_shoot$id <- 1:nrow(nyc_shoot) # incident IDs can have duplicates, so create a unique id
mh <- nyc_shoot[nyc_shoot$BORO == 'MANHATTAN',]
print(Sys.time())
res <- near_strings1(mh@data,id='id',x='X_COORD_CD',y='Y_COORD_CD',
                      tim='OCCUR_DATE',DistThresh=1500,TimeThresh=3)
print(Sys.time()) # ~3k Manhattan shootings take only ~1 second on my machine

[Package ptools version 2.0.0 Index]