R: Predictive Accuracy Index

pai {ptools}

R Documentation

Predictive Accuracy Index

Description

Given a set of predictions and observed counts, returns the PAI (predictive accuracy index), PEI (predictive efficiency index), and RRI (recovery rate index).

Usage

pai(dat, count, pred, area, other = c())

Arguments

`dat`	data frame with the predictions, observed counts, and area sizes (the area column can be a constant vector of ones)
`count`	character specifying the column name for the observed counts (e.g. the out of sample crime counts)
`pred`	character specifying the column name for the predicted counts (e.g. predictions based on a model)
`area`	character specifying the column name for the area sizes (could also be street segment distances, see Drawve & Wooditch, 2019)
`other`	vector of strings for any other column name you want to keep (e.g. an ID variable), defaults to empty `c()`

Details

Given predictions over an entire sample, this returns a data frame with the units ranked from highest to lowest predicted density (predicted counts per area), along with the PAI at each cumulative rank. PAI is defined as:

PAI = \frac{c_t/C}{a_t/A}

Where the numerator is the proportion of crimes captured in the top t areas (the cumulative count c_t over the total count C), and the denominator is the proportion of the total area those t areas cover (a_t over A). PEI is the observed PAI divided by the best possible PAI (the value a perfect oracle would achieve), so it is scaled between 0 and 1. RRI is predicted/observed, so very bad predictions can return Inf or an undefined value. See Wheeler & Steenbeek (2021) for the definitions of the different metrics. Note that PEI may behave unexpectedly when areas differ in size.
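The definitions above can be checked by hand with a minimal base R sketch. The helper `pai_sketch` is hypothetical and is not the ptools implementation; it assumes that RRI is taken as cumulative predicted over cumulative observed counts, and that the "best possible" PAI comes from ranking units by observed density. With unequal areas the PEI step in particular may differ from what the package does.

```r
# Hypothetical helper (not part of ptools): recomputes the metrics by hand
pai_sketch <- function(count, pred, area) {
  # Rank units from highest to lowest predicted density
  ord <- order(pred / area, decreasing = TRUE)
  cum_count <- cumsum(count[ord])
  cum_pred  <- cumsum(pred[ord])
  cum_area  <- cumsum(area[ord])
  # PAI = (c_t / C) / (a_t / A)
  pai <- (cum_count / sum(count)) / (cum_area / sum(area))
  # Assumption: the oracle ranks units by observed density instead
  best <- order(count / area, decreasing = TRUE)
  best_pai <- (cumsum(count[best]) / sum(count)) /
              (cumsum(area[best]) / sum(area))
  data.frame(Order = seq_along(ord),
             Count = count[ord], Pred = pred[ord], Area = area[ord],
             PAI = pai,
             PEI = pai / best_pai,
             # Assumption: RRI uses cumulative predicted / cumulative observed
             RRI = cum_pred / cum_count)
}

# Same fake data as in the Examples section, with unit areas
res <- pai_sketch(count = c(6, 7, 3, 2, 1, 0),
                  pred  = c(8, 4, 4, 2, 1, 0),
                  area  = rep(1, 6))
print(res)
```

For the top-ranked unit this gives PAI = (6/19)/(1/6) = 36/19, and PEI = 6/7 because a perfect oracle would have picked the unit with 7 crimes first.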

Value

A dataframe with the columns:

Order, the order of the resulting rankings
Count, the observed counts for the crimes you specified
Pred, the original predictions
Area, the area for the units of analysis
Cum*, the cumulative totals for Count/Pred/Area
PCum*, the cumulative totals as proportions, e.g. CumCount/sum(Count)
PAI, the PAI stat
PEI, the PEI stat
RRI, the RRI stat (you should probably analyze/graph log(RRI) instead)

Plus any additional variables specified by other at the end of the dataframe.
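As a rough illustration of why log(RRI) is easier to work with, the sketch below uses made-up cumulative totals (not output from `pai`) in which the top-ranked area has no observed crimes. The ratio is then infinite, and on the log scale over- and under-prediction become symmetric around 0, so non-finite values need to be filtered before summarising or plotting.

```r
# Hypothetical cumulative totals: the top-ranked area has no observed crimes
cum_pred  <- c(5, 9, 12)
cum_count <- c(0, 6, 12)

rri <- cum_pred / cum_count   # first element is 5/0, which R returns as Inf
log_rri <- log(rri)           # 0 means predicted equals observed

# Keep only finite values before summarising or plotting
ok <- is.finite(log_rri)
print(data.frame(rri, log_rri, ok))
```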

References

Drawve, G., & Wooditch, A. (2019). A research note on the methodological and theoretical considerations for assessing crime forecasting accuracy with the predictive accuracy index. Journal of Criminal Justice, 64, 101625.

Wheeler, A. P., & Steenbeek, W. (2021). Mapping the risk terrain for crime using machine learning. Journal of Quantitative Criminology, 37(2), 445-480.

Examples


# Making some very simple fake data
crime_dat <- data.frame(id=1:6,
                        obs=c(6,7,3,2,1,0),
                        pred=c(8,4,4,2,1,0))
crime_dat$const <- 1
p1 <- pai(crime_dat,'obs','pred','const')
print(p1)

# Combining multiple predictions and
# making a nice summary table
set.seed(10) # makes the random shuffle reproducible
crime_dat$rand <- sample(crime_dat$obs,nrow(crime_dat),FALSE)
p2 <- pai(crime_dat,'obs','rand','const')
pai_summary(list(p1,p2),c(1,3,5),c('one','two'))

[Package ptools version 2.0.0 Index]