pai_summary {ptools} | R Documentation |
Summary Table for Multiple PAI Stats
Description
Takes a list of PAI tables (one per set of predictions) and returns summary metrics at fixed area thresholds.
Usage
pai_summary(pai_list, thresh, labs, wide = TRUE)
Arguments
pai_list |
list of data frames that have the PAI stats from the |
thresh |
numeric vector giving how many top-ranked areas to summarize, e.g. 10 selects the top 10 areas, and c(10,100) selects the top 10 and the top 100 areas |
labs |
vector of characters that specifies the labels for each PAI dataframe, should be the same length as |
wide |
logical; if TRUE (default), returns the data frame in wide format, otherwise returns the summaries in long format |
Details
Given predictions over an entire sample, this returns a dataframe with the sorted best PAI (sorted by density of predicted counts per area). PAI is defined as:
PAI = \frac{c_t/C}{a_t/A}
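where c_t is the number of crimes in the top t areas (cumulatively), C is the total number of crimes, a_t is the area covered by those t areas, and A is the total area. The numerator is therefore the share of crimes captured in the cumulative t areas, and the denominator is the share of the area encompassed.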
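As a worked illustration of the formula, the following base-R sketch computes PAI by hand for the same toy data used in the Examples section. It is only a sketch under stated assumptions, not the package's implementation: the names `ord`, `c_t`, `a_t`, and `pai_by_hand` are made up for this illustration, and areas tied on predicted density are ranked in whatever order `order()` returns them.

```r
# Toy data (same values as in the Examples section); every area has size 1
obs  <- c(6, 7, 3, 2, 1, 0)   # observed crimes per area
pred <- c(8, 4, 4, 2, 1, 0)   # predicted crimes per area
area <- rep(1, 6)

# Rank areas by predicted density (predicted count per unit area), highest first
ord <- order(pred / area, decreasing = TRUE)

# Cumulative crimes captured (c_t) and cumulative area covered (a_t)
c_t <- cumsum(obs[ord])
a_t <- cumsum(area[ord])

# PAI = (c_t / C) / (a_t / A), one value per cumulative number of areas t
pai_by_hand <- (c_t / sum(obs)) / (a_t / sum(area))
print(round(pai_by_hand, 3))
```

Once every area is included (t = 6 here), both shares equal 1, so the last PAI value is always 1.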
PEI is the observed PAI divided by the best possible PAI (the PAI you would obtain as a perfect oracle), so it is scaled between 0 and 1.
RRI is predicted/observed, so with very bad predictions (for example, no observed crimes in the selected areas) it can return Inf or be undefined!
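To make the PEI and RRI definitions concrete, here is a minimal base-R sketch on the same toy data as the Examples section. It is an illustration, not the package's code: the helper `cum_pai()` and the `*_by_hand` names are hypothetical, and RRI is assumed here to be the ratio of cumulative predicted to cumulative observed counts.

```r
obs  <- c(6, 7, 3, 2, 1, 0)   # observed crimes per area
pred <- c(8, 4, 4, 2, 1, 0)   # predicted crimes per area
area <- rep(1, 6)

# Hypothetical helper: cumulative PAI of `counts` under a given ranking `ord`
cum_pai <- function(counts, ord, area) {
  (cumsum(counts[ord]) / sum(counts)) / (cumsum(area[ord]) / sum(area))
}

ord_pred <- order(pred / area, decreasing = TRUE)  # ranking from the predictions
ord_best <- order(obs / area, decreasing = TRUE)   # ranking a perfect oracle would use

# PEI: PAI achieved by the predictions relative to the best possible PAI
pei_by_hand <- cum_pai(obs, ord_pred, area) / cum_pai(obs, ord_best, area)

# RRI (assumed cumulative): predicted count over observed count
rri_by_hand <- cumsum(pred[ord_pred]) / cumsum(obs[ord_pred])

# With zero observed crimes the ratio is no longer finite
rri_inf <- 2 / 0   # positive prediction, nothing observed
rri_nan <- 0 / 0   # nothing predicted, nothing observed

print(round(pei_by_hand, 3))
print(round(rri_by_hand, 3))
```

In this sketch the top-ranked area has 8 predicted and 6 observed crimes, so the first RRI value is above 1 (over-prediction), while the first PEI value is below 1 because the oracle would have picked the area with 7 crimes instead.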
See Wheeler & Steenbeek (2021) for the definitions of the different metrics.
Note that PEI may behave unexpectedly when the areas differ in size.
Value
A dataframe with the PAI/PEI/RRI, and cumulative crime/predicted counts, for each original table
References
Drawve, G., & Wooditch, A. (2019). A research note on the methodological and theoretical considerations for assessing crime forecasting accuracy with the predictive accuracy index. Journal of Criminal Justice, 64, 101625.
Wheeler, A. P., & Steenbeek, W. (2021). Mapping the risk terrain for crime using machine learning. Journal of Quantitative Criminology, 37(2), 445-480.
See Also
pai()
for computing the PAI table for a single set of predictions, which is the input that pai_summary expects
Examples
# Making some very simple fake data
crime_dat <- data.frame(id=1:6,
obs=c(6,7,3,2,1,0),
pred=c(8,4,4,2,1,0))
crime_dat$const <- 1
p1 <- pai(crime_dat,'obs','pred','const')
print(p1)
# Combining multiple predictions into
# one summary table (rand is a random shuffle of obs)
crime_dat$rand <- sample(crime_dat$obs,nrow(crime_dat),FALSE)
p2 <- pai(crime_dat,'obs','rand','const')
pai_summary(list(p1,p2),c(1,3,5),c('one','two'))