get_roc_stats {usefun} | R Documentation |
Generate ROC statistics
Description
Use this function to generate the statistics needed to draw a basic ROC (Receiver Operating Characteristic) curve and to summarize it.
Usage
get_roc_stats(df, pred_col, label_col, direction = "<")
Arguments
df |
a |
pred_col |
string. The name of the column of the |
label_col |
string. The name of the column of the |
direction |
string. Either > or < (the default). Indicates how the prediction values rank with respect to the positive class at a given threshold: use < if smaller prediction values indicate the positive class/label, and > if larger prediction values indicate the positive class/label (e.g. when the prediction is the probability of the positive class). |
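To make the direction argument concrete, here is a minimal base R sketch of how a single threshold could split predictions into positive and negative calls. The helper classify_at and the toy scores are hypothetical and not part of usefun; the strict comparison is an assumption, since the way get_roc_stats handles values equal to the threshold is not stated here.

```r
# Hypothetical helper (not part of usefun): call each prediction positive (1)
# or negative (0) at one threshold.
# Assumption: strict comparison; the package may treat ties differently.
classify_at = function(pred, threshold, direction = "<") {
  if (direction == "<") {
    as.integer(pred < threshold)  # smaller values indicate the positive class
  } else {
    as.integer(pred > threshold)  # larger values indicate the positive class
  }
}

# Made-up scores for illustration
scores = c(-2.5, -1.0, 0.3, 1.7)

calls_lt = classify_at(scores, threshold = 0, direction = "<")  # 1 1 0 0
calls_gt = classify_at(scores, threshold = 0, direction = ">")  # 0 0 1 1
```

With the same threshold, the two directions produce complementary calls, which is why choosing the wrong direction typically mirrors the ROC curve below the chance line.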
Value
A list with two elements:
- roc_stats: a tibble which includes the thresholds for the ROC curve and the confusion matrix stats for each threshold, as follows: TP (#True Positives), FN (#False Negatives), TN (#True Negatives), FP (#False Positives), FPR (False Positive Rate, the x-axis values for the ROC curve) and TPR (True Positive Rate, the y-axis values for the ROC curve). Also included are dist_from_chance (the vertical distance of the corresponding (FPR,TPR) point to the chance line or positive diagonal) and dist_from_0_1 (the Euclidean distance of the corresponding (FPR,TPR) point from (0,1)).
- AUC: a number representing the Area Under the (ROC) Curve.
The returned results provide an easy way to compute two optimal cutpoints (thresholds) that dichotomize the predictions into positive and negative. The first is the threshold that maximizes the Youden index, i.e. the point of the ROC curve with the maximum vertical distance to the chance line or positive diagonal. The second is the threshold whose point on the ROC curve is closest to (0,1), the point of perfect differentiation. See examples below.
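The quantities described above can be reproduced by hand on a small example. The following base R sketch is an illustration under stated assumptions, not the package's implementation: the toy data, the helper stats_at, the strict comparison, the choice of thresholds and the trapezoidal rule for the AUC are all assumptions made here.

```r
# Toy data (made up): smaller scores indicate the positive class (direction "<")
pred  = c(-3, -2, -1, 0, 1, 2)
label = c( 1,  1,  0, 1, 0, 0)

# Hypothetical helper: confusion-matrix stats and distances at one threshold
stats_at = function(threshold) {
  called_pos = pred < threshold  # assumption: strict comparison
  TP = sum(called_pos & label == 1)
  FN = sum(!called_pos & label == 1)
  FP = sum(called_pos & label == 0)
  TN = sum(!called_pos & label == 0)
  TPR = TP / (TP + FN)
  FPR = FP / (FP + TN)
  data.frame(threshold, TP, FN, TN, FP, FPR, TPR,
    dist_from_chance = TPR - FPR,               # vertical distance to the diagonal
    dist_from_0_1 = sqrt(FPR^2 + (1 - TPR)^2))  # Euclidean distance to (0,1)
}

# One row per threshold, from "nothing called positive" to "everything called positive"
thresholds = c(-Inf, sort(unique(pred)), Inf)
roc = do.call(rbind, lapply(thresholds, stats_at))

# AUC via the trapezoidal rule over consecutive (FPR, TPR) points
auc = sum(diff(roc$FPR) * (head(roc$TPR, -1) + tail(roc$TPR, -1)) / 2)
auc  # 8/9 for this toy data
```

As a sanity check, 8 of the 9 (positive, negative) pairs in the toy data are ranked correctly, which matches the AUC of 8/9.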
Examples
# load libraries
library(readr)
library(dplyr)
# load test tibble
test_file = system.file("extdata", "test_df.tsv", package = "usefun", mustWork = TRUE)
test_df = readr::read_tsv(test_file, col_types = "di")
# get ROC stats
res = get_roc_stats(df = test_df, pred_col = "score", label_col = "observed")
# Plot ROC with a legend showing the AUC value
plot(x = res$roc_stats$FPR, y = res$roc_stats$TPR,
  type = 'l', lwd = 2, col = '#377EB8', main = 'ROC curve',
  xlab = 'False Positive Rate (FPR)', ylab = 'True Positive Rate (TPR)')
legend('bottomright', legend = round(res$AUC, digits = 3),
  title = 'AUC', col = '#377EB8', pch = 19)
grid()
abline(a = 0, b = 1, col = '#FF726F', lty = 2)
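# Get two possible cutoffs
# 1) threshold(s) with the maximum vertical distance from the chance line (Youden index)
youden_index_df = res$roc_stats %>%
  filter(dist_from_chance == max(dist_from_chance))
# 2) threshold(s) whose ROC point is closest to (0,1)
min_classification_df = res$roc_stats %>%
  filter(dist_from_0_1 == min(dist_from_0_1))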