R: Summarize distributions with ROC curve

summ_roc {pdqr}

R Documentation

Summarize distributions with ROC curve

Description

These functions help you perform a ROC ("Receiver Operating Characteristic") analysis for one-dimensional linear classifier: values not more than some threshold are classified as "negative", and more than threshold - as "positive". Here input pair of pdqr-functions represent "true" distributions of values with "negative" (f) and "positive" (g) labels.

Usage

summ_roc(f, g, n_grid = 1001)

summ_rocauc(f, g, method = "expected")

roc_plot(roc, ..., add_bisector = TRUE)

roc_lines(roc, ...)

Arguments

`f`	A pdqr-function of any type and class. Represents "true" distribution of "negative" values.
`g`	A pdqr-function of any type and class. Represents "true" distribution of "positive" values.
`n_grid`	Number of points of ROC curve to be computed.
`method`	Method of computing ROC AUC. Should be one of "expected", "pessimistic", "optimistic" (see Details).
`roc`	A data frame representing ROC curve. Typically an output of `summ_roc()`.
`...`	Other arguments to be passed to `plot()` or `lines()`.
`add_bisector`	If `TRUE` (default), `roc_plot()` adds bisector line as reference for "random guess" classifier.

Details

ROC curve describes how well classifier performs under different thresholds. For all possible thresholds two classification metrics are computed which later form x and y coordinates of a curve:

False positive rate (FPR): proportion of "negative" distribution which was (incorrectly) classified as "positive". This is the same as one minus "specificity" (proportion of "negative" values classified as "negative").
True positive rate (TPR): proportion of "positive" distribution which was (correctly) classified as "positive". This is also called "sensitivity".

summ_roc() creates a uniform grid of decreasing n_grid values (so that output points of ROC curve are ordered from left to right) covering range of all meaningful thresholds. This range is computed as slightly extended range of f and g supports (extension is needed to achieve extreme values of "fpr" in presence of "discrete" type). Then FPR and TPR are computed for every threshold.

summ_rocauc() computes a common general (without any particular threshold in mind) diagnostic value of classifier, area under ROC curve ("ROC AUC" or "AUROC"). Numerically it is equal to a probability of random variable with distribution g being strictly greater than f plus possible correction for functions being equal, with multiple ways to account for it. Method "pessimistic" doesn't add correction, "expected" adds half of probability of f and g being equal (which is default), "optimistic" adds full probability. Note that this means that correction might be done only if both input pdqr-functions have "discrete" type. See pdqr methods for "Ops" group generic family for more details on comparing functions.

roc_plot() and roc_lines() perform plotting (with plot()) and adding (with lines()) ROC curves respectively.

Value

summ_roc() returns a data frame with n_grid rows and columns "threshold" (grid of classification thresholds, ordered decreasingly), "fpr", and "tpr" (corresponding false and true positive rates, ordered non-decreasingly by "fpr").

summ_rocauc() returns single number representing area under the ROC curve.

roc_plot() and roc_lines() create plotting side effects.

Examples

d_norm_1 <- as_d(dnorm)
d_norm_2 <- as_d(dnorm, mean = 1)
roc <- summ_roc(d_norm_1, d_norm_2)
head(roc)

# `summ_rocauc()` is equivalent to probability of `g > f`
summ_rocauc(d_norm_1, d_norm_2)
summ_prob_true(d_norm_2 > d_norm_1)

# Plotting
roc_plot(roc)
roc_lines(summ_roc(d_norm_2, d_norm_1), col = "blue")

# For "discrete" functions `summ_rocauc()` can produce different outputs
d_dis_1 <- new_d(1:2, "discrete")
d_dis_2 <- new_d(2:3, "discrete")
summ_rocauc(d_dis_1, d_dis_2)
summ_rocauc(d_dis_1, d_dis_2, method = "pessimistic")
summ_rocauc(d_dis_1, d_dis_2, method = "optimistic")
## These methods correspond to different ways of plotting ROC curves
roc <- summ_roc(d_dis_1, d_dis_2)
## Default line plot for "expected" method
roc_plot(roc, main = "Different type of plotting ROC curve")
## Method "pessimistic"
roc_lines(roc, type = "s", col = "blue")
## Method "optimistic"
roc_lines(roc, type = "S", col = "green")

[Package pdqr version 0.3.1 Index]