summ_classmetric {pdqr}R Documentation

Summarize pair of distributions with classification metric

Description

Compute metric of the following one-dimensional binary classification setup: any x value not more than threshold value is classified as "negative"; if strictly greater - "positive". Classification metrics are computed based on two pdqr-functions: f, which represents the distribution of values which should be classified as "negative" ("true negative"), and g - the same for "positive" ("true positive").

Usage

summ_classmetric(f, g, threshold, method = "F1")

summ_classmetric_df(f, g, threshold, method = "F1")

Arguments

f

A pdqr-function of any type and class. Represents distribution of "true negative" values.

g

A pdqr-function of any type and class. Represents distribution of "true positive" values.

threshold

A numeric vector of classification threshold(s).

method

Method of classification metric (might be a vector for summ_classmetric_df()). Should be one of "TPR", "TNR", "FPR", "FNR", "PPV", "NPV", "FDR", "FOR", "LR+", "LR-", "Acc", "ER", "GM", "F1", "OP", "MCC", "YI", "MK", "Jaccard", "DOR" (with possible aliases, see Details).

Details

Binary classification setup used here to compute metrics is a simplified version of the most common one, when there is a finite set of already classified objects. Usually, there are N objects which are truly "negative" and P truly "positive" ones. Values N and P can vary, which often results in class imbalance. However, in current setup both N and P are equal to 1 (total probability of f and g).

In common setup, classification of all N + P objects results into the following values: "TP" (number of truly "positive" values classified as "positive"), "TN" (number of negatives classified as "negative"), "FP" (number of negatives falsely classified as "positive"), and "FN" (number of positives falsely classified as "negative"). In current setup all those values are equal to respective "rates" (because N and P are both equal to 1).

Both summ_classmetric() and summ_classmetric_df() allow aliases to some classification metrics (for readability purposes).

Following classification metrics are available:

Value

summ_classmetric() returns a numeric vector, of the same length as threshold, representing classification metrics for different threshold values.

summ_classmetric_df() returns a data frame with rows corresponding to threshold values. First column is "threshold" (with threshold values), and all other represent classification metric for every input method (see Examples).

See Also

summ_separation for computing optimal separation threshold (which is symmetrical with respect to f and g).

Other summary functions: summ_center(), summ_distance(), summ_entropy(), summ_hdr(), summ_interval(), summ_moment(), summ_order(), summ_prob_true(), summ_pval(), summ_quantile(), summ_roc(), summ_separation(), summ_spread()

Examples

d_unif <- as_d(dunif)
d_norm <- as_d(dnorm)
t_vec <- c(0, 0.5, 0.75, 1.5)

summ_classmetric(d_unif, d_norm, threshold = t_vec, method = "F1")
summ_classmetric(d_unif, d_norm, threshold = t_vec, method = "Acc")

summ_classmetric_df(
  d_unif, d_norm, threshold = t_vec, method = c("F1", "Acc")
)

# Using method aliases
summ_classmetric_df(
  d_unif, d_norm, threshold = t_vec, method = c("TPR", "sensitivity")
)

[Package pdqr version 0.3.1 Index]