summ_classmetric {pdqr} | R Documentation
Summarize pair of distributions with classification metric
Description
Compute metric of the following one-dimensional binary classification setup: any x value not more than the threshold is classified as "negative"; if strictly greater, as "positive". Classification metrics are computed based on two pdqr-functions: f, which represents the distribution of values that should be classified as "negative" ("true negative"), and g, the same for "positive" ("true positive").
Usage
summ_classmetric(f, g, threshold, method = "F1")
summ_classmetric_df(f, g, threshold, method = "F1")
Arguments
f: A pdqr-function of any type and class. Represents distribution of "true negative" values.

g: A pdqr-function of any type and class. Represents distribution of "true positive" values.

threshold: A numeric vector of classification threshold(s).

method: Method of classification metric (might be a vector for summ_classmetric_df()). See Details for available methods.
Details
The binary classification setup used here to compute metrics is a simplified version of the most common one, in which there is a finite set of already classified objects. Usually there are N objects which are truly "negative" and P truly "positive" ones. Values N and P can vary, which often results in class imbalance. However, in the current setup both N and P are equal to 1 (the total probability of f and g).

In the common setup, classification of all N + P objects results in the following values: "TP" (number of truly "positive" values classified as "positive"), "TN" (number of negatives classified as "negative"), "FP" (number of negatives falsely classified as "positive"), and "FN" (number of positives falsely classified as "negative"). In the current setup all those values are equal to the respective "rates" (because N and P are both equal to 1).
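For concreteness, the four base values can be expressed directly through p-function counterparts of f and g. A minimal sketch (dunif and dnorm serve only as stand-in "negative" and "positive" distributions):

f <- as_d(dunif)   # stand-in "true negative" distribution
g <- as_d(dnorm)   # stand-in "true positive" distribution
threshold <- 0.5
# Since N = P = 1, counts coincide with rates:
TN <- as_p(f)(threshold)      # negatives correctly classified as "negative"
FP <- 1 - as_p(f)(threshold)  # negatives falsely classified as "positive"
FN <- as_p(g)(threshold)      # positives falsely classified as "negative"
TP <- 1 - as_p(g)(threshold)  # positives correctly classified as "positive"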
Both summ_classmetric() and summ_classmetric_df() allow aliases for some classification metrics (for readability purposes).

The following classification metrics are available:
Simple metrics:

- True positive rate, method "TPR" (aliases: "TP", "sensitivity", "recall"): proportion of actual positives correctly classified as such. Computed as 1 - as_p(g)(threshold).

- True negative rate, method "TNR" (aliases: "TN", "specificity"): proportion of actual negatives correctly classified as such. Computed as as_p(f)(threshold).

- False positive rate, method "FPR" (aliases: "FP", "fall-out"): proportion of actual negatives falsely classified as "positive". Computed as 1 - as_p(f)(threshold).

- False negative rate, method "FNR" (aliases: "FN", "miss_rate"): proportion of actual positives falsely classified as "negative". Computed as as_p(g)(threshold).

- Positive predictive value, method "PPV" (alias: "precision"): proportion of output positives that are actually "positive". Computed as TP / (TP + FP).

- Negative predictive value, method "NPV": proportion of output negatives that are actually "negative". Computed as TN / (TN + FN).

- False discovery rate, method "FDR": proportion of output positives that are actually "negative". Computed as FP / (TP + FP).

- False omission rate, method "FOR": proportion of output negatives that are actually "positive". Computed as FN / (TN + FN).

- Positive likelihood, method "LR+": measures how much the odds of being "positive" increase when a value is classified as "positive". Computed as TPR / (1 - TNR).

- Negative likelihood, method "LR-": measures how much the odds of being "positive" decrease when a value is classified as "negative". Computed as (1 - TPR) / TNR.
Combined metrics (for all, except "error rate", a bigger value represents better classification performance; a worked F1 example follows this list):

- Accuracy, method "Acc" (alias: "accuracy"): proportion of total number of input values that were correctly classified. Computed as (TP + TN) / 2 (here 2 is used because of the special classification setup: TP + TN + FP + FN = 2).

- Error rate, method "ER" (alias: "error_rate"): proportion of total number of input values that were incorrectly classified. Computed as (FP + FN) / 2.

- Geometric mean, method "GM": geometric mean of TPR and TNR. Computed as sqrt(TPR * TNR).

- F1 score, method "F1": harmonic mean of PPV and TPR. Computed as 2*TP / (2*TP + FP + FN).

- Optimized precision, method "OP": accuracy, penalized for imbalanced class performance. Computed as Acc - abs(TPR - TNR) / (TPR + TNR).

- Matthews correlation coefficient, method "MCC" (alias: "corr"): correlation between the observed and predicted classifications. Computed as (TP*TN - FP*FN) / sqrt((TP+FP) * (TN+FN)) (here the equalities TP+FN = 1 and TN+FP = 1 are used to simplify the formula).

- Youden's index, method "YI" (aliases: "youden", "informedness"): evaluates the discriminative power of the classification setup. Computed as TPR + TNR - 1.

- Markedness, method "MK" (alias: "markedness"): evaluates the predictive power of the classification setup. Computed as PPV + NPV - 1.

- Jaccard, method "Jaccard": accuracy ignoring correct classification of negatives. Computed as TP / (TP + FP + FN).

- Diagnostic odds ratio, method "DOR" (alias: "odds_ratio"): ratio between positive and negative likelihoods. Computed as "LR+" / "LR-".
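To illustrate how a combined metric follows from the base values, the F1 score can be reproduced by hand and compared with summ_classmetric(). A sketch reusing the stand-in distributions from above (the two results should agree):

f <- as_d(dunif)
g <- as_d(dnorm)
threshold <- 0.5
TP <- 1 - as_p(g)(threshold)
FP <- 1 - as_p(f)(threshold)
FN <- as_p(g)(threshold)
# Manual F1 versus the packaged computation
2 * TP / (2 * TP + FP + FN)
summ_classmetric(f, g, threshold = threshold, method = "F1")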
Value
summ_classmetric() returns a numeric vector of the same length as threshold, representing classification metrics for the different threshold values.

summ_classmetric_df() returns a data frame with rows corresponding to threshold values. The first column is "threshold" (with the threshold values), and all others represent the classification metric for every input method (see Examples).
See Also
summ_separation() for computing the optimal separation threshold (which is symmetrical with respect to f and g).

Other summary functions: summ_center(), summ_distance(), summ_entropy(), summ_hdr(), summ_interval(), summ_moment(), summ_order(), summ_prob_true(), summ_pval(), summ_quantile(), summ_roc(), summ_separation(), summ_spread()
Examples
d_unif <- as_d(dunif)  # plays the role of f ("true negative" values)
d_norm <- as_d(dnorm)  # plays the role of g ("true positive" values)
t_vec <- c(0, 0.5, 0.75, 1.5)
summ_classmetric(d_unif, d_norm, threshold = t_vec, method = "F1")
summ_classmetric(d_unif, d_norm, threshold = t_vec, method = "Acc")
summ_classmetric_df(
d_unif, d_norm, threshold = t_vec, method = c("F1", "Acc")
)
# Using method aliases
summ_classmetric_df(
d_unif, d_norm, threshold = t_vec, method = c("TPR", "sensitivity")
)
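# A hand check of a "Simple metrics" formula from Details: "TPR" is
# computed as 1 - as_p(g)(threshold), so these two values should agree
summ_classmetric(d_unif, d_norm, threshold = 0.5, method = "TPR")
1 - as_p(d_norm)(0.5)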