confusion_matrix {cvms} R Documentation

## Create a confusion matrix

### Description

Creates a confusion matrix from targets and predictions. Calculates associated metrics.

Multiclass results are based on one-vs-all evaluations. Both regular averaging and weighted averaging are available. Also calculates the Overall Accuracy.
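The base R sketch below illustrates one-vs-all evaluation and regular (unweighted) averaging. Each class is treated in turn as the positive class, its Sensitivity is computed, and the per-class values are averaged. The helper one_vs_all_sensitivity() is hypothetical and not part of cvms. It mirrors the idea described above, not the package's implementation. The data is the three-class example from the Examples section.

```r
# Hypothetical helper (not part of cvms): one-vs-all Sensitivity per class
one_vs_all_sensitivity <- function(targets, predictions) {
  targets <- as.character(targets)
  predictions <- as.character(predictions)
  classes <- sort(unique(targets))
  sapply(classes, function(cl) {
    # Treat `cl` as the positive class and all other classes as negative
    tp <- sum(targets == cl & predictions == cl)
    fn <- sum(targets == cl & predictions != cl)
    tp / (tp + fn)
  })
}

targets <- c(0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 1, 0)
predictions <- c(2, 1, 0, 2, 0, 1, 1, 2, 0, 1, 2, 0, 2)

per_class <- one_vs_all_sensitivity(targets, predictions)
per_class                       # one value per class: 0.5, 0.5, 0.333...
mean(per_class)                 # regular (unweighted) average
mean(targets == predictions)    # Overall Accuracy = Correct / (Correct + Incorrect)
```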

Note: In most cases you should use evaluate() instead. It has additional metrics and works in magrittr pipes (e.g. %>%) and with dplyr::group_by(). confusion_matrix() is more lightweight and may be preferred in programming when you don't need the extra functionality of evaluate().

### Usage

```r
confusion_matrix(
  targets,
  predictions,
  metrics = list(),
  positive = 2,
  c_levels = NULL,
  do_one_vs_all = TRUE,
  parallel = FALSE
)
```


### Arguments

- targets: Vector with the true classes. Either numeric or character.
- predictions: Vector with the predicted classes. Either numeric or character.
- metrics: list for enabling/disabling metrics. E.g. list("Accuracy" = TRUE) would add the regular accuracy metric, while list("F1" = FALSE) would remove the F1 metric. Default values (TRUE/FALSE) will be used for the remaining available metrics. You can enable/disable all metrics at once by including "all" = TRUE/FALSE in the list. This is done prior to enabling/disabling individual metrics, so, for instance, list("all" = FALSE, "Accuracy" = TRUE) would return only the Accuracy metric. The list can be created with binomial_metrics() or multinomial_metrics(). Also accepts the string "all".
- positive: Level from targets to predict. Either as character (preferable) or level index (1 or 2, alphabetically). (Two-class only.) E.g. if we have the levels "cat" and "dog" and we want "dog" to be the positive class, we can either provide "dog" or 2, as "dog" comes after "cat" alphabetically. Note: For reproducibility, it's preferable to specify the name directly, as different locales may sort the levels differently.
- c_levels: Vector with the categorical levels in the targets. Should have the same type as targets. If NULL, they are inferred from targets. N.B. The levels are sorted alphabetically. When positive is numeric (i.e. an index), it therefore still refers to the index of the alphabetically sorted levels.
- do_one_vs_all: Whether to perform one-vs-all evaluations when working with more than 2 classes (multiclass). If you are only interested in the confusion matrix, disabling this allows you to skip most of the metric calculations.
- parallel: Whether to perform the one-vs-all evaluations in parallel. (Logical) N.B. This only makes sense when you have a lot of classes or a very large dataset. Remember to register a parallel backend first, e.g. with doParallel::registerDoParallel.
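The base R sketch below illustrates two of the rules above. A numeric positive is resolved against the alphabetically sorted levels. In metrics, the "all" setting is applied before the individual settings. apply_metric_toggles() and the shortened defaults list are hypothetical illustrations, not cvms internals (the real defaults are listed under Value), and the sketch does not handle the plain string "all".

```r
# positive = 2 refers to the second of the alphabetically sorted levels
levels_sorted <- sort(c("dog", "cat"))
levels_sorted[2]  # "dog", so positive = 2 and positive = "dog" select the same class

# Hypothetical helper (not part of cvms): apply "all" first, then the
# individual metric settings, and return the names of the enabled metrics
apply_metric_toggles <- function(defaults, metrics) {
  if (!is.null(metrics[["all"]])) {
    defaults[] <- metrics[["all"]]
  }
  for (name in setdiff(names(metrics), "all")) {
    defaults[[name]] <- metrics[[name]]
  }
  names(defaults)[unlist(defaults)]
}

# Shortened, illustrative default settings
defaults <- list("Balanced Accuracy" = TRUE, "Accuracy" = FALSE, "F1" = TRUE)
apply_metric_toggles(defaults, list("all" = FALSE, "Accuracy" = TRUE))  # "Accuracy"
apply_metric_toggles(defaults, list("F1" = FALSE))  # "Balanced Accuracy"
```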

### Details

The following formulas are used for calculating the metrics:

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)

Pos Pred Value = TP / (TP + FP)

Neg Pred Value = TN / (TN + FN)

Balanced Accuracy = (Sensitivity + Specificity) / 2

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Overall Accuracy = Correct / (Correct + Incorrect)

F1 = 2 * Pos Pred Value * Sensitivity / (Pos Pred Value + Sensitivity)

MCC = ((TP * TN) - (FP * FN)) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

Note for MCC: The formula is for the binary case. When the denominator is 0, we set it to 1 to avoid NaN. See the metrics vignette for the multiclass version.

Detection Rate = TP / (TP + FN + TN + FP)

Detection Prevalence = (TP + FP) / (TP + FN + TN + FP)

Threat Score = TP / (TP + FN + FP)

False Neg Rate = 1 - Sensitivity

False Pos Rate = 1 - Specificity

False Discovery Rate = 1 - Pos Pred Value

False Omission Rate = 1 - Neg Pred Value
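The formulas above can be checked by hand. The following base R sketch computes them from raw counts. binary_metrics() is a hypothetical helper, not a cvms function. It follows the zero-denominator rule stated in the MCC note. The counts (TP = 4, TN = 4, FP = 2, FN = 2) are those of the two-class example in the Examples section, with 1 as the positive class.

```r
# Hypothetical helper: the two-class metrics above, computed from raw counts
binary_metrics <- function(TP, TN, FP, FN) {
  n <- TP + TN + FP + FN
  sensitivity <- TP / (TP + FN)
  specificity <- TN / (TN + FP)
  ppv <- TP / (TP + FP)
  npv <- TN / (TN + FN)
  mcc_denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  if (mcc_denominator == 0) mcc_denominator <- 1  # avoid NaN, as in the MCC note
  c(
    "Balanced Accuracy" = (sensitivity + specificity) / 2,
    "Accuracy" = (TP + TN) / n,
    "F1" = 2 * ppv * sensitivity / (ppv + sensitivity),
    "Sensitivity" = sensitivity,
    "Specificity" = specificity,
    "Pos Pred Value" = ppv,
    "Neg Pred Value" = npv,
    "MCC" = ((TP * TN) - (FP * FN)) / mcc_denominator,
    "Detection Rate" = TP / n,
    "Detection Prevalence" = (TP + FP) / n,
    "Threat Score" = TP / (TP + FN + FP),
    "False Neg Rate" = 1 - sensitivity,
    "False Pos Rate" = 1 - specificity,
    "False Discovery Rate" = 1 - ppv,
    "False Omission Rate" = 1 - npv
  )
}

# Counts from the two-class example (positive class: 1)
m <- binary_metrics(TP = 4, TN = 4, FP = 2, FN = 2)
round(m, 4)
```

Here most metrics come out at 2/3, while MCC is 1/3, Detection Rate is 1/3, and Detection Prevalence and Threat Score are both 0.5.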

For Kappa, the counts (TP, TN, FP, FN) are normalized to proportions (summing to 1). Then the following is calculated:

p_observed = TP + TN

p_expected = (TN + FP) * (TN + FN) + (FN + TP) * (FP + TP)

Kappa = (p_observed - p_expected) / (1 - p_expected)
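The following minimal base R sketch follows the Kappa calculation described above and takes raw counts as input. kappa_from_counts() is a hypothetical helper name. With the counts from the two-class example (TP = 4, TN = 4, FP = 2, FN = 2), p_observed is 2/3 and p_expected is 0.5, giving a Kappa of 1/3.

```r
# Hypothetical helper: Kappa from raw counts, following the steps above
kappa_from_counts <- function(TP, TN, FP, FN) {
  # Normalize the counts to proportions summing to 1
  n <- TP + TN + FP + FN
  TP <- TP / n; TN <- TN / n; FP <- FP / n; FN <- FN / n
  p_observed <- TP + TN
  p_expected <- (TN + FP) * (TN + FN) + (FN + TP) * (FP + TP)
  (p_observed - p_expected) / (1 - p_expected)
}

kappa_from_counts(TP = 4, TN = 4, FP = 2, FN = 2)  # 1/3
kappa_from_counts(TP = 5, TN = 5, FP = 0, FN = 0)  # perfect agreement: 1
```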

### Value

A tibble with:

Nested confusion matrix (tidied version)

Nested confusion matrix (table)

The Positive Class.

Multiclass only: Nested Class Level Results with the two-class metrics, the nested confusion matrices, and the Support metric, which is a count of the class in the target column and is used for the weighted average metrics.

The following metrics are available (see the metrics argument):

#### Two classes or more

| Metric | Name | Default |
|---|---|---|
| Balanced Accuracy | "Balanced Accuracy" | Enabled |
| Accuracy | "Accuracy" | Disabled |
| F1 | "F1" | Enabled |
| Sensitivity | "Sensitivity" | Enabled |
| Specificity | "Specificity" | Enabled |
| Positive Predictive Value | "Pos Pred Value" | Enabled |
| Negative Predictive Value | "Neg Pred Value" | Enabled |
| Kappa | "Kappa" | Enabled |
| Matthews Correlation Coefficient | "MCC" | Enabled |
| Detection Rate | "Detection Rate" | Enabled |
| Detection Prevalence | "Detection Prevalence" | Enabled |
| Prevalence | "Prevalence" | Enabled |
| False Negative Rate | "False Neg Rate" | Disabled |
| False Positive Rate | "False Pos Rate" | Disabled |
| False Discovery Rate | "False Discovery Rate" | Disabled |
| False Omission Rate | "False Omission Rate" | Disabled |
| Threat Score | "Threat Score" | Disabled |

The Name column gives the name used in the package. This is the name used in the output and when enabling/disabling the metric with the metrics argument.

#### Three classes or more

The metrics mentioned above (excluding MCC) have a weighted average version (disabled by default; weighted by the Support).

In order to enable a weighted metric, prefix the metric name with "Weighted " when specifying metrics.

E.g. metrics = list("Weighted Accuracy" = TRUE).
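The base R sketch below illustrates the weighting. It computes the one-vs-all Accuracy for each class of the three-class example, counts the Support of each class in the targets, and takes the Support-weighted average. weighted_average() is a hypothetical helper. The sketch assumes the weighted metric is sum(metric * Support) / sum(Support), as the description above suggests.

```r
# Hypothetical helper: Support-weighted average of per-class metric values
weighted_average <- function(values, support) {
  sum(values * support) / sum(support)
}

targets <- c(0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 1, 0)
predictions <- c(2, 1, 0, 2, 0, 1, 1, 2, 0, 1, 2, 0, 2)
classes <- sort(unique(targets))

# Support: how often each class occurs in the targets
support <- sapply(classes, function(cl) sum(targets == cl))

# One-vs-all Accuracy per class: share of observations where the binarized
# target and binarized prediction agree, with `cl` as the positive class
per_class_accuracy <- sapply(classes, function(cl) {
  mean((targets == cl) == (predictions == cl))
})

weighted_average(per_class_accuracy, support)  # "Weighted Accuracy"
mean(per_class_accuracy)                       # regular average, for comparison
```

Classes with a larger Support pull the weighted average toward their own values, whereas the regular average gives every class the same weight.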

| Metric | Name | Default |
|---|---|---|
| Overall Accuracy | "Overall Accuracy" | Enabled |
| Weighted * | "Weighted *" | Disabled |
| Multiclass MCC | "MCC" | Enabled |
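For reference, a widely used multiclass generalization of MCC (Gorodkin's R_K statistic) can be computed from class counts alone. The formula in the sketch below is an assumption. Check the metrics vignette for the exact definition cvms uses. multiclass_mcc() is a hypothetical helper. It reuses the binary rule of replacing a zero denominator with 1.

```r
# Hypothetical helper: multiclass MCC (R_K) from targets and predictions.
# Assumption: this matches the multiclass MCC described in the metrics vignette.
multiclass_mcc <- function(targets, predictions) {
  classes <- sort(union(targets, predictions))
  t_k <- sapply(classes, function(cl) sum(targets == cl))      # times each class is the target
  p_k <- sapply(classes, function(cl) sum(predictions == cl))  # times each class is predicted
  s <- length(targets)                                         # number of observations
  correct <- sum(targets == predictions)                       # number of correct predictions
  numerator <- correct * s - sum(p_k * t_k)
  denominator <- sqrt((s^2 - sum(p_k^2)) * (s^2 - sum(t_k^2)))
  if (denominator == 0) denominator <- 1  # mirror the binary zero-denominator rule
  numerator / denominator
}

# Three-class example from the Examples section
multiclass_mcc(
  targets = c(0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 1, 0),
  predictions = c(2, 1, 0, 2, 0, 1, 1, 2, 0, 1, 2, 0, 2)
)
```

On the two-class example, this gives 1/3, the same value as the binary MCC formula in Details, since R_K reduces to the binary MCC for two classes.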

### Author(s)

Ludvig Renbo Olsen, r-pkgs@ludvigolsen.dk

### See Also

Other evaluation functions: binomial_metrics(), evaluate_residuals(), evaluate(), gaussian_metrics(), multinomial_metrics()

### Examples


```r
# Attach cvms
library(cvms)

# Two classes

# Create targets and predictions
targets <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
predictions <- c(1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0)

# Create confusion matrix with default metrics
cm <- confusion_matrix(targets, predictions)
cm
cm[["Confusion Matrix"]]
cm[["Table"]]

# Three classes

# Create targets and predictions
targets <- c(0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 1, 0)
predictions <- c(2, 1, 0, 2, 0, 1, 1, 2, 0, 1, 2, 0, 2)

# Create confusion matrix with default metrics
cm <- confusion_matrix(targets, predictions)
cm
cm[["Confusion Matrix"]]
cm[["Table"]]

# Enabling weighted accuracy

# Create confusion matrix with Weighted Accuracy enabled
cm <- confusion_matrix(targets, predictions,
  metrics = list("Weighted Accuracy" = TRUE)
)
cm
```
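To see where the numbers in the two-class example come from, the counts can be tallied with base R. This sketch assumes, as with the default positive = 2, that the second alphabetically sorted level (1) is the positive class. It does not use cvms.

```r
# Two-class example data
targets <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
predictions <- c(1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0)

# Base R cross-tabulation of predictions against targets
table(Prediction = predictions, Target = targets)

# Counts with 1 as the positive class
TP <- sum(predictions == 1 & targets == 1)
TN <- sum(predictions == 0 & targets == 0)
FP <- sum(predictions == 1 & targets == 0)
FN <- sum(predictions == 0 & targets == 1)
c(TP = TP, TN = TN, FP = FP, FN = FN)

# Sensitivity = TP / (TP + FN)
TP / (TP + FN)
```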



[Package cvms version 1.3.3 Index]