confusion {mlearning}    R Documentation
Construct and analyze confusion matrices
Description
Confusion matrices compare two classifications: usually a classification produced automatically by a machine learning algorithm versus the true classification made by a specialist, but one can also compare two automatic, or two manual, classifications against each other.
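For instance, with the default method, two factors of the same length and with the same levels can be compared directly (a minimal sketch on made-up data, not taken from the package examples):

library(mlearning)
# Two classifications of the same six items (hypothetical data)
predicted <- factor(c("A", "A", "B", "B", "C", "C"))
actual <- factor(c("A", "B", "B", "B", "C", "A"))
# The first classification is tabulated in rows, the second in columns
confusion(predicted, actual)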
Usage
confusion(x, ...)

## Default S3 method:
confusion(
  x,
  y = NULL,
  vars = c("Actual", "Predicted"),
  labels = vars,
  merge.by = "Id",
  useNA = "ifany",
  prior,
  ...
)

## S3 method for class 'mlearning'
confusion(
  x,
  y = response(x),
  labels = c("Actual", "Predicted"),
  useNA = "ifany",
  prior,
  ...
)

## S3 method for class 'confusion'
print(x, sums = TRUE, error.col = sums, digits = 0, sort = "ward.D2", ...)

## S3 method for class 'confusion'
summary(object, type = "all", sort.by = "Fscore", decreasing = TRUE, ...)

## S3 method for class 'summary.confusion'
print(x, ...)
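Applied to a fitted model, the confusion() method for mlearning objects compares the model's predictions with response(x) by default, giving a self-consistency confusion matrix (a sketch assuming the glass_lvq model built in the Examples section below):

# Self-consistency: predictions on the training data vs the observed response
confusion(glass_lvq)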
Arguments
x
an object with a confusion() method implemented.

...
further arguments passed to the method.
y
another object, from which to extract the second classification, or NULL if not used.

vars
the variables of interest in the first and second classification in the case the objects are lists or data frames. Otherwise, this argument is ignored and x and y must be factors with the same length and the same levels.
labels
labels to use for the two classifications. By default, they are the same as vars, that is, c("Actual", "Predicted").

merge.by
a character string with the name of variables to use to merge the two data frames, or NULL to not merge them.
useNA
do we keep NA values as a separate category? The default "ifany" creates this category only if there are missing values; the other possibilities are "no" and "always".
prior
class frequencies to use for the first classifier, which is tabulated in the rows of the confusion matrix. For accepted values, see prior<-().
sums
should the confusion matrix be printed with row and column sums?

error.col
should a column with the class error for the first classifier be added (equivalent to the false negative rate, or FNR)?

digits
the number of digits after the decimal point to print in the confusion matrix. The default of zero gives the most compact presentation and is suitable for frequencies, but not for relative frequencies.
sort
should rows and columns of the confusion matrix be sorted so that classes with larger confusion are closer together? Sorting is done by hierarchical clustering with hclust(); the clustering method is "ward.D2" by default (see the hclust() help for other options). Use FALSE or NULL to leave the matrix unsorted.
object
a confusion object.

type
either "all" (by default) to compute every statistic, or the name of a single statistic to tabulate, such as "Fscore", "Recall", or "Precision".
sort.by
the statistic used to sort the table (by default "Fscore", the F-measure or F1 score for each class: 2 * recall * precision / (recall + precision)).

decreasing
do we sort in increasing or decreasing order?
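The prior argument works together with the prior() replacement function listed in See Also; a short sketch of rescaling, assuming the glass_conf object created in the Examples section below:

prior(glass_conf)              # current class frequencies (row sums)
prior(glass_conf) <- 100       # rescale rows into relative frequencies summing to 100
print(glass_conf, digits = 1)  # relative frequencies need digits > 0
prior(glass_conf) <- NULL      # restore the original counts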
Value
A confusion matrix in a confusion object.
See Also
mlearning(), plot.confusion(), prior()
Examples
data("Glass", package = "mlbench")
# Use a little bit more informative labels for Type
Glass$Type <- as.factor(paste("Glass", Glass$Type))
# Use learning vector quantization to classify the glass types
# (using default parameters)
summary(glass_lvq <- ml_lvq(Type ~ ., data = Glass))
# Calculate cross-validated confusion matrix
(glass_conf <- confusion(cvpredict(glass_lvq), Glass$Type))
# Raw confusion matrix: no sort and no margins
print(glass_conf, sums = FALSE, sort = FALSE)
summary(glass_conf)
summary(glass_conf, type = "Fscore")
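Two possible continuations, not in the original examples (they assume plot.confusion() defaults and the availability of a "Recall" statistic in summary()):

# Visualize the confusion matrix (see plot.confusion() for the available types)
plot(glass_conf)
# Per-class statistics sorted by recall instead of F-score
summary(glass_conf, sort.by = "Recall")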