perfMeasures {MKclass} | R Documentation |
Compute Performance Measures for Binary Classification
Description
The function computes various performance measures for binary classification.
Usage
perfMeasures(pred, pred.group, truth, namePos, cutoff = 0.5,
weight = 0.5, wACC = weight, wLR = weight,
wPV = weight, beta = 1, measures = "all")
Arguments
pred |
numeric values that shall be used for classification; e.g. probabilities to belong to the positive group. |
pred.group |
vector or factor including the predicted group. If missing,
|
truth |
true grouping vector or factor. |
namePos |
value representing the positive group. |
cutoff |
cutoff value used for classification. |
weight |
weight used for computing weighted values. Must be in [0,1]. |
wACC |
weight used for computing the weighted accuracy, where sensitivity
is multiplied by |
wLR |
weight used for computing the weighted likelihood ratio, where PLR
is multiplied by |
wPV |
weight used for computing the weighted predictive value, where PPV
is multiplied by |
beta |
beta coefficient used for computing the F beta score. Must be nonnegative. |
measures |
character vector giving the measures that shall be computed;
see details. Default is "all". |
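To illustrate how the weight and beta arguments might enter the computations, here is a minimal sketch in base R. The helper names wacc_sketch and fbeta_sketch are hypothetical and not part of MKclass. The weighted accuracy formula is an assumption (sensitivity weighted by w, specificity by 1 - w); the F beta score follows the standard definition based on precision and recall.

```r
## Hypothetical helpers for illustration only; not MKclass functions.

## Assumption: weighted accuracy = w * SENS + (1 - w) * SPEC.
wacc_sketch <- function(sens, spec, w = 0.5) {
  stopifnot(w >= 0, w <= 1)
  w * sens + (1 - w) * spec
}

## Standard F beta score from precision (PPV) and recall (SENS).
fbeta_sketch <- function(ppv, sens, beta = 1) {
  stopifnot(beta >= 0)
  (1 + beta^2) * ppv * sens / (beta^2 * ppv + sens)
}

wacc_equal <- wacc_sketch(sens = 0.8, spec = 0.6)          # equal weights
wacc_spec  <- wacc_sketch(sens = 0.8, spec = 0.6, w = 0.3) # favours specificity
f1         <- fbeta_sketch(ppv = 0.5, sens = 0.8)          # beta = 1 gives F1
print(c(wacc_equal = wacc_equal, wacc_spec = wacc_spec, f1 = f1))
```

With equal weights the weighted accuracy in this sketch coincides with the balanced accuracy, and beta = 1 reduces the F beta score to the F1 score.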
Details
The function perfMeasures can be used to compute various performance
measures. To compute specific measures, the abbreviations given in
parentheses have to be specified in argument measures. Single measures
can also be computed by the respective functions, whose names are identical
to the abbreviations given in parentheses.
The measures are: accuracy (ACC), probability of correct classification (PCC), fraction correct (FC), simple matching coefficient (SMC), Rand (similarity) index (RSI), probability of misclassification (PMC), error rate (ER), fraction incorrect (FIC), sensitivity (SENS), recall (REC), true positive rate (TPR), probability of detection (PD), hit rate (HR), specificity (SPEC), true negative rate (TNR), selectivity (SEL), detection rate (DR), false positive rate (FPR), fall-out (FO), false alarm (rate) (FAR), probability of false alarm (PFA), false negative rate (FNR), miss rate (MR), false discovery rate (FDR), false omission rate (FOR), prevalence (PREV), (positive) pre-test probability (PREP), (positive) pre-test odds (PREO), detection prevalence (DPREV), negative pre-test probability (NPREP), negative pre-test odds (NPREO), no information rate (NIR), weighted accuracy (WACC), balanced accuracy (BACC), (bookmaker) informedness (INF), Youden's J statistic (YJS), deltap' (DPp), positive likelihood ratio (PLR), negative likelihood ratio (NLR), weighted likelihood ratio (WLR), balanced likelihood ratio (BLR), diagnostic odds ratio (DOR), positive predictive value (PPV), precision (PREC), (positive) post-test probability (POSTP), (positive) post-test odds (POSTO), Bayes factor G1 (BFG1), negative predictive value (NPV), negative post-test probability (NPOSTP), negative post-test odds (NPOSTO), Bayes factor G0 (BFG0), markedness (MARK), deltap (DP), weighted predictive value (WPV), balanced predictive value (BPV), F1 score (F1S), Dice similarity coefficient (DSC), F beta score (FBS), Jaccard similarity coefficient (JSC), threat score (TS), critical success index (CSI), Matthews' correlation coefficient (MCC), Pearson's correlation (r phi) (RPHI), Phi coefficient (PHIC), Cramer's V (CRV), proportion of positive predictions (PPP), expected accuracy (EACC), Cohen's kappa coefficient (CKC), mutual information in bits (MI2), joint entropy in bits (JE2), variation of 
information in bits (VI2), Jaccard distance (JD), information quality ratio (INFQR), uncertainty coefficient (UC), entropy coefficient (EC), proficiency (metric) (PROF), deficiency (metric) (DFM), redundancy (RED), symmetric uncertainty (SU), normalized uncertainty (NU).
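Many of the measures listed above are simple functions of the four cells of the 2x2 confusion matrix. The following base R sketch computes a few of them from their standard definitions. The counts are hypothetical and chosen only for illustration, and the code does not claim to reproduce the package's internal implementation.

```r
## Hypothetical confusion matrix counts (illustration only).
TP <- 40; FN <- 10; FP <- 20; TN <- 130
n <- TP + FN + FP + TN

ACC  <- (TP + TN) / n               # accuracy
SENS <- TP / (TP + FN)              # sensitivity, recall, true positive rate
SPEC <- TN / (TN + FP)              # specificity, true negative rate
BACC <- (SENS + SPEC) / 2           # balanced accuracy
YJS  <- SENS + SPEC - 1             # Youden's J statistic
PPV  <- TP / (TP + FP)              # positive predictive value, precision
NPV  <- TN / (TN + FN)              # negative predictive value
F1S  <- 2 * PPV * SENS / (PPV + SENS)   # F1 score
MCC  <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))  # Matthews' correlation

print(round(c(ACC = ACC, SENS = SENS, SPEC = SPEC, BACC = BACC, YJS = YJS,
              PPV = PPV, NPV = NPV, F1S = F1S, MCC = MCC), 3))
```

Several names in the list are synonyms for the same quantity, for example SENS, REC, TPR, PD and HR.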
These performance measures have in common that they require a dichotomization
of the computed predictions (classification function). To measure performance
without dichotomization, one can apply function perfScores.
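As a rough illustration of what dichotomization means here, the sketch below turns predicted probabilities into predicted groups with a cutoff and cross-tabulates them against the truth. It uses only base R and the infert data. The variable names pred_group and conf_tab are hypothetical, and the rule that values at or above the cutoff count as positive is an assumption, not a statement about the package's internals.

```r
## Logistic regression on the infert data, as in the package examples.
fit <- glm(case ~ spontaneous + induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")

cutoff <- 0.5
## Assumption: predictions at or above the cutoff are classified as positive (1).
pred_group <- factor(as.integer(pred >= cutoff), levels = c(0, 1))

## 2x2 confusion matrix: rows are predicted groups, columns are true groups.
conf_tab <- table(predicted = pred_group,
                  truth = factor(infert$case, levels = c(0, 1)))
print(conf_tab)
```

Changing cutoff changes the table and therefore every measure derived from it.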
The prevalence reported is the prevalence in the data. This often is not identical
to the prevalence in the population. Hence, it might be better to compute
PPV and NPV (and derived measures) by applying function predValues,
where one can specify the assumed prevalence. This holds in general for all
measures that depend on the prevalence.
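The dependence of PPV and NPV on the prevalence follows from Bayes' theorem. The sketch below shows it in base R with hypothetical helper names (ppv_sketch, npv_sketch) and hypothetical values for sensitivity and specificity. It is not the implementation of predValues, whose interface is not shown here.

```r
## Hypothetical helpers based on Bayes' theorem; not MKclass functions.
ppv_sketch <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}
npv_sketch <- function(sens, spec, prev) {
  spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
}

## Same test characteristics, two assumed prevalences.
ppv_balanced <- ppv_sketch(sens = 0.8, spec = 0.9, prev = 0.5)
ppv_rare     <- ppv_sketch(sens = 0.8, spec = 0.9, prev = 0.01)
npv_rare     <- npv_sketch(sens = 0.8, spec = 0.9, prev = 0.01)
print(c(ppv_balanced = ppv_balanced, ppv_rare = ppv_rare, npv_rare = npv_rare))
```

With identical sensitivity and specificity, the PPV drops sharply when the assumed prevalence is low, which is why a prevalence taken from a balanced study sample can be misleading.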
Value
data.frame with names of the performance measures and their
respective values.
Author(s)
Matthias Kohl Matthias.Kohl@stamats.de
References
K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010). The balanced accuracy and its posterior distribution. In Pattern Recognition (ICPR), 20th International Conference on, 3121-3124 (IEEE, 2010).
J.A. Cohen (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37-46.
T. Fawcett (2006). An introduction to ROC analysis. Pattern Recognition Letters 27, 861-874.
T.A. Gerds, T. Cai, M. Schumacher (2008). The performance of risk prediction models. Biom J 50, 457-479.
D. Hand, R. Till (2001). A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171-186.
J. Hernandez-Orallo, P.A. Flach, C. Ferri (2012). A unified view of performance metrics: Translating threshold choice into expected classification loss. J. Mach. Learn. Res. 13, 2813-2869.
B.W. Matthews (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure 405, 442-451.
D.M. Powers (2011). Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness and Correlation. Journal of Machine Learning Technologies 1, 37-63.
N.A. Smits (2010). A note on Youden's J and its cost ratio. BMC Medical Research Methodology 10, 89.
B. Wallace, I. Dahabreh (2012). Class probability estimates are unreliable for imbalanced data (and how to fix them). In Data Mining (ICDM), IEEE 12th International Conference on, 695-704.
W.J. Youden (1950). Index for rating diagnostic tests. Cancer 3, 32-35.
See Also
confMatrix, predValues, perfScores
Examples
library(MKclass)
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")
## with group numbers
perfMeasures(pred, truth = infert$case, namePos = 1)
## with group names
my.case <- factor(infert$case, labels = c("control", "case"))
perfMeasures(pred, truth = my.case, namePos = "case")
## on the scale of the linear predictors
pred2 <- predict(fit)
perfMeasures(pred2, truth = infert$case, namePos = 1, cutoff = 0)
## using weights
perfMeasures(pred, truth = infert$case, namePos = 1, weight = 0.3)
## selecting a subset of measures
perfMeasures(pred, truth = infert$case, namePos = 1,
             measures = c("SENS", "SPEC", "BACC", "YJS"))