ClusterMCC {FCPS} | R Documentation |
Matthews Correlation Coefficient (MCC)
Description
Matthews correlation coefficient generalized to the multiclass case (a.k.a. R_K statistic).
Usage
ClusterMCC(PriorCls, CurrentCls, Force = TRUE)
Arguments
PriorCls |
Ground truth, [1:n] numerical vector with n numbers defining the classification. It has k unique numbers representing the labels of the classes. |
CurrentCls |
Main output of the clustering, [1:n] numerical vector with n numbers defining the classification. It has k unique numbers representing the labels of the clustering. |
Force |
Boolean; if TRUE, forces the computation even if one or more of the k numbers given in
Details
Contrary to accuracy, the MCC is a balanced measure that can be used even if the classes are of very different sizes. With more than two labels, the MCC no longer ranges between -1 and +1: the minimum value lies between -1 and 0, depending on the true distribution, while the maximum value is always +1.
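To make the definition concrete, the multiclass MCC (R_K statistic) can be written in terms of the k-by-k contingency table of the two label vectors. The following is a minimal base-R sketch under that textbook definition. The helper name mcc_sketch is hypothetical, the return value of 0 for a degenerate table is an assumed convention, and the code is not claimed to be how ClusterMCC is implemented internally.

```r
# Hypothetical helper (illustration only): multiclass MCC from two label vectors.
mcc_sketch <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  # Shared label set, so the contingency table is square and labels
  # missing from one vector contribute zero counts.
  lev <- sort(union(truth, pred))
  tab <- table(factor(truth, levels = lev), factor(pred, levels = lev))
  # Convert counts to double to avoid integer overflow for large n.
  C <- matrix(as.numeric(tab), nrow = length(lev))
  s   <- sum(C)        # total number of data points
  cc  <- sum(diag(C))  # number of data points with matching labels
  t_k <- rowSums(C)    # class sizes in truth
  p_k <- colSums(C)    # class sizes in pred
  num <- cc * s - sum(t_k * p_k)
  den <- sqrt(s^2 - sum(p_k^2)) * sqrt(s^2 - sum(t_k^2))
  if (den == 0) return(0)  # assumed convention for degenerate tables
  num / den
}

truth <- c(1, 1, 1, 2, 2, 3)
mcc_sketch(truth, truth)                # identical labelings give 1
mcc_sketch(truth, c(1, 1, 2, 2, 2, 3))  # one mismatch gives 17/22
```

Because the numerator compares the diagonal of the table against what the marginal class sizes alone would produce, a labeling that only reproduces the largest class scores near 0 even when its accuracy is high.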
Beware that, in contrast to ClusterAccuracy, the labels cannot be arbitrary. Instead, each label of PriorCls and CurrentCls has to be mapped to the same cluster of data points. Typically, this has to be ensured manually.
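One way to perform this manual mapping is to relabel the clustering so that each cluster takes the ground-truth label it overlaps with most. The sketch below does this greedily and one-to-one in base R. The helper name align_labels_sketch is hypothetical and not part of FCPS, numeric labels are assumed, and the greedy choice is a simplification: it is not guaranteed to find the best overall assignment, so an optimal assignment method could be substituted.

```r
# Hypothetical helper (illustration only): relabel `pred` to match `truth`.
align_labels_sketch <- function(truth, pred) {
  stopifnot(length(truth) == length(pred))
  tab <- table(truth, pred)
  C <- matrix(as.numeric(tab), nrow = nrow(tab))
  row_lab <- rownames(tab)  # ground-truth labels
  col_lab <- colnames(tab)  # cluster labels produced by the algorithm
  new_lab <- setNames(rep(NA_real_, length(col_lab)), col_lab)
  for (i in seq_len(min(dim(C)))) {
    # Pick the largest remaining overlap between a class and a cluster.
    pos <- which(C == max(C), arr.ind = TRUE)[1, ]
    new_lab[pos["col"]] <- as.numeric(row_lab[pos["row"]])
    # Mark the matched class and cluster as used.
    C[pos["row"], ] <- -1
    C[, pos["col"]] <- -1
  }
  # Clusters left without a partner get fresh labels above the known ones.
  free <- is.na(new_lab)
  new_lab[free] <- max(as.numeric(row_lab)) + seq_len(sum(free))
  unname(new_lab[as.character(pred)])
}

truth   <- c(1, 1, 1, 2, 2, 2, 3, 3)
pred    <- c(3, 3, 3, 1, 1, 2, 2, 2)
aligned <- align_labels_sketch(truth, pred)
aligned  # 1 1 1 2 2 3 3 3
```

The relabeled vector can then be passed as CurrentCls, since equal numbers in both vectors now refer to the same group of data points.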
Value
A single scalar: the MCC, in the range described in Details.
Note
If the number of clusters differs between PriorCls and CurrentCls, it is aligned internally, with zero data points belonging to the missing clusters.
Author(s)
Michael Thrun
References
Matthews, B. W.: Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochimica et Biophysica Acta (BBA), Protein Structure, Vol. 405(2), pp. 442-451, 1975.
Boughorbel, S.; Jarray, F. and El-Anbari, M.: Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric, PLOS ONE, Vol. 12(6), pp. e0177678, 2017.
Chicco, D.; Toetsch, N. and Jurman, G.: The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation, BioData Mining, Vol. 14, 2021.
See Also
Examples
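# Beware: the clustering algorithm assigns its labels arbitrarily
library(FCPS)
data(Hepta)
V = kmeansClustering(Hepta$Data, ClusterNo = 7, Type = "Hartigan")
# Cross-tabulate to see which cluster label corresponds to which class
table(V$Cls, Hepta$Cls)
# The result is only valid once the label mismatch above is resolved manually
ClusterMCC(Hepta$Cls, V$Cls)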