cluster_validity {genieclust} | R Documentation |
Internal Cluster Validity Measures
Description
Implementation of a number of so-called cluster validity indices critically reviewed in (Gagolewski, Bartoszuk, Cena, 2021). See Section 2 therein and (Gagolewski, 2022) for the respective definitions.
The greater the index value, the more valid (whatever that means) the assessed partition. For consistency, the Ball-Hall and Davies-Bouldin indexes as well as the within-cluster sum of squares (WCSS) take negative values.
Usage
calinski_harabasz_index(X, y)
dunnowa_index(
X,
y,
M = 25L,
owa_numerator = "SMin:5",
owa_denominator = "Const"
)
generalised_dunn_index(X, y, lowercase_d, uppercase_d)
negated_ball_hall_index(X, y)
negated_davies_bouldin_index(X, y)
negated_wcss_index(X, y)
silhouette_index(X, y)
silhouette_w_index(X, y)
wcnn_index(X, y, M = 25L)
Arguments
X |
numeric matrix with |
y |
vector of |
M |
number of nearest neighbours |
owa_numerator , owa_denominator |
single string specifying
the OWA operators to use in the definition of the DuNN index;
one of: |
lowercase_d |
an integer between 1 and 5, denoting
|
uppercase_d |
an integer between 1 and 3, denoting
|
Value
A single numeric value (the more, the better).
Author(s)
Marek Gagolewski and other contributors
References
Ball G.H., Hall D.J., ISODATA: A novel method of data analysis and pattern classification, Technical report No. AD699616, Stanford Research Institute, 1965.
Bezdek J., Pal N., Some new indexes of cluster validity, IEEE Transactions on Systems, Man, and Cybernetics, Part B 28, 1998, 301-315, doi:10.1109/3477.678624.
Calinski T., Harabasz J., A dendrite method for cluster analysis, Communications in Statistics 3(1), 1974, 1-27, doi:10.1080/03610927408827101.
Davies D.L., Bouldin D.W., A Cluster Separation Measure, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-1 (2), 1979, 224-227, doi:10.1109/TPAMI.1979.4766909.
Dunn J.C., A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters, Journal of Cybernetics 3(3), 1973, 32-57, doi:10.1080/01969727308546046.
Gagolewski M., Bartoszuk M., Cena A., Are cluster validity measures (in)valid?, Information Sciences 581, 620-636, 2021, doi:10.1016/j.ins.2021.10.004; preprint: https://raw.githubusercontent.com/gagolews/bibliography/master/preprints/2021cvi.pdf.
Gagolewski M., A Framework for Benchmarking Clustering Algorithms, 2022, https://clustering-benchmarks.gagolewski.com.
Rousseeuw P.J., Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis, Computational and Applied Mathematics 20, 1987, 53-65, doi:10.1016/0377-0427(87)90125-7.
See Also
The official online manual of genieclust at https://genieclust.gagolewski.com/
Gagolewski M., genieclust: Fast and robust hierarchical clustering, SoftwareX 15:100722, 2021, doi:10.1016/j.softx.2021.100722.
Examples
X <- as.matrix(iris[,1:4])
X[,] <- jitter(X) # otherwise we get a non-unique solution
y <- as.integer(iris[[5]])
calinski_harabasz_index(X, y) # good
calinski_harabasz_index(X, sample(1:3, nrow(X), replace=TRUE)) # bad