calcGD53 {geocmeans} | R Documentation |
Generalized Dunn’s index (53)
Description
Calculate the Generalized Dunn’s index (v53) of clustering quality.
Usage
calcGD53(data, belongmatrix, centers)
Arguments
data | The original data frame used for the clustering (n*p)
belongmatrix | A membership matrix (n*k)
centers | The centers of the clusters
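The three arguments have to agree in shape. A minimal sketch of a sanity check in base R follows; `check_gd53_inputs` is a hypothetical helper, not part of geocmeans, and it assumes that `centers` is stored with one row per cluster and one column per variable (k*p), which the argument list does not state explicitly.

```r
# Hypothetical helper (not part of geocmeans): verifies that the inputs
# have compatible dimensions before an index is computed.
check_gd53_inputs <- function(data, belongmatrix, centers) {
  n <- nrow(data)          # number of observations
  p <- ncol(data)          # number of variables
  k <- ncol(belongmatrix)  # number of clusters
  if (nrow(belongmatrix) != n) {
    stop("belongmatrix must have one row per observation")
  }
  # Assumption: centers is laid out as k rows by p columns
  if (nrow(centers) != k || ncol(centers) != p) {
    stop("centers must have one row per cluster and one column per variable")
  }
  invisible(TRUE)
}

# Toy inputs: 4 observations, 2 variables, 2 clusters
x <- matrix(c(0, 0, 0, 1, 10, 0, 10, 1), ncol = 2, byrow = TRUE)
u <- matrix(c(0.9, 0.1, 0.9, 0.1, 0.1, 0.9, 0.1, 0.9), ncol = 2, byrow = TRUE)
cen <- matrix(c(0, 0.5, 10, 0.5), ncol = 2, byrow = TRUE)
check_gd53_inputs(x, u, cen)
```

The check stops with an error as soon as one dimension disagrees, which is usually easier to diagnose than a recycling warning raised later inside a distance computation.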
Details
The Generalized Dunn’s index (Da Silva et al. 2020) is the ratio of the smallest pairwise separation between clusters to the largest within-cluster dispersion. A higher value indicates a better clustering. The formula is:
GD_{r s}=\frac{\min_{i \neq j}\left[\delta_{r}\left(\omega_{i}, \omega_{j}\right)\right]}{\max_{k}\left[\Delta_{s}\left(\omega_{k}\right)\right]}
The numerator is the minimal separation over all pairs of distinct clusters i and j, where the separation of a pair is given by the formula:
\delta_{r}\left(\omega_{i}, \omega_{j}\right)=\frac{\sum_{l=1}^{n}\left\|\boldsymbol{x_{l}}-\boldsymbol{c_{i}}\right\|^{\frac{1}{2}} \cdot u_{il}+\sum_{l=1}^{n}\left\|\boldsymbol{x_{l}}-\boldsymbol{c_{j}}\right\|^{\frac{1}{2}} \cdot u_{jl}}{\sum{u_{i}} + \sum{u_{j}}}
where u is the membership matrix and u_{i} is the column of u describing the membership of the n observations to cluster i. c_{i} is the center of the cluster i.
The denominator is a measure of the maximal dispersion of all clusters, given by the formula:
\Delta_{s}\left(\omega_{i}\right)=\frac{2 \sum_{l=1}^{n}\left\|\boldsymbol{x_{l}}-\boldsymbol{c_{i}}\right\|^{\frac{1}{2}}}{\sum{u_{i}}}
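To make the notation concrete, the following base R sketch evaluates the two formulas exactly as they are printed above. It is only an illustration under stated assumptions: `gd53_sketch` is a hypothetical name, the norm is read as the Euclidean norm (so each term is the square root of a Euclidean distance), and `centers` is assumed to hold one row per cluster. The computation inside `calcGD53` may differ in its details, so the two should not be expected to return identical values.

```r
# Hypothetical helper (not the geocmeans implementation): evaluates the
# printed formulas literally on plain matrices.
gd53_sketch <- function(data, belongmatrix, centers) {
  x <- as.matrix(data)          # n x p observations
  u <- as.matrix(belongmatrix)  # n x k memberships
  cen <- as.matrix(centers)     # k x p centers (assumed layout)
  k <- ncol(u)

  # d[l, i] = ||x_l - c_i||^(1/2), reading ||.|| as the Euclidean norm (assumption)
  d <- vapply(seq_len(k), function(i) {
    diffs <- sweep(x, 2, cen[i, ], "-")
    sqrt(rowSums(diffs^2))^(1 / 2)
  }, numeric(nrow(x)))
  d <- matrix(d, nrow = nrow(x))

  weighted <- colSums(d * u)  # sum_l ||x_l - c_i||^(1/2) * u_il
  mass <- colSums(u)          # sum of each membership column u_i

  # delta: separation term for every pair of distinct clusters (i, j)
  pairs <- utils::combn(k, 2)
  delta <- apply(pairs, 2, function(ij) sum(weighted[ij]) / sum(mass[ij]))

  # Delta: dispersion term for every cluster, unweighted as printed
  Delta <- 2 * colSums(d) / mass

  min(delta) / max(Delta)
}

# Toy data: two groups of two points with fuzzy memberships
x <- matrix(c(0, 0, 0, 1, 10, 0, 10, 1), ncol = 2, byrow = TRUE)
u <- matrix(c(0.9, 0.1, 0.9, 0.1, 0.1, 0.9, 0.1, 0.9), ncol = 2, byrow = TRUE)
cen <- matrix(c(0, 0.5, 10, 0.5), ncol = 2, byrow = TRUE)
gd53_sketch(x, u, cen)
```

Computing the n x k matrix of powered distances once and reusing it for both terms keeps the sketch short; the result does not depend on the order in which the clusters are labelled.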
Value
A numeric value: the Generalized Dunn’s index (53)
References
Da Silva LEB, Melton NM, Wunsch DC (2020). “Incremental cluster validity indices for online learning of hard partitions: Extensions and comparative study.” IEEE Access, 8, 22025–22047.
Examples
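library(geocmeans)
data(LyonIris)
# Variables used for the clustering
AnalysisFields <- c("Lden", "NO2", "PM25", "VegHautPrt", "Pct0_14", "Pct_65", "Pct_Img",
                    "TxChom1564", "Pct_brevet", "NivVieMed")
dataset <- sf::st_drop_geometry(LyonIris[AnalysisFields])
# Queen-contiguity neighbours and row-standardized spatial weights
queen <- spdep::poly2nb(LyonIris, queen = TRUE)
Wqueen <- spdep::nb2listw(queen, style = "W")
# Spatial fuzzy c-means with 5 clusters
result <- SFCMeans(dataset, Wqueen, k = 5, m = 1.5, alpha = 1.5, standardize = TRUE)
# Generalized Dunn's index (53) of the resulting partition
calcGD53(result$Data, result$Belongings, result$Centers)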