Wvalid {UniversalCVI} | R Documentation |
Wiroonsri (2024) correlation-based cluster validity indices
Description
Computes the NC correlation and the NCI, NCI1, and NCI2 cluster validity indices for each number of clusters k from a user-specified kmin to kmax, where the partitions are obtained from either K-means or hierarchical clustering, following Wiroonsri (2024).
Usage
Wvalid(x, kmax, kmin = 2, method = "kmeans",
corr = "pearson", nstart = 100, sampling = 1, NCstart = TRUE)
Arguments
x | a numeric data frame or matrix in which each column is a variable used for cluster analysis and each row is a data point. |
kmax | the maximum number of clusters to be considered. |
kmin | the minimum number of clusters to be considered. The default is kmin = 2. |
method | a character string indicating which clustering method is used, such as "kmeans" (the default) or "hclust_average" (average-linkage hierarchical clustering, as in the examples). |
corr | a character string indicating which correlation coefficient is computed. The default is "pearson". |
nstart | the maximum number of random initial sets of centers for K-means when method = "kmeans". The default is nstart = 100. |
sampling | a number greater than 0 and less than or equal to 1 giving the proportion of the data to keep when undersampling. This argument is intended for large datasets. The default is sampling = 1, which uses all the data. |
NCstart | logical; the default is NCstart = TRUE. |
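To make the meaning of the sampling proportion concrete, here is a minimal base-R sketch that keeps a random fraction of the rows before clustering. The helper undersample_rows is hypothetical and not part of UniversalCVI; it reflects one plain reading of "undersampling proportion of data", and the way Wvalid actually draws its subsample may differ.

```r
# Hypothetical helper (not part of UniversalCVI): keep a random proportion
# `sampling` of the rows of x, one reading of "undersampling proportion".
undersample_rows <- function(x, sampling = 1) {
  stopifnot(is.numeric(sampling), length(sampling) == 1,
            sampling > 0, sampling <= 1)
  x <- as.matrix(x)
  n_keep <- max(1L, round(sampling * nrow(x)))          # number of rows to keep
  x[sort(sample.int(nrow(x), n_keep)), , drop = FALSE]  # preserve original row order
}

# Usage: keep about 20% of 1000 rows before clustering.
set.seed(42)
big <- matrix(rnorm(2000), ncol = 2)
small <- undersample_rows(big, sampling = 0.2)
dim(small)  # 200 rows, 2 columns
```

With sampling = 1 every row is kept, matching the default behaviour of using all the data.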
Details
The NC correlation is the correlation, taken over all pairs of data points, between the actual distance between the two points and the distance between the centroids of the clusters the two points belong to. NCI1 and NCI2 are, respectively, the ratio and the difference of the same two ratios. The first ratio is the NC improvement from k-1 clusters to k clusters relative to the entire room for improvement. The second ratio is the NC improvement from k clusters to k+1 clusters relative to the entire room for improvement. NCI is a combination of NCI1 and NCI2.
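The Details paragraph can be illustrated with a small base-R sketch. It is not the UniversalCVI implementation: the helper names nc_correlation and nci_ratios are hypothetical, distances are assumed to be Euclidean, centroids are assumed to be cluster means, and "the entire room for improvement" is assumed to be 1 minus the NC value at the starting number of clusters. Under these assumptions the two ratios are r1(k) = (NC(k) - NC(k-1)) / (1 - NC(k-1)) and r2(k) = (NC(k+1) - NC(k)) / (1 - NC(k)), with NCI1(k) = r1(k) / r2(k) and NCI2(k) = r1(k) - r2(k). The combined NCI index is not reproduced here.

```r
# Illustrative sketch only; not the UniversalCVI source code.
# Assumptions: Euclidean distances, centroid = mean of cluster members,
# "room for improvement" = 1 - NC at the starting number of clusters.

# NC correlation of one partition: correlate each pairwise point distance
# with the distance between the centroids of the two points' clusters.
nc_correlation <- function(x, cl, method = "pearson") {
  x <- as.matrix(x)
  idx <- match(cl, sort(unique(cl)))                       # labels -> 1..K
  centers <- rowsum(x, idx) / tabulate(idx)                # K x p centroid matrix
  d_points <- as.vector(dist(x))                           # actual pairwise distances
  d_cent <- as.vector(dist(centers[idx, , drop = FALSE]))  # centroid distances, same pair order
  cor(d_points, d_cent, method = method)
}

# Ratio (NCI1) and difference (NCI2) of the two improvement ratios,
# given NC values for consecutive numbers of clusters k.
nci_ratios <- function(nc, k) {
  stopifnot(length(nc) == length(k), length(nc) >= 3)
  i <- 2:(length(nc) - 1)                        # interior positions only
  r1 <- (nc[i] - nc[i - 1]) / (1 - nc[i - 1])    # gain k-1 -> k over room left at k-1
  r2 <- (nc[i + 1] - nc[i]) / (1 - nc[i])        # gain k -> k+1 over room left at k
  data.frame(k = k[i], NCI1 = r1 / r2, NCI2 = r1 - r2)
}

# Usage on simulated data with three groups, partitioned by K-means.
set.seed(1)
x <- rbind(matrix(rnorm(60, 0, 0.3), ncol = 2),
           matrix(rnorm(60, 3, 0.3), ncol = 2),
           cbind(rnorm(30, 0, 0.3), rnorm(30, 3, 0.3)))
ks <- 2:6
nc <- sapply(ks, function(k) nc_correlation(x, kmeans(x, k, nstart = 10)$cluster))
nci_ratios(nc, ks)
```

With a single cluster all centroid distances are zero, so cor() returns NA (with a warning); the sketch therefore starts at k = 2 and reports the ratios only for interior values of k.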
Value
NC |
the NC correlations for |
Each of the following components is a data frame giving the values of the corresponding index for k from kmin to kmax.
NCI | the NCI index. |
NCI1 | the NCI1 index. |
NCI2 | the NCI2 index. |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
N. Wiroonsri, "Clustering performance analysis using a new correlation based cluster validity index," Pattern Recognition, 145, 109910, 2024. doi:10.1016/j.patcog.2023.109910
See Also
Hvalid, FzzyCVIs, DB.IDX, R1_data
Examples
library(UniversalCVI)
# R1_data is a dataset from Wiroonsri (2024); use its first two columns.
x = R1_data[,1:2]
# ---- Kmeans ----
# Compute all indices with Wvalid on the standardized data
K.NC = Wvalid(scale(x), kmax = 15, kmin=2, method = 'kmeans',
corr='pearson', nstart=100, NCstart = TRUE)
print(K.NC)
# The optimal number of clusters is the row with the largest NCI
K.NC$NCI[which.max(K.NC$NCI$NCI),]
# ---- Hierarchical ----
# Average linkage
# Compute all indices with Wvalid on the standardized data
H.NC = Wvalid(scale(x), kmax = 15, kmin=2, method = 'hclust_average',
corr='pearson', nstart=100, NCstart = TRUE)
print(H.NC)
# The optimal number of clusters is the row with the largest NCI
H.NC$NCI[which.max(H.NC$NCI$NCI),]