R: Fuzzy cluster validity indexes used in Wiroonsri and...

FzzyCVIs {UniversalCVI}

R Documentation

Fuzzy cluster validity indexes used in Wiroonsri and Preedasawakul (2023)

Description

Computes the cluster validity indexes used in Wiroonsri and Preedasawakul (2023) for the result of either FCM or EM clustering, for each number of clusters from a user-specified cmin to cmax. It includes the XB (X. L. Xie and G. Beni, 1991) index, KWON (S. H. Kwon, 1998) index, KWON2 (S. H. Kwon et al., 2021) index, TANG (Y. Tang et al., 2005) index, HF (F. Haouas et al., 2017) index, WL (C. H. Wu et al., 2015) index, PBM (M. K. Pakhira et al., 2004) index, KPBM (C. Alok, 2010) index, CCVP and CCVS (M. Popescu et al., 2013) indexes, GC1, GC2, GC3, and GC4 (J. C. Bezdek et al., 2016) indexes, and WPC, WP, WPCI1, and WPCI2 (N. Wiroonsri and O. Preedasawakul, 2023) indexes.

Usage

FzzyCVIs(x, cmax, cmin = 2, indexlist = 'all', corr = 'pearson',
  method = 'FCM', fzm = 2, gamma = (fzm^2*7)/4, sampling = 1,
  iter = 100, nstart = 20, NCstart = TRUE)

Arguments

`x`	a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point.
`cmax`	a maximum number of clusters to be considered.
`cmin`	a minimum number of clusters to be considered. The default is `2`.
`indexlist`	a character string indicating which cluster validity indexes are to be computed (`"all"`, `"WPC"`, `"WP"`, `"WPCI1"`, `"WPCI2"`, `"XB"`, `"KWON"`, `"KWON2"`, `"TANG"`, `"HF"`, `"WL"`, `"PBM"`, `"KPBM"`, `"CCVP"`, `"CCVS"`, `"GC1"`, `"GC2"`, `"GC3"`, `"GC4"`). More than one index can be selected.
`corr`	a character string indicating which correlation coefficient is to be computed (`"pearson"`, `"kendall"` or `"spearman"`) for `indexlist` = (`"WP"`, `"WPC"`, `"WPCI1"`, `"WPCI2"`, `"CCVP"`, `"CCVS"`, `"GC1"`, `"GC2"`, `"GC3"` or `"GC4"`). The default is `"pearson"`.
`method`	a character string indicating which clustering method to be used (`"FCM"` or `"EM"`). The default is `"FCM"`.
`fzm`	a number greater than 1 giving the degree of fuzzification for `method = "FCM"`. The default is `2`.
`gamma`	adjusted fuzziness parameter for `indexlist` = (`"WP"`, `"WPC"`, `"WPCI1"`, `"WPCI2"`). The default is `(fzm^2*7)/4`.
`sampling`	a number greater than 0 and less than or equal to 1 indicating the undersampling proportion of data to be used. This argument is intended for handling a large dataset. The default is `1`.
`iter`	a maximum number of iterations for `method = "FCM"`. The default is `100`.
`nstart`	a maximum number of initial random sets for `method = "FCM"`. The default is `20`.
`NCstart`	logical, used when `indexlist` includes any of `"WP"`, `"WPC"`, `"WPCI1"`, or `"WPCI2"`. If `TRUE`, the WP correlation at `c=1` is defined as the ratio introduced in the reference. Otherwise, it is assigned as `0`.
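As a quick illustration of two of the arguments above, the sketch below evaluates the default `gamma` expression from the Usage section for a couple of `fzm` values and shows one way an undersampling proportion could be applied to the rows of `x`. The helper names `default_gamma` and `undersample_rows` are hypothetical, and the sampling scheme (simple random sampling of rows without replacement) is an assumption made for illustration; the package may sample differently.

```r
# Default gamma as written in the Usage section: (fzm^2 * 7) / 4
default_gamma <- function(fzm) (fzm^2 * 7) / 4
default_gamma(2)    # 7
default_gamma(1.5)  # 3.9375

# Hypothetical helper: keep a random proportion `sampling` of the rows of x.
# Assumption: simple random sampling without replacement.
undersample_rows <- function(x, sampling = 1) {
  stopifnot(sampling > 0, sampling <= 1)
  n_keep <- max(1L, floor(sampling * nrow(x)))
  x[sample(nrow(x), n_keep), , drop = FALSE]
}

set.seed(1)
x_small <- undersample_rows(iris[, 1:4], sampling = 0.5)
dim(x_small)  # 75 rows, 4 columns
```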

Details

The function computes well-known cluster validity indexes for either FCM or EM clustering. It includes the XB (X. L. Xie and G. Beni, 1991) index, KWON (S. H. Kwon, 1998) index, KWON2 (S. H. Kwon et al., 2021) index, TANG (Y. Tang et al., 2005) index, HF (F. Haouas et al., 2017) index, WL (C. H. Wu et al., 2015) index, PBM (M. K. Pakhira et al., 2004) index, KPBM (C. Alok, 2010) index, CCVP and CCVS (M. Popescu et al., 2013) indexes, GC1, GC2, GC3, and GC4 (J. C. Bezdek et al., 2016) indexes, and WPC, WP, WPCI1, and WPCI2 (N. Wiroonsri and O. Preedasawakul, 2023) indexes.

The WPC is the correlation between the actual distance between a pair of data points and the distance between the adjusted centroids associated with that pair. WPCI1 and WPCI2 are the quotient and the difference, respectively, of the same two ratios. The first ratio is the WPC improvement from c-1 clusters to c clusters over the entire room for improvement. The second ratio is the WPC improvement from c clusters to c+1 clusters over the entire room for improvement. WP is defined as a combination of WPCI1 and WPCI2.
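The correlation described above can be sketched in base R as follows. This is a minimal sketch, not the package's implementation: the form of the adjusted centroid (a per-point average of the cluster centroids weighted by memberships raised to the power `gamma`) is an assumption made here for illustration, and the toy data, the fixed centroids `V`, and the variable names are all hypothetical. The exact definition is given in the reference.

```r
# Toy data: two well-separated groups in two dimensions
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))

# Assumed inputs: fixed centroids V (c x p) instead of a fitted FCM result
V   <- rbind(c(0, 0), c(3, 3))
fzm <- 2

# Squared distances from every point to every centroid (n x c)
d2 <- sapply(seq_len(nrow(V)), function(j) rowSums(sweep(x, 2, V[j, ])^2))

# Standard FCM membership formula for fuzzifier fzm (rows sum to 1)
U <- 1 / d2^(1 / (fzm - 1))
U <- U / rowSums(U)

# Assumption: adjusted centroid of point i is the average of the centroids
# weighted by the memberships of point i raised to the power gamma
gamma <- (fzm^2 * 7) / 4
W     <- U^gamma
V_adj <- (W / rowSums(W)) %*% V   # n x p, one adjusted centroid per point

# Correlate pairwise data distances with pairwise adjusted-centroid distances;
# `method` accepts "pearson", "kendall" or "spearman", as in stats::cor
wpc_like <- cor(as.vector(dist(x)), as.vector(dist(V_adj)), method = "pearson")
wpc_like
```

With clusters this well separated, the two sets of pairwise distances move together closely, so the correlation is expected to be near 1.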

Value

`WPC`	the WP correlation for c from cmin-1 to cmax+1, shown in a data frame.

Each of the following shows the values of the corresponding index for c from cmin to cmax in a data frame.

`WP`	the WP index.
`WPCI1`	the WPCI1 index.
`WPCI2`	the WPCI2 index.
`XB`	the XB index.
`KWON`	the KWON index.
`KWON2`	the KWON2 index.
`TANG`	the TANG index.
`HF`	the HF index.
`WL`	the WL index.
`PBM`	the PBM index.
`KPBM`	the KPBM index.
`CCVP`	the Pearson Correlation Cluster Validity index.
`CCVS`	the Spearman’s (rho) Correlation Cluster Validity index.
`GC1`	the generalized C index (`\sum\cdot \sim` Sum-Product).
`GC2`	the generalized C index (`\sum\wedge \sim` Sum-Min).
`GC3`	the generalized C index (`\vee\cdot \sim` Max-Product).
`GC4`	the generalized C index (`\vee\wedge \sim` Max-Min).
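The WPC values cover cmin-1 to cmax+1 while the other indexes cover cmin to cmax. One reading of the Details section is that the two improvement ratios at c need the WPC at c-1 and at c+1. The sketch below works through that arithmetic on made-up WPC values. It assumes that the "room for improvement" at k clusters is 1 minus the WPC at k, which is an assumption for illustration; the function `improvement_ratio` and the column names are hypothetical, and the package's exact definitions, including any special cases, are those of the reference.

```r
# Made-up WPC values for c = 1, ..., 6 (for illustration only)
wpc <- c(0.20, 0.55, 0.85, 0.88, 0.90, 0.91)

# Assumption: room for improvement at `from` clusters is 1 - WPC(from)
improvement_ratio <- function(wpc, from, to) {
  (wpc[[to]] - wpc[[from]]) / (1 - wpc[[from]])
}

ks <- 2:5  # cluster counts with both neighbours k - 1 and k + 1 available
ratio_in  <- sapply(ks, function(k) improvement_ratio(wpc, k - 1, k))
ratio_out <- sapply(ks, function(k) improvement_ratio(wpc, k, k + 1))

res <- data.frame(c = ks,
                  quotient   = ratio_in / ratio_out,  # WPCI1-style
                  difference = ratio_in - ratio_out)  # WPCI2-style
res
```

In this toy example both columns peak at c = 3, where the gain from 2 to 3 clusters is large and the gain from 3 to 4 clusters is small.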

Author(s)

Nathakhun Wiroonsri and Onthada Preedasawakul

References

C. Alok. (2010). "An investigation of clustering algorithms and soft computing approaches for pattern recognition," Department of Computer Science, Assam University.

J. C. Bezdek, M. Moshtaghi, T. Runkler, C. Leckie, “The generalized c index for internal fuzzy cluster validity,” IEEE Transactions on Fuzzy Systems, vol. 24, no. 6, pp. 1500–1512, 2016.

F. Haouas, Z. Ben Dhiaf, A. Hammouda, B. Solaiman, "A new efficient fuzzy cluster validity index: Application to images clustering," 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Naples, Italy, 2017, pp. 1-6.

S. H. Kwon, “Cluster validity index for fuzzy clustering,” Electronics Letters, vol. 34, no. 22, pp. 2176–2177, 1998.

S. H. Kwon, J. Kim, S. H. Son, “Improved cluster validity index for fuzzy clustering,” Electronics Letters, vol. 57, no. 21, pp. 792–794, 2021.

M. K. Pakhira, S. Bandyopadhyay, U. Maulik, “Validity index for crisp and fuzzy clusters,” Pattern Recognition, vol. 37, no. 3, pp. 487–501, 2004.

M. Popescu, J. C. Bezdek, T. C. Havens, J. M. Keller, "A Cluster Validity Framework Based on Induced Partition Dissimilarity," in IEEE Transactions on Cybernetics, vol. 43, no. 1, pp. 308-320, Feb. 2013.

Y. Tang, F. Sun, Z. Sun, “Improved validation index for fuzzy clustering,” in Proceedings of the 2005 American Control Conference, vol. 2, pp. 1120–1125, 2005.

N. Wiroonsri, O. Preedasawakul, "A correlation-based fuzzy cluster validity index with secondary options detector," arXiv:2308.14785, 2023.

C. H. Wu, C. S. Ouyang, L. W. Chen, L. W. Lu, “A new fuzzy clustering validity index with a median factor for centroid-based clustering,” IEEE Transactions on Fuzzy Systems, vol. 23, no. 3, pp. 701–718, 2015.

X. Xie, G. Beni, “A validity measure for fuzzy clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8, pp. 841–847, 1991.

Examples


library(UniversalCVI)

# Iris data
x = iris[,1:4]

# ---- FCM algorithm ----


# Compute a selected set of indexes ("WPC","WP","XB") using default gamma
F.s = FzzyCVIs(scale(x), cmax = 10, cmin = 2, indexlist = c("WPC","WP","XB"),
  corr = 'pearson', method = 'FCM', fzm = 2, iter = 100, nstart = 20, NCstart = TRUE)

# Plot the computed indexes
plot_idx(F.s)

# ---- EM algorithm ----

# Compute all the indexes by FzzyCVIs using default gamma
E.all = FzzyCVIs(scale(x), cmax = 10, cmin = 2, indexlist = 'all', corr = 'pearson',
  method = 'EM', iter = 100, nstart = 20, NCstart = TRUE)

# Plot the computed indexes
plot_idx(E.all)

[Package UniversalCVI version 1.1.2 Index]