R: Wiroonsri and Preedasawakul (WP) index

WP.IDX {UniversalCVI}

R Documentation

Wiroonsri and Preedasawakul (WP) index

Description

Computes the WPC (WP correlation), WP, WPCI1 and WPCI2 indexes (N. Wiroonsri and O. Preedasawakul, 2023) for the result of either FCM or EM clustering, for each number of clusters from a user-specified cmin to cmax.

Usage

WP.IDX(x, cmax, cmin = 2, corr = 'pearson', method = 'FCM', fzm = 2,
  gamma = (fzm^2*7)/4, sampling = 1, iter = 100, nstart = 20, NCstart = TRUE)

Arguments

`x`	a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point.
`cmax`	a maximum number of clusters to be considered.
`cmin`	a minimum number of clusters to be considered. The default is `2`.
`corr`	a character string indicating which correlation coefficient is to be computed (`"pearson"`, `"kendall"` or `"spearman"`). The default is `"pearson"`.
`method`	a character string indicating which clustering method is to be used (`"FCM"` or `"EM"`). The default is `"FCM"`.
`fzm`	a number greater than 1 giving the degree of fuzzification for `method = "FCM"`. The default is `2`.
`gamma`	the adjusted fuzziness parameter used in computing the WP, WPC, WPCI1 and WPCI2 indexes. The default is computed as `(fzm^2*7)/4`.
`sampling`	a number greater than 0 and less than or equal to 1 indicating the undersampling proportion of data to be used. This argument is intended for handling a large dataset. The default is `1`.
`iter`	a maximum number of iterations for `method = "FCM"`. The default is `100`.
`nstart`	a maximum number of initial random sets for `method = "FCM"`. The default is `20`.
`NCstart`	logical, used in computing the WP, WPC, WPCI1 and WPCI2 indexes. If `TRUE`, the WP correlation at c=1 is defined as an adjusted standard deviation of the distances between all data points and their mean. Otherwise, the WP correlation at c=1 is defined as 0. The default is `TRUE`.
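
As a small illustration of two of the arguments above, the sketch below reproduces the default `gamma` from the Usage signature and shows one plausible way a `sampling` proportion could be applied to the rows of `x`. The helper names `default_gamma` and `undersample_rows` are hypothetical and not part of UniversalCVI, and the row-sampling scheme is an assumption, since this page does not say how the package draws its subsample.

```r
# Default gamma exactly as written in the Usage signature: (fzm^2 * 7) / 4.
# `default_gamma` is a hypothetical helper, not a package function.
default_gamma <- function(fzm = 2) {
  stopifnot(is.numeric(fzm), fzm > 1)  # fzm must be greater than 1
  (fzm^2 * 7) / 4
}

default_gamma(2)    # 7
default_gamma(1.5)  # 3.9375

# ASSUMPTION: undersampling keeps a simple random subset of rows.
# The package's actual sampling scheme is not documented on this page.
undersample_rows <- function(x, sampling = 1) {
  stopifnot(sampling > 0, sampling <= 1)
  n_keep <- max(1, floor(sampling * nrow(x)))
  x[sample(nrow(x), n_keep), , drop = FALSE]
}

m <- matrix(seq_len(20), ncol = 2)   # 10 toy data points, 2 variables
nrow(undersample_rows(m, 0.5))       # 5
```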

Details

The WP index was inspired by the Wiroonsri index, which is compatible only with hard clustering methods.

The WPC is the correlation between the actual distance between a pair of data points and the distance between the adjusted centroids associated with that pair. WPCI1 and WPCI2 are, respectively, the quotient and the difference of the same two ratios. The first ratio is the WPC improvement from c-1 clusters to c clusters over the entire room for improvement. The second ratio is the WPC improvement from c clusters to c+1 clusters over the entire room for improvement. WP is defined as a combination of WPCI1 and WPCI2.
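
The correlation idea behind the WPC can be sketched in a few lines of base R. This is a minimal sketch, not the package's implementation: it assumes that the adjusted centroid of a data point is the average of the cluster centroids weighted by that point's memberships raised to the power `gamma`, which this page does not spell out. The function `wpc_sketch` and the toy inputs are hypothetical.

```r
# ASSUMPTION: adjusted centroid of point i = sum_j mu_ij^gamma * v_j / sum_j mu_ij^gamma.
# `wpc_sketch` is a hypothetical illustration, not a UniversalCVI function.
wpc_sketch <- function(x, centers, memb, gamma = 7, corr = "pearson") {
  x <- as.matrix(x)
  w <- memb^gamma                       # n x c membership weights
  adj <- (w / rowSums(w)) %*% centers   # n x p adjusted centroids, one per point
  # Correlate pairwise point distances with pairwise adjusted-centroid distances
  cor(as.vector(dist(x)), as.vector(dist(adj)), method = corr)
}

# Toy data: two well-separated groups with near-crisp memberships
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
centers <- rbind(c(0, 0.5), c(10, 10.5))
memb <- rbind(c(0.95, 0.05), c(0.95, 0.05), c(0.05, 0.95), c(0.05, 0.95))

wpc_sketch(x, centers, memb)  # close to 1 for this clean two-group layout
```

Under this assumption, a partition whose centroids mirror the geometry of the data yields a correlation near 1, which is the behaviour the index is meant to reward.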

The largest value of WP(c) indicates a valid optimal partition.
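
The two improvement ratios described in Details can be sketched as follows. The sketch assumes that "room for improvement" means 1 minus the WPC being improved upon, and it uses made-up WPC values. It does not reproduce the package's exact WPCI1, WPCI2 or WP definitions, which may treat non-positive ratios specially. `improvement_ratios` is a hypothetical helper.

```r
# ASSUMPTION: room for improvement at a given c is 1 - WPC at that c.
# `improvement_ratios` is a hypothetical helper, not a UniversalCVI function.
improvement_ratios <- function(wpc) {
  k <- length(wpc)
  stopifnot(k >= 3)                # each interior c needs both neighbours
  prev <- wpc[1:(k - 2)]           # WPC at c-1
  cur  <- wpc[2:(k - 1)]           # WPC at c
  nxt  <- wpc[3:k]                 # WPC at c+1
  data.frame(
    ratio_in  = (cur - prev) / (1 - prev),  # improvement from c-1 to c
    ratio_out = (nxt - cur) / (1 - cur)     # improvement from c to c+1
  )
}

wpc <- c(0.2, 0.6, 0.9, 0.92)      # made-up WPC values for four consecutive c
improvement_ratios(wpc)
# ratio_in is about 0.50 and 0.75; ratio_out is about 0.75 and 0.20
```

Because each ratio pair needs the WPC at both neighbouring values of c, a sequence of WPC values one step wider than the range of interest is required at each end.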

Value

`WPC`	the WP correlations for c from cmin-1 to cmax+1, shown in a data frame where the first and the second columns are c and the WPC, respectively.

Each of the following shows the value of the index for c from cmin to cmax in a data frame.

`WP`	the WP index.
`WPCI1`	the WPCI1 index.
`WPCI2`	the WPCI2 index.

Author(s)

Nathakhun Wiroonsri and Onthada Preedasawakul

References

N. Wiroonsri, O. Preedasawakul, "A correlation-based fuzzy cluster validity index with secondary options detector," arXiv:2308.14785, 2023

Examples

library(UniversalCVI)

# The data is from Wiroonsri (2024).
x = R1_data[,1:2]

# ---- FCM algorithm ----

# Compute all the indices by WP.IDX using default gamma
FCM.WP = WP.IDX(scale(x), cmax = 10, cmin = 2, corr = 'pearson', method = 'FCM', fzm = 2,
  iter = 100, nstart = 20, NCstart = TRUE)
print(FCM.WP$WP)

# The optimal number of clusters
FCM.WP$WP[which.max(FCM.WP$WP$WPI),]


# ---- EM algorithm ----

# Compute all the indices by WP.IDX using default gamma
EM.WP = WP.IDX(scale(x), cmax = 10, cmin = 2, corr = 'pearson', method = 'EM',
  iter = 100, nstart = 20, NCstart = TRUE)
print(EM.WP$WP)

# The optimal number of clusters
EM.WP$WP[which.max(EM.WP$WP$WPI),]

[Package UniversalCVI version 1.1.2 Index]