B_WP.IDX {BayesCVI}    R Documentation

BCVI-Wiroonsri and Preedasawakul (WP) index

Description

Computes the Bayesian cluster validity index (BCVI) for two to kmax groups, using the Wiroonsri and Preedasawakul (WP) index as the underlying cluster validity index (CVI) with user-selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).

Usage

B_WP.IDX(x, kmax, corr = "pearson", method = "FCM", fzm = 2,
        gamma = (fzm^2 * 7)/4, sampling = 1, iter = 100, nstart = 20,
        NCstart = TRUE, alpha = "default", mult.alpha = 1/2)

Arguments

x

a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point.

kmax

the maximum number of clusters to be considered.

corr

a character string indicating which correlation coefficient is to be computed ("pearson", "kendall" or "spearman"). The default is "pearson".

method

a character string indicating which clustering method is to be used ("FCM" or "EM"). The default is "FCM".

fzm

a number greater than 1 giving the degree of fuzzification for method = "FCM". The default is 2.

gamma

the adjusted fuzziness parameter of the WP index. The default is 7fzm^2/4.

sampling

a number greater than 0 and less than or equal to 1 indicating the undersampling proportion of data to be used. This argument is intended for handling a large dataset. The default is 1.

iter

the maximum number of iterations for method = "FCM". The default is 100.

nstart

the maximum number of initial random sets for method = "FCM". The default is 20.

NCstart

logical. If TRUE, the WP correlation at c=1 is defined as an adjusted standard deviation of the distances between all data points and their mean. Otherwise, the WP correlation at c=1 is defined as 0. The default is TRUE.

alpha

Dirichlet prior parameters \alpha_2,...,\alpha_k, where \alpha_k is the parameter corresponding to "the probability of having k groups". Selecting each \alpha_k between 0 and 30 and using the other parameter mult.alpha as its multiplier is recommended. The default is "default".

mult.alpha

the power s in n^s, the factor by which the Dirichlet prior parameters alpha are multiplied (selecting mult.alpha in [0,1) is recommended). The default is 1/2.
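
As a rough illustration of how alpha and mult.alpha interact, the sketch below scales a vector of base parameters by n^s. It assumes the effective prior parameter is alpha_k * n^mult.alpha, which is how the argument descriptions read; scale_alpha is a hypothetical helper, not a package function, and n = 400 is an arbitrary sample size chosen for the illustration.

```r
# Hypothetical helper (not part of BayesCVI): scale base Dirichlet
# parameters by n^s. Assumption: effective alpha_k = alpha_k * n^s.
scale_alpha <- function(alpha, n, mult.alpha = 1/2) {
  stopifnot(is.numeric(alpha), all(alpha >= 0), n > 0)
  alpha * n^mult.alpha
}

# One base value per k = 2, ..., 10 (kmax = 10 gives 9 values).
alpha_base <- c(20, 20, 20, 5, 5, 5, 0.5, 0.5, 0.5)

# n = 400 is an arbitrary illustrative sample size.
alpha_eff <- scale_alpha(alpha_base, n = 400, mult.alpha = 1/2)
alpha_eff
```

With mult.alpha = 0 the base values are used unchanged, so the prior carries less weight relative to the data as n grows.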

Details

BCVI-WP is defined as follows. Let

r_k({\bf x}) = \dfrac{WP(k)-\min_j WP(j)}{\sum_{i=2}^K (WP(i)-\min_j WP(j))}.

Assume that

f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}

represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} is then again a Dirichlet distribution, with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
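
The normalization and the posterior update can be traced by hand in a few lines of base R. This is a minimal sketch: the WP values, the sample size n and the prior parameters are made-up numbers, not output of B_WP.IDX.

```r
# Made-up WP values for k = 2, ..., 6 (illustrative only).
k  <- 2:6
wp <- c(0.9, 1.4, 2.1, 1.2, 0.9)

# r_k: subtract the minimum, then normalize so the values sum to 1.
shifted <- wp - min(wp)
r <- shifted / sum(shifted)

# Posterior Dirichlet parameters alpha_k + n * r_k.
n     <- 100                # assumed sample size
alpha <- rep(2, length(k))  # assumed prior parameters
post  <- alpha + n * r

data.frame(k = k, r = r, posterior = post)
```

Values of k where WP attains its minimum get r_k = 0, so their posterior parameter equals the prior parameter.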

The BCVI is then defined as

BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n}

where \alpha_0 = \sum_{k=2}^K \alpha_k.
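
The posterior mean can be checked numerically. In the sketch below, bcvi_mean is a hypothetical helper written for this illustration, and r, alpha and n are made-up inputs rather than values produced by the package.

```r
# Hypothetical helper (not part of BayesCVI) implementing
# BCVI(k) = (alpha_k + n * r_k) / (alpha_0 + n).
bcvi_mean <- function(r, alpha, n) {
  stopifnot(length(r) == length(alpha), isTRUE(all.equal(sum(r), 1)))
  (alpha + n * r) / (sum(alpha) + n)
}

r     <- c(0, 0.25, 0.6, 0.15, 0)  # made-up r_k for k = 2, ..., 6
alpha <- rep(2, 5)                 # assumed prior parameters
bcvi  <- bcvi_mean(r, alpha, n = 100)

bcvi                         # posterior means; they sum to 1
k_best <- (2:6)[which.max(bcvi)]
k_best                       # k with the largest posterior mean
```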

The variance of p_k can be computed as

Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
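
The variance is the usual Dirichlet component variance applied to the posterior parameters a_k = \alpha_k + nr_k({\bf x}), whose sum is \alpha_0 + n. A minimal sketch follows; dirichlet_var is a hypothetical helper and the parameter vector is made up for the illustration.

```r
# Hypothetical helper (not part of BayesCVI): component variances of a
# Dirichlet distribution with parameter vector a.
dirichlet_var <- function(a) {
  a0 <- sum(a)  # equals alpha_0 + n for posterior parameters
  a * (a0 - a) / (a0^2 * (a0 + 1))
}

# Made-up posterior parameters alpha_k + n * r_k for k = 2, ..., 6.
post <- c(2, 27, 62, 17, 2)

v <- dirichlet_var(post)
data.frame(k = 2:6, var = v, sd = sqrt(v))
```

The standard deviations give a sense of the spread around each BCVI(k), which is the kind of information an error bar conveys.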

Value

BCVI

a data frame where the first and the second columns are the number of groups k and BCVI(k), respectively, for k from 2 to kmax.

VAR

the data frame where the first and the second columns are the number of groups k and the variance of p_k, respectively, for k from 2 to kmax.

CVI

the data frame where the first and the second columns are the number of groups k and the original WP(k), respectively, for k from 2 to kmax.
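
One way to read the three components together is sketched below. The list res is a stand-in built by hand with the layout described above; its numbers and column names are placeholders, so columns are accessed by position rather than by name.

```r
# Stand-in for the returned list; all values and column names are
# placeholders, not output of B_WP.IDX.
res <- list(
  BCVI = data.frame(k = 2:4, BCVI = c(0.2, 0.7, 0.1)),
  VAR  = data.frame(k = 2:4, VAR  = c(0.002, 0.003, 0.001)),
  CVI  = data.frame(k = 2:4, WP   = c(1.1, 2.3, 0.8))
)

# First column: number of groups k. Second column: the value for that k.
best_k  <- res$BCVI[[1]][which.max(res$BCVI[[2]])]
sd_best <- sqrt(res$VAR[[2]][res$VAR[[1]] == best_k])

best_k   # k with the largest BCVI
sd_best  # posterior standard deviation of p_k at that k
```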

Author(s)

Nathakhun Wiroonsri and Onthada Preedasawakul

References

N. Wiroonsri, O. Preedasawakul, "A correlation-based fuzzy cluster validity index with secondary options detector," arXiv:2308.14785, 2023.

N. Wiroonsri, O. Preedasawakul, "A Bayesian cluster validity index", arXiv:2402.02162, 2024.

See Also

B7_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX

Examples

library(BayesCVI)

# The data included in this package.
data = B7_data[,1:2]

# Dirichlet prior parameters, one for each k = 2, ..., kmax
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)

B.WP = B_WP.IDX(x = scale(data), kmax = 10, corr = "pearson", method = "FCM",
                fzm = 2, sampling = 1, iter = 100, nstart = 20, NCstart = TRUE,
                alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI

pplot = plot_BCVI(B.WP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot

[Package BayesCVI version 1.0.0 Index]