plot_BCVI {BayesCVI}    R Documentation

Plots for visualizing BCVI

Description

Plots the Bayesian cluster validity index (BCVI), with and without standard-deviation error bars, together with the underlying index.

Usage

plot_BCVI(B.result, mult.err.bar = 2)

Arguments

B.result

a result from one of the functions B_XB.IDX, B_Wvalid, B_WP.IDX, B_WL.IDX, B_TANG.IDX, B_STRPBM.IDX, B_SF.IDX, B_PBM.IDX, B_PB.IDX, B_KWON.IDX, B_KWON2.IDX, B_KPBM.IDX, B_HF.IDX, B_GC.IDX, B_DI.IDX, B_DB.IDX, B_CSL.IDX, B_CH.IDX, B_CCV.IDX and B_BayesCVIs.IDX

mult.err.bar

a multiplier of the standard deviations to be used for plotting error bars

Details

BCVI is defined as follows.

Let

$$r_k({\bf x}) = \dfrac{\max_j CVI(j) - CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))}$$

for a cluster validity index (CVI) such that the smallest value indicates the optimal number of clusters and

$$r_k({\bf x}) = \dfrac{CVI(k) - \min_j CVI(j)}{\sum_{i=2}^K (CVI(i) - \min_j CVI(j))}$$

for a CVI such that the largest value indicates the optimal number of clusters. Assume that

$$f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}$$

represents the conditional probability density function of the dataset given ${\bf p}$, where $C({\bf p})$ is the normalizing constant. Assume further that ${\bf p}$ follows a Dirichlet prior distribution with parameters ${\bm \alpha} = (\alpha_2,\ldots,\alpha_K)$. The posterior distribution of ${\bf p}$ is then again a Dirichlet distribution, with parameters $(\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x}))$.
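The rescaling and the posterior update described above can be sketched in base R. This is only an illustration under stated assumptions: the CVI values, the prior `alpha`, the sample size `n`, and the helper names `r_smallest_best` and `r_largest_best` are all hypothetical and are not part of the package.

```r
# Hypothetical CVI values for k = 2, ..., 6 (illustrative numbers only)
cvi <- c(0.42, 0.31, 0.18, 0.25, 0.37)
names(cvi) <- 2:6

# r_k for an index where the smallest value is optimal
r_smallest_best <- function(cvi) (max(cvi) - cvi) / sum(max(cvi) - cvi)

# r_k for an index where the largest value is optimal
r_largest_best <- function(cvi) (cvi - min(cvi)) / sum(cvi - min(cvi))

r_min <- r_smallest_best(cvi)
r_max <- r_largest_best(cvi)

# Assumed Dirichlet prior parameters alpha_2, ..., alpha_K and sample size n
alpha <- rep(1, length(cvi))
n <- 100

# Posterior Dirichlet parameters alpha_k + n * r_k(x), smallest-is-best case
post_min <- alpha + n * r_min

print(round(r_min, 3))
print(round(post_min, 2))
```

In both orientations the weights are non-negative and sum to one, and the largest weight falls on the k that the underlying index itself would select.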

The BCVI is then defined as

$$BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n}$$

where $\alpha_0 = \sum_{k=2}^K \alpha_k$.

The variance of $p_k$ can be computed as

$$Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n - \alpha_k - nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n + 1)}.$$
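As a rough numerical check of the two formulas above, the following base R sketch computes the BCVI and its standard deviation by hand. The CVI values, `alpha`, `n`, and the variable `mult_err_bar` are hypothetical. The error-bar limits are computed as BCVI plus or minus the multiplier times the standard deviation, which is an assumption about how `mult.err.bar` is applied rather than something the text above states.

```r
# Hypothetical CVI values for k = 2, ..., 6; smallest value taken as optimal
cvi <- c(0.42, 0.31, 0.18, 0.25, 0.37)
k <- 2:6

# Assumed prior parameters, sample size, and error-bar multiplier
alpha <- rep(1, length(cvi))
n <- 100
mult_err_bar <- 2

# r_k(x) for a smallest-is-best index
r <- (max(cvi) - cvi) / sum(max(cvi) - cvi)

# Posterior Dirichlet parameters and their total, alpha_0 + n
a_post <- alpha + n * r
a_total <- sum(alpha) + n

# Posterior mean (the BCVI) and posterior variance of p_k
bcvi <- a_post / a_total
v <- a_post * (a_total - a_post) / (a_total^2 * (a_total + 1))
s <- sqrt(v)

# Assumed error-bar limits: BCVI +/- mult_err_bar * standard deviation
out <- data.frame(k = k,
                  BCVI = bcvi,
                  SD = s,
                  lower = bcvi - mult_err_bar * s,
                  upper = bcvi + mult_err_bar * s)
print(round(out, 4))
```

The BCVI values sum to one across k, so they can be read as posterior weights on each candidate number of clusters.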

Value

plot_index

a plot of the underlying index for the number of groups from 2 to kmax according to B.result

plot_BCVI

a plot of BCVI for the number of groups from 2 to kmax according to B.result

error_bar_plot

a plot of BCVI with error bars for the number of groups from 2 to kmax according to B.result

Author(s)

Nathakhun Wiroonsri and Onthada Preedasawakul

References

N. Wiroonsri, O. Preedasawakul, "A Bayesian cluster validity index", arXiv:2402.02162, 2024.

See Also

B_STRPBM.IDX, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_WP.IDX, B_DB.IDX

Examples


library(BayesCVI)
library(UniversalCVI)

##Soft clustering

# The data included in this package.
data = B7_data[,1:2]

# Dirichlet prior parameters alpha_2, ..., alpha_kmax (length kmax - 1)
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)

B.XB = B_XB.IDX(x = scale(data), kmax = 10, method = "FCM", fzm = 2,
              nstart = 20, iter = 100, alpha = aalpha, mult.alpha = 1/2)

# plot the BCVI

pplot = plot_BCVI(B.XB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot


## Hard clustering

# The data included in this package.
data = B2_data[,1:2]

K.STR = STRPBM.IDX(scale(data), kmax = 10, kmin = 2, method = "kmeans",
  indexlist = "STR", nstart = 100)

# STR index values
result = K.STR$STR$STR


aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.STR = BayesCVIs(CVI = result,
          n = nrow(data),
          kmax = 10,
          opt.pt = "max",
          alpha = aalpha,
          mult.alpha = 1/2)

# plot the BCVI

pplot = plot_BCVI(B.STR)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot


[Package BayesCVI version 1.0.0 Index]