B_STRPBM.IDX {BayesCVI} | R Documentation |
BCVI-Starczewski and Pakhira-Bandyopadhyay-Maulik for crisp clustering indexes
Description
Computes the Bayesian cluster validity index (BCVI) for two to kmax groups, using Starczewski (STR) and/or Pakhira-Bandyopadhyay-Maulik (PBM) as the underlying cluster validity index (CVI) and Dirichlet prior parameters of the user's choice. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
Usage
B_STRPBM.IDX(x, kmax, method = "kmeans", indexlist = "all",
nstart = 100, alpha = "default", mult.alpha = 1/2)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
the maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
indexlist |
a character string indicating which cluster validity indexes to be computed ( |
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
Details
BCVI-STRPBM is defined as follows.
Let
r_k({\bf x}) = \dfrac{CVI(k)-\min_j CVI(j)}{\sum_{i=2}^K (CVI(i)-\min_j CVI(j))}
where CVI is either the STR or the PBM index.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{n r_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} still remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n}
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
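As a rough illustration of the definitions above, the following base-R sketch computes r_k, the BCVI and its variance by hand. The CVI values, the sample size n and the uniform prior alpha are made-up numbers chosen for the illustration; this is not the package's internal implementation.

```r
# Hypothetical CVI values for k = 2, ..., 6 (illustrative numbers only).
k   <- 2:6
cvi <- c(0.8, 1.9, 3.2, 1.1, 0.6)

n     <- 100                   # assumed number of data points
alpha <- rep(1, length(cvi))   # assumed uniform Dirichlet prior (alpha_2, ..., alpha_6)

# r_k: shift by the minimum and rescale so the values are nonnegative and sum to one.
r <- (cvi - min(cvi)) / sum(cvi - min(cvi))

# Posterior Dirichlet parameters alpha_k + n * r_k and their total alpha_0 + n.
post  <- alpha + n * r
post0 <- sum(alpha) + n

# BCVI(k) is the posterior mean of p_k; the variance follows the formula above.
bcvi     <- post / post0
bcvi_var <- post * (post0 - post) / (post0^2 * (post0 + 1))

result <- data.frame(k = k, BCVI = bcvi, VAR = bcvi_var)
print(result)
```

Because the r_k sum to one, the BCVI values also sum to one, so they can be read as posterior weights over the candidate numbers of groups.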
Value
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
M. K. Pakhira, S. Bandyopadhyay and U. Maulik, "Validity index for crisp and fuzzy clusters," Pattern Recogn 37(3), 487–501 (2004).
A. Starczewski, "A new validity index for crisp clusters," Pattern Anal Applic 20, 687–700 (2017).
N. Wiroonsri and O. Preedasawakul, "A Bayesian cluster validity index," arXiv:2402.02162 (2024).
See Also
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
Examples
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
# Dirichlet prior parameters alpha_2, ..., alpha_10 (one value per k = 2, ..., kmax)
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.STRPBM = B_STRPBM.IDX(x = scale(data), kmax=10, method = "kmeans",
indexlist = "all", nstart = 100, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI-STR
pplot = plot_BCVI(B.STRPBM$STR)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
# plot the BCVI-PBM
pplot = plot_BCVI(B.STRPBM$PBM)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot