PB.IDX {UniversalCVI} | R Documentation |
Point biserial correlation (PB)
Description
Computes the point biserial correlation (PB) index (G. W. Milligan, 1980) for the result of either kmeans or hierarchical clustering, for each number of clusters from a user-specified kmin to kmax.
Usage
PB.IDX(x, kmax, kmin = 2, method = "kmeans", corr = "pearson", nstart = 100)
Arguments
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
kmin |
a minimum number of clusters to be considered. The default is 2. |
method |
a character string indicating which clustering method to be used (for example "kmeans" or "hclust_average", as in the Examples). The default is "kmeans". |
corr |
a character string indicating which correlation coefficient is to be computed. The default is "pearson". |
nstart |
a maximum number of initial random sets for kmeans, used when method = "kmeans". The default is 100. |
Details
The largest value of PB(k) indicates a valid optimal partition.
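For intuition, a point biserial index of this kind can be read as the correlation between the pairwise distances and a 0/1 indicator of whether each pair of points falls in different clusters. The following is a minimal base R sketch under that assumption; it is not taken from the package source, and pb_sketch and the synthetic data are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of UniversalCVI): PB-style index for one labelling.
pb_sketch <- function(x, labels, corr = "pearson") {
  d <- as.vector(dist(x))                  # pairwise Euclidean distances
  between <- as.vector(dist(labels)) != 0  # TRUE when a pair spans two clusters
  cor(d, as.numeric(between), method = corr)
}

set.seed(1)
# Two well-separated synthetic groups (illustrative data only)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))

good <- rep(1:2, each = 20)   # labels that match the two groups
bad  <- rep(1:2, times = 20)  # labels that cut across the groups

pb_good <- pb_sketch(x, good)
pb_bad  <- pb_sketch(x, bad)
print(c(good = pb_good, bad = pb_bad))
```

A labelling that matches the group structure should give a value close to 1, while a labelling that ignores it should give a value near 0, which is why the largest PB(k) is taken as the preferred partition.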
Value
PB |
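the PB index for each number of clusters k from kmin to kmax, returned as a data frame with a PB column (see Examples). |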
Author(s)
Nathakhun Wiroonsri and Onthada Preedasawakul
References
G. W. Milligan, "An examination of the effect of six types of error perturbation on fifteen clustering algorithms," Psychometrika, 45, 325-342 (1980).
See Also
Hvalid, Wvalid, DI.IDX, FzzyCVIs, R1_data
Examples
library(UniversalCVI)
# The data is from Wiroonsri (2024).
x = R1_data[,1:2]
# ---- Kmeans ----
# Compute PB index
K.PB = PB.IDX(scale(x), kmax = 15, kmin = 2, method = "kmeans",
              corr = "pearson", nstart = 100)
print(K.PB)
# The optimal number of clusters
K.PB[which.max(K.PB$PB),]
# ---- Hierarchical ----
# Average linkage
# Compute PB index
H.PB = PB.IDX(scale(x), kmax = 15, kmin = 2, method = "hclust_average",
              corr = "pearson")
print(H.PB)
# The optimal number of clusters
H.PB[which.max(H.PB$PB),]