ballhall {inaparc} | R Documentation |
Initialization of cluster prototypes using Ball & Hall's algorithm
Description
Initializes the cluster prototypes using the cluster seeding algorithm proposed by Ball & Hall (1967).
Usage
ballhall(x, k, tv)
Arguments
x |
a numeric vector, data frame or matrix. |
k |
an integer specifying the number of clusters. |
tv |
a number to be used as T, the threshold distance value. It can be supplied directly by the user. It is also possible to compute T with the following options of
|
Details
In Ball and Hall's algorithm (Ball & Hall, 1967), the center of gravity of the data is assigned as the prototype of the first cluster. The algorithm then passes through the data objects in arbitrary order and takes an object as the next prototype if it is at least T units away from all previously selected prototypes. The purpose of the distance threshold T is to keep the cluster prototypes at least T units away from each other. Ball & Hall's method may be sensitive to the order of the data, and choosing an appropriate value of T is also difficult (Celebi et al., 2013). To address the latter problem, the function ballhall in this package computes a T value using some distance measures if it is not specified by the user (for details, see the section ‘Arguments’ above).
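The selection rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used by inaparc: the function name ballhall_sketch is hypothetical, Euclidean distance is assumed, the objects are visited in row order, tv must be supplied explicitly, and fewer than k prototypes are returned if too few objects satisfy the threshold.

```r
# Illustrative sketch only; NOT the inaparc implementation.
# Assumptions: Euclidean distance, objects visited in row order, tv given.
ballhall_sketch <- function(x, k, tv) {
  x <- as.matrix(x)
  # First prototype: the center of gravity (column means) of the data
  v <- matrix(colMeans(x), nrow = 1, dimnames = list(NULL, colnames(x)))
  for (i in seq_len(nrow(x))) {
    if (nrow(v) >= k) break
    # Euclidean distances from object i to every prototype selected so far
    d <- sqrt(rowSums(sweep(v, 2, x[i, ], "-")^2))
    # Accept the object only if it is at least tv away from all prototypes
    if (all(d >= tv)) v <- rbind(v, x[i, ])
  }
  v
}

# Usage on the data set from the Examples section
p <- ballhall_sketch(iris[, 1:4], k = 5, tv = 0.6)
print(p)
```

Because objects are accepted in the order they are encountered, permuting the rows of x can change the selected prototypes, which is the order sensitivity mentioned above.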
Value
an object of class ‘inaparc’, which is a list consisting of the following items:
v |
a numeric matrix containing the initial cluster prototypes. |
ctype |
a string indicating the type of centroid used. It is ‘obj’ for this function because the resulting cluster prototype matrix contains the selected objects. |
call |
a string containing the matched function call that generates this ‘inaparc’ object. |
Author(s)
Zeynel Cebeci, Cagatay Cebeci
References
Ball, G.H. & Hall, D.J. (1967). A clustering technique for summarizing multivariate data, Systems Res. & Behavioral Sci., 12 (2): 153-155.
Celebi, M.E., Kingravi, H.A. & Vela, P.A. (2013). A comparative study of efficient initialization methods for the K-means clustering algorithm, Expert Systems with Applications, 40 (1): 200-210. arXiv:https://arxiv.org/pdf/1209.1960.pdf
See Also
aldaoud, crsamp, firstk, hartiganwong, inofrep, inscsf, insdev, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, mscseek, rsamp, rsegment, scseek, scseek2, spaeth, ssamp, topbottom, uniquek, ursamp
Examples
data(iris)
# Run with a user-specified threshold value
v1 <- ballhall(x=iris[,1:4], k=5, tv=0.6)$v
print(v1)
# Run with the internally computed default threshold value
v2 <- ballhall(x=iris[,1:4], k=5)$v
print(v2)