scseek {inaparc} | R Documentation |
Initialization of cluster prototypes using the SCS algorithm
Description
Initializes the cluster prototypes matrix with the Simple Cluster Seeking (SCS) algorithm (Tou & Gonzalez, 1974).
Usage
scseek(x, k, tv)
Arguments
x |
a numeric vector, data frame or matrix. |
k |
an integer for the number of clusters. |
tv |
a number to be used as the threshold distance, supplied directly by the user. It is also possible to compute T, a threshold distance value, with the following options of
|
Details
The Simple Cluster Seeking (SCS) algorithm (Tou & Gonzalez, 1974) is similar to Ball and Hall's algorithm (Ball & Hall, 1967), except for the selection of the first object (Celebi et al., 2013). In SCS, the first object in the data set is selected as the prototype of the first cluster. Then the next object whose distance to the first prototype is greater than T, a threshold distance value, is sought and, if found, assigned as the second cluster prototype. Afterwards, the next object whose distance to the already determined prototypes is greater than T is sought and assigned as the third cluster prototype. The selection process is repeated in the same way to determine the prototypes of the remaining clusters.
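The selection rule described above can be sketched in base R as follows. This is only an illustration, not the inaparc source code: the function name scs_sketch and its return fields are hypothetical, Euclidean distance is assumed, "greater than T" is assumed to apply to every prototype chosen so far, and the sketch simply returns fewer than k prototypes if the scan reaches the end of the data.

```r
# Illustrative sketch of the SCS selection rule (not the inaparc source code).
# scs_sketch and its return fields are hypothetical names.
scs_sketch <- function(x, k, tv) {
  x <- as.matrix(x)
  idx <- 1L                      # the first object becomes the first prototype
  i <- 2L
  while (length(idx) < k && i <= nrow(x)) {
    # Differences between each chosen prototype and object i
    diffs <- sweep(x[idx, , drop = FALSE], 2, x[i, ])
    d <- sqrt(rowSums(diffs^2))  # Euclidean distances (an assumption)
    # Assumption: object i must be farther than tv from every prototype
    if (all(d > tv)) idx <- c(idx, i)
    i <- i + 1L
  }
  # May return fewer than k rows if the scan ends early
  list(v = x[idx, , drop = FALSE], index = idx)
}

toy <- matrix(c(0, 0,  0.1, 0,  2, 0,  2.1, 0,  0, 3), ncol = 2, byrow = TRUE)
res <- scs_sketch(toy, k = 3, tv = 1)
print(res$index)  # rows 1, 3 and 5 are selected
```

In the toy data, rows 2 and 4 are skipped because each lies within the threshold of an already chosen prototype, which also shows why the result depends on the order of the rows.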
Because SCS is sensitive to the order of the data (Celebi et al., 2013), it may not yield good initializations with sorted data. On the other hand, the distance between the cluster prototypes can be controlled by T, which is an arbitrary number specified by the user. The difficulty is how the user should decide on this threshold value. As a solution to this problem, some internally computed distance measures can be used in the function scseek. (See the section ‘Arguments’ above for the available options.)
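As a rough illustration of what a data-driven threshold might look like, the sketch below derives a few candidate values from the pairwise distances of the data. These candidates and their names are assumptions made for illustration; they are not necessarily the options that scseek computes internally.

```r
# Hypothetical, data-driven candidates for the threshold T.
# These are illustrative choices, not the options built into scseek.
x <- as.matrix(iris[, 1:4])
d <- dist(x)                          # all pairwise Euclidean distances

k <- 5
tv_mean   <- mean(d)                  # mean pairwise distance
tv_median <- median(d)                # median pairwise distance
tv_range  <- (max(d) - min(d)) / k    # distance range divided by k

print(c(mean = tv_mean, median = tv_median, range_over_k = tv_range))
```

Any of these values could then be passed as the tv argument, for example scseek(x = iris[, 1:4], k = 5, tv = tv_mean). A larger threshold spreads the prototypes further apart but makes it less likely that k qualifying objects exist.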
Value
an object of class ‘inaparc’, which is a list consisting of the following items:
v |
a numeric matrix of the initial cluster prototypes. |
ctype |
a string representing the type of centroid used to build the prototype matrix. Its value is ‘obj’ with this function because the cluster prototype matrix contains the objects. |
call |
a string containing the matched function call that generates this ‘inaparc’ object. |
Author(s)
Zeynel Cebeci, Cagatay Cebeci
References
Ball, G.H. & Hall, D.J. (1967). A clustering technique for summarizing multivariate data, Systems Res. & Behavioral Sci., 12 (2): 153-155.
Tou, J.T. & Gonzalez, R.C. (1974). Pattern Recognition Principles. Addison-Wesley, Reading, MA. <ISBN:9780201075861>
Celebi, M.E., Kingravi, H.A. & Vela, P.A. (2013). A comparative study of efficient initialization methods for the K-means clustering algorithm, Expert Systems with Applications, 40 (1): 200-210. arXiv:https://arxiv.org/pdf/1209.1960.pdf
See Also
aldaoud, ballhall, crsamp, firstk, forgy, hartiganwong, inofrep, inscsf, insdev, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, mscseek, rsamp, rsegment, scseek2, spaeth, ssamp, topbottom, uniquek, ursamp
Examples
# Load the package that provides scseek
library(inaparc)
data(iris)
# Run with the threshold value of 0.5
res <- scseek(x=iris[,1:4], k=5, tv=0.5)
v1 <- res$v
print(v1)
# Run with the internally computed default threshold value
res <- scseek(x=iris[,1:4], k=5)
v2 <- res$v
print(v2)