kmpp {inaparc}	R Documentation
Initialization of cluster prototypes using K-means++ algorithm
Description
Initializes the cluster prototypes matrix using the K-means++ algorithm proposed by Arthur and Vassilvitskii (2007).
Usage
kmpp(x, k)
Arguments
x
a numeric vector, data frame or matrix.
k
an integer specifying the number of clusters.
Details
K-means++ (Arthur & Vassilvitskii, 2007) is commonly reported as an efficient approximation algorithm for overcoming the poor clustering results that the standard K-means algorithm can produce when its initial centers are chosen badly. K-means++ merges MacQueen's second method with the ‘Maximin’ method to initialize the cluster prototypes (Ji et al., 2015). It initializes the cluster centroids by selecting, in a probabilistic manner, data objects that are far away from each other. In K-means++, the first cluster prototype (center) is selected at random. Each of the remaining prototypes is then selected with probability $md(x')^2 / \sum_{k=1}^{n} md(x_k)^2$, where $md(x)$ is the minimum distance between a data object $x$ and the previously selected prototypes, and $n$ is the number of data objects.
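As a small worked illustration of the selection probability (the distances below are made-up values chosen only for the arithmetic, not taken from any data set), suppose there are four data objects, $x_1$ is the prototype already selected, so $md(x_1) = 0$, and the other objects have $md(x_2) = 1$, $md(x_3) = 2$ and $md(x_4) = 3$:

```latex
\begin{aligned}
\sum_{k=1}^{4} md(x_k)^2 &= 0^2 + 1^2 + 2^2 + 3^2 = 14 \\
P(x_1) &= 0/14 = 0, \qquad P(x_2) = 1/14 \approx 0.071, \\
P(x_3) &= 4/14 \approx 0.286, \qquad P(x_4) = 9/14 \approx 0.643
\end{aligned}
```

An object that is already a prototype has zero probability of being chosen again, and the object farthest from the current prototypes is the most likely next choice, although it is not guaranteed to be chosen.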
The function kmpp is an implementation of the K-means++ initialization algorithm based on the code ‘k-meansp2.R’ authored by M. Sugiyama. It runs relatively fast because its distance computations are vectorized.
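The seeding procedure described in Details can be sketched in base R as follows. This is a minimal sketch, not the inaparc source: the function name `kmpp_sketch` is hypothetical, Euclidean distance is assumed, and the distances to the newest prototype are computed for all objects at once with `colSums()`, in the spirit of the vectorized approach mentioned above. The selected objects are then passed as initial centers to `stats::kmeans()`.

```r
# kmpp_sketch(): hypothetical, illustrative K-means++ seeding (not inaparc's kmpp).
# Assumes Euclidean distance; returns k data objects as the initial prototypes.
kmpp_sketch <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  if (k < 1 || k > n) stop("k must be between 1 and nrow(x)")
  idx <- integer(k)
  # First prototype: one data object chosen uniformly at random
  idx[1] <- sample.int(n, 1)
  # md(x)^2: squared distance from every object to its nearest prototype so far,
  # computed for all n objects at once (t(x) is p x n; the prototype row is recycled)
  d2 <- colSums((t(x) - x[idx[1], ])^2)
  for (j in seq_len(k)[-1]) {
    if (sum(d2) == 0) stop("fewer than k distinct data objects")
    # Draw the next prototype with probability md(x)^2 / sum(md(x_k)^2)
    idx[j] <- sample.int(n, 1, prob = d2)
    # For each object, keep the smaller of its old md(x)^2 and its distance to the new prototype
    d2 <- pmin(d2, colSums((t(x) - x[idx[j], ])^2))
  }
  x[idx, , drop = FALSE]
}

# Usage: seed K-means on the numeric iris columns with 3 prototypes
set.seed(1)
v <- kmpp_sketch(iris[, 1:4], k = 3)
print(v)
fit <- kmeans(iris[, 1:4], centers = v)
print(table(fit$cluster))
```

Because every prototype after the first is drawn only from objects with a positive distance to all earlier prototypes, the selected rows are distinct, which `kmeans()` requires when `centers` is given as a matrix.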
Value
an object of class ‘inaparc’, which is a list consisting of the following items:
v
a numeric matrix containing the initial cluster prototypes.
ctype
a string representing the type of centroid used to build the prototype matrix. Its value is ‘obj’ with this function because the cluster prototypes are objects selected by the algorithm.
call
a string containing the matched function call that generates this ‘inaparc’ object.
Author(s)
Zeynel Cebeci, Cagatay Cebeci
References
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding. In Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027-1035. url:http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
Sugiyama, M. ‘mahito-sugiyama/k-meansp2.R’. url:https://gist.github.com/mahito-sugiyama/ef54a3b17fff4629f106
See Also
aldaoud, ballhall, crsamp, firstk, forgy, hartiganwong, inofrep, inscsf, insdev, kkz, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, mscseek, rsamp, rsegment, scseek, scseek2, spaeth, ssamp, topbottom, uniquek, ursamp
Examples
library(inaparc)
data(iris)
# Select 5 initial prototypes from the four numeric columns of iris
res <- kmpp(x=iris[,1:4], k=5)
v <- res$v
print(v)