kmpp {inaparc}

R Documentation

Initialization of cluster prototypes using K-means++ algorithm

Description

Initializes the cluster prototypes matrix using the K-means++ algorithm proposed by Arthur and Vassilvitskii (2007).

Usage

kmpp(x, k)

Arguments

`x`	a numeric vector, data frame or matrix.
`k`	an integer specifying the number of clusters.

Details

K-means++ (Arthur & Vassilvitskii, 2007) is usually reported as an efficient approximation algorithm that overcomes the poor clustering results the standard K-means algorithm can produce when its initial prototypes are badly chosen. K-means++ merges MacQueen's second method with the ‘Maximin’ method to initialize the cluster prototypes (Ji et al., 2015). It initializes the cluster centroids by probabilistically selecting data objects that are far away from each other. In K-means++, the first cluster prototype (center) is chosen uniformly at random from the data objects. Each remaining prototype is a data object x' selected with probability md(x')^2 / \sum_{k=1}^{n} md(x_k)^2, where n is the number of data objects and md(x) is the minimum distance between the data object x and the previously selected prototypes.

The function kmpp is an implementation of the K-means++ initialization algorithm that is based on the code ‘k-meansp2.R’, authored by M. Sugiyama. It needs less execution time because its distance computations are vectorized.
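
The seeding steps described above can be sketched in base R as follows. This is a minimal illustration, not the code used by kmpp: the function name `kmpp_sketch` and the returned `index` element are hypothetical, squared Euclidean distance is assumed, and the data are assumed to contain at least k distinct objects.

```r
# Hypothetical helper, NOT part of inaparc: a base-R sketch of K-means++ seeding.
# Assumes squared Euclidean distance and at least k distinct data objects.
kmpp_sketch <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(is.numeric(x), k >= 1, k <= n)

  idx <- integer(k)

  # Step 1: choose the first prototype uniformly at random.
  idx[1] <- sample.int(n, 1)

  # md2[i] is the squared distance from object i to its nearest prototype so far.
  md2 <- rowSums(sweep(x, 2, x[idx[1], ])^2)

  for (j in seq_len(k - 1) + 1) {
    # Step 2: draw the next prototype with probability md2 / sum(md2).
    # Objects already selected have md2 == 0 and cannot be drawn again.
    idx[j] <- sample.int(n, 1, prob = md2 / sum(md2))

    # Step 3: update the minimum squared distances in one vectorized pass.
    d2 <- rowSums(sweep(x, 2, x[idx[j], ])^2)
    md2 <- pmin(md2, d2)
  }

  list(v = x[idx, , drop = FALSE], index = idx)
}

set.seed(1)
seeds <- kmpp_sketch(iris[, 1:4], k = 3)

# The selected prototypes can be passed to stats::kmeans as initial centers.
fit <- kmeans(iris[, 1:4], centers = seeds$v)
```

Because every selected prototype is itself a data object, the rows of `seeds$v` are rows of the input data, which corresponds to the ‘obj’ centroid type reported by kmpp.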

Value

an object of class ‘inaparc’, which is a list consisting of the following items:

`v`	a numeric matrix containing the initial cluster prototypes.
`ctype`	a string representing the type of centroid used to build the prototype matrix. Its value is ‘obj’ for this function because the cluster prototypes are data objects selected by the algorithm.
`call`	a string containing the matched function call that generated this ‘inaparc’ object.

Author(s)

Zeynel Cebeci, Cagatay Cebeci

References

Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027-1035. url:http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf

Sugiyama, M. ‘mahito-sugiyama/k-meansp2.R’. url:https://gist.github.com/mahito-sugiyama/ef54a3b17fff4629f106

Examples

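library(inaparc)
data(iris)
# Initialize five cluster prototypes from the four numeric iris columns
res <- kmpp(x=iris[,1:4], k=5)
# Extract and print the matrix of initial prototypes
v <- res$v
print(v)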

[Package inaparc version 1.2.0 Index]