rsamp {inaparc} | R Documentation |
Initialization of cluster prototypes using simple random sampling
Description
Initializes the cluster prototypes matrix using the randomly selected k objects from the data set.
Usage
rsamp(x, k)
Arguments
x |
a numeric vector, data frame or matrix. |
k |
an integer for the number of clusters. |
Details
The function rsamp
generates a protoype matrix using the k objects which are randomly sampled from the data set without replacement. Simple random sampling (SRS), also so-called the second method of MacQueen in the clustering context, assumes that cluster areas have a high density; in consequence, the good candidates of the cluster prototypes can be sampled from these dense regions of data with a higher chance (Celebi et al, 2013). SRS is probably the most common approach to initialize prototype matrices. So, it can be seen a de facto standard because it has been widely applied with the basic K-means algorithm for the years. Since SRS has no rule to avoid to select the outliers or the objects close to each other, it may result with no good initializations. Before initialization of SRS, multivariate outliers removal on the data set as a data pre-processing step may be helpful to avoid for selection of the outliers, but increases the computational cost.
Value
an object of class ‘inaparc’, which is a list consists of the following items:
v |
a numeric matrix containing the initial cluster prototypes. |
ctype |
a string representing the type of centroid, which used to build prototype matrix. Its value is ‘obj’ with this function because it samples the objects only. |
call |
a string containing the matched function call that generates this ‘inaparc’ object. |
Author(s)
Zeynel Cebeci, Cagatay Cebeci
References
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations, in Proc. of 5-th Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, University of California Press, 1: 281-297. url:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.8619&rep=rep1&type=pdf
Celebi, M.E., Kingravi, H.A. & Vela, P.A. (2013). A comparative study of efficient initialization methods for the K-means clustering algorithm, Expert Systems with Applications, 40 (1): 200-210. arXiv:https://arxiv.org/pdf/1209.1960.pdf
See Also
aldaoud
,
ballhall
,
crsamp
,
firstk
,
forgy
,
hartiganwong
,
inofrep
,
inscsf
,
insdev
,
kkz
,
kmpp
,
ksegments
,
ksteps
,
lastk
,
lhsmaximin
,
lhsrandom
,
maximin
,
mscseek
,
rsegment
,
scseek
,
scseek2
,
spaeth
,
ssamp
,
topbottom
,
uniquek
,
ursamp
Examples
data(iris)
res <- rsamp(x=iris[,1:4], k=5)
v <- res$v
print(v)