rsamp {inaparc}    R Documentation

Initialization of cluster prototypes using simple random sampling

Description

Initializes the cluster prototype matrix using k objects randomly selected from the data set.

Usage

rsamp(x, k)

Arguments

x

a numeric vector, data frame or matrix.

k

an integer for the number of clusters.

Details

The function rsamp generates a prototype matrix from k objects sampled at random, without replacement, from the data set. Simple random sampling (SRS), also known as MacQueen's second method in the clustering context, assumes that clusters occupy dense regions of the data; consequently, good candidates for the cluster prototypes are more likely to be sampled from these dense regions (Celebi et al., 2013). SRS is probably the most common approach to initializing prototype matrices and can be regarded as a de facto standard, because it has been widely used with the basic K-means algorithm for years.

Since SRS has no mechanism to avoid selecting outliers or objects that lie close to each other, it may produce poor initializations. Removing multivariate outliers from the data set as a pre-processing step before SRS may help to avoid selecting outliers, at the cost of additional computation.
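The sampling step described above, together with the optional outlier-removal step, can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the inaparc implementation: srs_init() and remove_mv_outliers() are hypothetical helpers, and the outlier rule (squared Mahalanobis distance compared with the 0.975 quantile of a chi-square distribution with p degrees of freedom) is an assumed choice of pre-processing, not one prescribed by this package.

```r
# Hypothetical helpers for illustration only; not exported by inaparc.

# Simple random sampling (SRS): pick k distinct objects as prototypes.
srs_init <- function(x, k) {
  x <- as.matrix(x)                  # accept a vector, data frame or matrix
  n <- nrow(x)
  if (k < 1 || k > n) stop("k must be between 1 and nrow(x)")
  idx <- sample.int(n, size = k, replace = FALSE)  # k distinct row indices
  x[idx, , drop = FALSE]             # the sampled objects become the prototypes
}

# Assumed pre-processing rule (not inaparc's): drop objects whose squared
# Mahalanobis distance from the mean exceeds the chi-square quantile
# with ncol(x) degrees of freedom at the given level.
remove_mv_outliers <- function(x, level = 0.975) {
  x <- as.matrix(x)
  d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))
  x[d2 <= qchisq(level, df = ncol(x)), , drop = FALSE]
}

set.seed(42)                         # reproducible sample
v <- srs_init(remove_mv_outliers(iris[, 1:4]), k = 5)
print(v)
```

Because every prototype is an existing object of the data, two prototypes can still coincide or lie very close together (iris, for instance, contains duplicated rows); the outlier filter only addresses the first weakness mentioned above.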

Value

an object of class ‘inaparc’, which is a list consisting of the following items:

v

a numeric matrix containing the initial cluster prototypes.

ctype

a string representing the type of centroid used to build the prototype matrix. Its value is ‘obj’ for this function because the prototypes are sampled objects of the data set.

call

a string containing the matched function call that generated this ‘inaparc’ object.
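To make the shape of the returned object concrete, the sketch below assembles a list with the three documented items (v, ctype, call) and assigns it class ‘inaparc’. It is an assumption-laden illustration, not the package's source: rsamp_sketch() is a hypothetical function, and storing the call via deparse(match.call()) is simply one way to obtain the matched call as a string.

```r
# Hypothetical constructor mirroring the documented return value of rsamp.
# Illustration only; this is not the inaparc source code.
rsamp_sketch <- function(x, k) {
  x <- as.matrix(x)
  if (k < 1 || k > nrow(x)) stop("k must be between 1 and nrow(x)")
  v <- x[sample.int(nrow(x), size = k, replace = FALSE), , drop = FALSE]
  result <- list(
    v     = v,                      # sampled objects used as prototypes
    ctype = "obj",                  # prototypes are objects of the data set
    call  = deparse(match.call())   # the matched call, stored as a string
  )
  class(result) <- "inaparc"
  result
}

set.seed(1)
res <- rsamp_sketch(iris[, 1:4], k = 3)
str(unclass(res))                   # inspect the list items v, ctype and call
```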

Author(s)

Zeynel Cebeci, Cagatay Cebeci

References

MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proc. of the 5th Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, University of California Press, 1: 281-297. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.8619&rep=rep1&type=pdf

Celebi, M.E., Kingravi, H.A. & Vela, P.A. (2013). A comparative study of efficient initialization methods for the K-means clustering algorithm. Expert Systems with Applications, 40 (1): 200-210. arXiv: https://arxiv.org/pdf/1209.1960.pdf

See Also

aldaoud, ballhall, crsamp, firstk, forgy, hartiganwong, inofrep, inscsf, insdev, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, mscseek, rsegment, scseek, scseek2, spaeth, ssamp, topbottom, uniquek, ursamp

Examples

library(inaparc)
data(iris)
set.seed(123)  # make the random sample reproducible
res <- rsamp(x=iris[,1:4], k=5)
v <- res$v
print(v)

[Package inaparc version 1.2.0 Index]