mscseek {inaparc}    R Documentation

Initialization of cluster prototypes using the modified SCS algorithm

Description

Initializes the cluster prototypes matrix using a modified version of the Simple Cluster Seeking (SCS) algorithm proposed by Tou & Gonzalez (1974). While SCS uses a fixed threshold distance value T for selecting all cluster candidates, the modified SCS recomputes T as the average Euclidean distance between the previously determined prototypes. This adjustment makes it possible to select more cluster prototypes than SCS does.

Usage

mscseek(x, k, tv)

Arguments

x

a numeric vector, data frame or matrix.

k

an integer for the number of clusters.

tv

a number to be used as the threshold distance, supplied directly by the user. Alternatively, a threshold distance value T can be computed with one of the following options of the tv argument:

  • With the option cd1, T is the mean of the differences between consecutive pairs of objects.

  • With the option cd2, T is the minimum of the differences between consecutive pairs of objects.

  • With the option md, T is the mean of the Euclidean distances between consecutive pairs of objects, divided by k. This is the default if tv is not supplied by the user.

  • With the option mm, T is the range (maximum minus minimum) of the Euclidean distances between consecutive pairs of objects, divided by k.
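The md and mm options above can be illustrated with a minimal sketch. The helper names consecutive_dist and threshold_sketch are hypothetical and not part of inaparc, and the sketch assumes that "consecutive pairs" means each row paired with the row that follows it. The cd1 and cd2 options are left out because the text does not say how the difference between two objects is measured. The package's internal computation may differ.

```r
# Hypothetical helper (not part of inaparc): Euclidean distances between
# consecutive rows of x (row 1 to row 2, row 2 to row 3, and so on).
consecutive_dist <- function(x) {
  x <- as.matrix(x)
  d <- x[-1, , drop = FALSE] - x[-nrow(x), , drop = FALSE]
  sqrt(rowSums(d^2))
}

# Hypothetical helper: threshold for the "md" and "mm" options as described
# above. This is an illustration, not the package's own code.
threshold_sketch <- function(x, k, tv = c("md", "mm")) {
  tv <- match.arg(tv)
  d <- consecutive_dist(x)
  switch(tv,
    md = mean(d) / k,              # mean consecutive distance divided by k
    mm = (max(d) - min(d)) / k     # range of consecutive distances divided by k
  )
}

x <- iris[, 1:4]
threshold_sketch(x, k = 5, tv = "md")
threshold_sketch(x, k = 5, tv = "mm")
```

Because both options divide by k, a larger k gives a smaller threshold under this reading, so more objects qualify as prototype candidates.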

Details

This is a modification of the Simple Cluster Seeking (SCS) algorithm (Tou & Gonzalez, 1974). The algorithm selects the first object in the data set as the prototype of the first cluster. Then the next object whose distance to the first prototype is greater than a threshold distance value T is searched for and assigned as the second cluster prototype. Instead of using a fixed threshold distance value T as SCS does, the modified SCS recomputes T as the average Euclidean distance between the previously determined cluster prototypes. The next object whose distance to the previously selected object is greater than the adjusted T is searched for and assigned as the third cluster prototype. The selection process is repeated for the remaining clusters in a similar way. The method is sensitive to the order of the data, so it may not yield good initializations with ordered data.
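The selection procedure described above can be sketched roughly as follows. The function mscs_sketch and its argument T0 (the starting threshold) are hypothetical names, not part of inaparc. The sketch assumes a single pass through the objects in their stored order and compares each object with the most recently selected prototype. It may return fewer than k rows if too few objects exceed the threshold. The package's actual implementation may handle these points differently.

```r
# Hypothetical illustration of the modified SCS idea (not the inaparc code).
mscs_sketch <- function(x, k, T0) {
  x <- as.matrix(x)
  idx <- 1L          # the first object is the first prototype
  thr <- T0          # starting threshold distance
  i <- 2L
  while (length(idx) < k && i <= nrow(x)) {
    last <- x[idx[length(idx)], ]          # most recently selected prototype
    if (sqrt(sum((x[i, ] - last)^2)) > thr) {
      idx <- c(idx, i)
      # Recompute the threshold as the mean pairwise Euclidean distance
      # among the prototypes selected so far.
      thr <- mean(as.numeric(dist(x[idx, , drop = FALSE])))
    }
    i <- i + 1L
  }
  x[idx, , drop = FALSE]
}

# Usage on the iris measurements with a starting threshold of 0.1
v <- mscs_sketch(iris[, 1:4], k = 5, T0 = 0.1)
print(v)
```

Since the result depends on which object comes first and on the order in which the remaining objects are visited, shuffling the rows before the call generally changes the selected prototypes, which illustrates the order sensitivity noted above.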

Value

an object of class ‘inaparc’, which is a list consisting of the following items:

v

a numeric matrix of the initial cluster prototypes.

ctype

a string representing the type of centroid used to build the prototype matrix. Its value is ‘obj’ with this function because the cluster prototype matrix contains objects from the data set.

call

a string containing the matched function call that generates the object ‘inaparc’.

Author(s)

Zeynel Cebeci, Cagatay Cebeci

References

Tou, J.T. & Gonzalez, R.C. (1974). Pattern Recognition Principles. Addison-Wesley, Reading, MA. <ISBN:9780201075861>

See Also

aldaoud, ballhall, crsamp, firstk, forgy, hartiganwong, inofrep, inscsf, insdev, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, rsamp, rsegment, scseek, scseek2, spaeth, ssamp, topbottom, uniquek, ursamp

Examples

library(inaparc)
data(iris)
# Run with the threshold value of 0.1
res <- mscseek(x=iris[,1:4], k=5, tv=0.1)
v1 <- res$v
print(v1)

# Run with the internally computed default threshold value 
res <- mscseek(x=iris[,1:4], k=5)
v2 <- res$v
print(v2)

[Package inaparc version 1.2.0 Index]