scseek2 {inaparc}    R Documentation

Initialization of cluster prototypes using SCS algorithm over a selected feature

Description

Initializes the cluster prototype matrix with the Simple Cluster Seeking (SCS) algorithm (Tou & Gonzalez, 1974) applied over a selected feature.

Usage

scseek2(x, k, sfidx, tv)

Arguments

x

a numeric vector, data frame or matrix.

k

an integer for the number of clusters.

sfidx

an integer specifying the column index of the selected feature over which the distances and the threshold are computed. If missing, it is determined internally by comparing the coefficients of variation of all features in the data set, and the feature with the maximum coefficient of variation is used as the selected feature.

tv

a number giving the threshold distance T directly, or a string naming one of the following options for computing T internally:

  • cd1: T is the mean of the differences between consecutive pairs of objects.

  • cd2: T is the minimum of the differences between consecutive pairs of objects.

  • md: T is the mean of the Euclidean distances between consecutive pairs of objects, divided by k. This is the default if tv is not supplied by the user.

  • mm: T is the range (maximum minus minimum) of the Euclidean distances between consecutive pairs of objects, divided by k.
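The internal defaults for sfidx and tv described above can be illustrated with a small base-R sketch. The helper names select_feature_by_cv and compute_threshold are hypothetical and are not part of inaparc. The sketch assumes that "differences" and "Euclidean distances" between consecutive objects are both taken as absolute differences on the selected feature, in data order. On a single feature these coincide, so under this reading md is simply cd1 divided by k. The package itself may compute these quantities differently.

```r
# Hypothetical helpers (not part of inaparc) mirroring the argument
# descriptions of sfidx and tv for a single selected feature.

# Column index of the feature with the largest coefficient of variation
# (CV = sd / mean); CV is undefined for a feature whose mean is zero.
select_feature_by_cv <- function(x) {
  x <- as.matrix(x)
  cv <- apply(x, 2, function(col) sd(col) / mean(col))
  unname(which.max(cv))
}

# Threshold T over one feature, from the absolute differences between
# consecutive objects taken in data order (assumed interpretation).
compute_threshold <- function(xs, k, tv = "md") {
  if (is.numeric(tv)) return(tv)   # threshold supplied directly by the user
  d <- abs(diff(xs))               # |x[i+1] - x[i]| for i = 1, ..., n - 1
  switch(tv,
    cd1 = mean(d),
    cd2 = min(d),
    md  = mean(d) / k,
    mm  = (max(d) - min(d)) / k,
    stop("unknown tv option: ", tv))
}

sfidx <- select_feature_by_cv(iris[, 1:4])   # 4 (Petal.Width) for iris
thr <- compute_threshold(iris[, sfidx], k = 5)
print(thr)
```

Note that cd2 can be zero when two consecutive objects share the same value on the selected feature, as happens in iris. A zero threshold means that any object not exactly equal to an existing seed can become a new prototype.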

Details

The function scseek2 is a novel variant of scseek, which is based on the Simple Cluster Seeking (SCS) algorithm (Tou & Gonzalez, 1974). It differs from SCS in that the distances and the threshold value are computed over a single selected feature (by default, the feature with the maximum coefficient of variation) instead of over all the features.
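The seeding step can be outlined as a short base-R sketch. The function scs_one_feature is hypothetical and is not the inaparc source. It follows the classic SCS scheme: the first object seeds the first cluster, objects are scanned in data order, and an object becomes a new prototype only if its distance to every existing prototype exceeds the threshold. In this sketch, distances are measured on the selected feature alone. Scanning stops once k prototypes have been found.

```r
# Hypothetical sketch (not the inaparc source) of Simple Cluster Seeking
# restricted to one selected feature. Objects are scanned in data order.
scs_one_feature <- function(x, k, sfidx, thr) {
  x <- as.matrix(x)
  xs <- x[, sfidx]
  seeds <- 1                         # the first object seeds the first cluster
  for (i in seq_along(xs)[-1]) {
    if (length(seeds) == k) break    # stop once k prototypes are found
    # an object becomes a new seed only if it is farther than thr from
    # every existing seed, measured on the selected feature alone
    if (all(abs(xs[i] - xs[seeds]) > thr)) seeds <- c(seeds, i)
  }
  if (length(seeds) < k)
    warning("only ", length(seeds), " prototypes found; try a smaller threshold")
  x[seeds, , drop = FALSE]           # prototypes are whole data objects
}

x <- matrix(c(1, 1.2, 5, 5.1, 9,
              2, 3,   4, 5,   6), ncol = 2)
v <- scs_one_feature(x, k = 3, sfidx = 1, thr = 1)
print(v)   # rows 1, 3 and 5 of x
```

Because each prototype is a complete row of x, this sketch corresponds to the ‘obj’ centroid type described under Value. Like SCS itself, the result depends on the order of the objects and on the threshold. If the threshold is too large, fewer than k prototypes may be found.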

Value

an object of class ‘inaparc’, which is a list consisting of the following items:

v

a numeric matrix of the initial cluster prototypes.

sfidx

an integer for the column index of the selected feature that was used to compute the distances and the threshold.

ctype

a string representing the type of centroid used to build the prototype matrix. Its value is ‘obj’ for this function because the cluster prototype matrix contains objects selected from the data set.

call

a string containing the matched function call that generated this ‘inaparc’ object.

Author(s)

Zeynel Cebeci, Cagatay Cebeci

References

Tou, J.T. & Gonzalez, R.C. (1974). Pattern Recognition Principles. Addison-Wesley, Reading, MA. <ISBN:9780201075861>

See Also

aldaoud, ballhall, crsamp, firstk, forgy, hartiganwong, inofrep, inscsf, insdev, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, mscseek, rsamp, rsegment, scseek, spaeth, ssamp, topbottom, uniquek, ursamp

Examples

library(inaparc)
data(iris)
# Run over the 4th feature (Petal.Width) with a threshold value of 0.5
res <- scseek2(x=iris[,1:4], k=5, sfidx=4, tv=0.5)
v1 <- res$v
print(v1)

# Run with the internally selected feature and the default threshold option (md)
res <- scseek2(x=iris[,1:4], k=5)
v2 <- res$v
print(v2)


[Package inaparc version 1.2.0 Index]