cclust {cclust} R Documentation

Convex Clustering

Description

The data given by x is clustered by an algorithm.

If centers is a matrix, its rows are taken as the initial cluster centers. If centers is an integer, centers rows of x are randomly chosen as initial values.

The algorithm stops, if no cluster center has changed during the last iteration or the maximum number of iterations (given by iter.max) is reached.

If verbose is TRUE, only for "kmeans" method, displays for each iteration the number of the iteration and the numbers of cluster indices which have changed since the last iteration is given.

If dist is "euclidean", the distance between the cluster center and the data points is the Euclidian distance (ordinary kmeans algorithm). If "manhattan", the distance between the cluster center and the data points is the sum of the absolute values of the distances of the coordinates.

If method is "kmeans", then we have the kmeans clustering method, which works by repeatedly moving all cluster centers to the mean of their Voronoi sets. If "hardcl" we have the On-line Update (Hard Competitive learning) method, which works by performing an update directly after each input signal, and if "neuralgas" we have the Neural Gas (Soft Competitive learning) method, that sorts for each input signal the units of the network according to the distance of their reference vectors to input signal.

If rate.method is "polynomial", the polynomial learning rate is used, that means 1/t, where t stands for the number of input data for which a particular cluster has been the winner so far. If "exponentially decaying", the exponential decaying learning rate is used according to par1*{(par2/par1)}^{(iter/itermax)} where par1 and par2 are the initial and final values of the learning rate.

The parameters rate.par of the learning rate, where if rate.method is "polynomial" then by default rate.par=1.0, otherwise rate.par=(0.5,1e-5).

Usage

cclust (x, centers, iter.max=100, verbose=FALSE, dist="euclidean",
method= "kmeans", rate.method="polynomial", rate.par=NULL)

Arguments

 x Data matrix where columns correspond to variables and rows to observations centers Number of clusters or initial values for cluster centers iter.max Maximum number of iterations verbose If TRUE, make some output during learning dist If "euclidean", then mean square error, if "manhattan ", the mean absolute error is used. method If "kmeans", then we have the kmeans clustering method, if "hardcl" we have the On-line Update (Hard Competitive learning) method, and if "neuralgas", we have the Neural Gas (Soft Competitive learning) method. Abbreviations of the method names are accepted. rate.method If "kmeans", then k-means learning rate, otherwise exponential decaying learning rate. It is used only for the Hardcl method. rate.par The parameters of the learning rate.

Value

cclust returns an object of class "cclust".

 centers The final cluster centers. initcenters The initial cluster centers. ncenters The number of the centers. cluster Vector containing the indices of the clusters where the data points are assigned to. size The number of data points in each cluster. iter The number of iterations performed. changes The number of changes performed in each iteration step with the Kmeans algorithm. dist The distance measure used. method The algorithm method being used. rate.method The learning rate being used by the Hardcl clustering method. rate.par The parameters of the learning rate. call Returns a call in which all of the arguments are specified by their names. withinss Returns the sum of square distances within the clusters.

predict.cclust

Examples

## a 2-dimensional example
x<-rbind(matrix(rnorm(100,sd=0.3),ncol=2),
matrix(rnorm(100,mean=1,sd=0.3),ncol=2))
cl<-cclust(x,2,20,verbose=TRUE,method="kmeans")
plot(x, col=cl\$cluster)

## a 3-dimensional example
x<-rbind(matrix(rnorm(150,sd=0.3),ncol=3),
matrix(rnorm(150,mean=1,sd=0.3),ncol=3),
matrix(rnorm(150,mean=2,sd=0.3),ncol=3))
cl<-cclust(x,6,20,verbose=TRUE,method="kmeans")
plot(x, col=cl\$cluster)

## assign classes to some new data
y<-rbind(matrix(rnorm(33,sd=0.3),ncol=3),
matrix(rnorm(33,mean=1,sd=0.3),ncol=3),
matrix(rnorm(3,mean=2,sd=0.3),ncol=3))
ycl<-predict(cl, y)
plot(y, col=ycl\$cluster)

[Package cclust version 0.6-23 Index]