cclust {flexclust}    R Documentation
Convex Clustering
Description
Perform k-means clustering, hard competitive learning or neural gas on a data matrix.
Usage
cclust(x, k, dist = "euclidean", method = "kmeans",
       weights = NULL, control = NULL, group = NULL, simple = FALSE,
       save.data = FALSE)
Arguments
x: A numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

k: Either the number of clusters, a vector of cluster assignments, or a matrix of initial (distinct) cluster centroids. If a number, a random set of (distinct) rows of x is chosen as the initial centroids.

dist: Distance measure, one of "euclidean" or "manhattan" (see Details).

method: Clustering algorithm, one of "kmeans", "hardcl" or "neuralgas" (see Details).

weights: An optional vector of weights for the observations (rows of the data matrix x).

control: An object of class "flexclustControl" with control parameters for the algorithm.

group: Currently ignored.

simple: Return an object of class "kccasimple"?

save.data: Save a copy of x in the returned object?
Details
This function uses the same computational engine as the earlier function of the same name from package 'cclust'. The main difference is that it returns an S4 object of class "kcca", hence all available methods for "kcca" objects can be used. By default kcca and cclust use exactly the same algorithm, but cclust will usually be much faster because it uses compiled code.
If dist is "euclidean", the distance between a cluster center and the data points is the ordinary Euclidean distance (standard kmeans algorithm), and cluster means are used as centroids. If dist is "manhattan", the distance is the sum of the absolute differences between the coordinates, and column-wise cluster medians are used as centroids.
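As an illustrative sketch (not the package's internal code): for a given cluster, the "euclidean" centroid is the column-wise mean and the "manhattan" centroid is the column-wise median, which minimize the summed squared and summed absolute distances, respectively. The matrix `xc` below is a hypothetical set of rows assigned to one cluster:

```r
## Illustrative only: centroid of one cluster under each distance measure.
## `xc` stands in for the rows of x assigned to a single cluster.
xc <- rbind(c(0, 0), c(1, 2), c(2, 10))

## "euclidean": column-wise mean minimizes the sum of squared distances
centroid_euclidean <- colMeans(xc)           # c(1, 4)

## "manhattan": column-wise median minimizes the sum of absolute distances
centroid_manhattan <- apply(xc, 2, median)   # c(1, 2)
```

Note how the outlying value 10 pulls the mean of the second column to 4 while the median stays at 2, which is why the "manhattan" variant is less sensitive to outliers.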
If method is "kmeans", the classic kmeans algorithm as given by MacQueen (1967) is used, which works by repeatedly moving all cluster centers to the mean of their respective Voronoi sets. If method is "hardcl", on-line updates are used (also known as hard competitive learning), which work by randomly drawing an observation from x and moving the closest center towards that point (e.g., Ripley 1996). If method is "neuralgas", the neural gas algorithm of Martinetz et al. (1993) is used. It is similar to hard competitive learning, but in each iteration the second-closest centroid is moved in addition to the closest one.
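The on-line updates described above can be sketched as follows. This is an illustrative pseudo-implementation, not flexclust's compiled code: the fixed learning rates `lr` and `lr2` are simplifying assumptions (the actual algorithms use decaying rates controlled via the control argument), and the neural gas update is reduced to the two nearest centers as described in the paragraph above.

```r
## Illustrative sketch of one on-line update step (not flexclust's compiled code).
## `centers` is a k x p matrix of cluster centers, `xi` a single observation.

## Hard competitive learning: move only the closest center towards xi.
update_hardcl <- function(centers, xi, lr = 0.1) {
  d <- colSums((t(centers) - xi)^2)   # squared distances from xi to all centers
  w <- which.min(d)                   # index of the winning (closest) center
  centers[w, ] <- centers[w, ] + lr * (xi - centers[w, ])
  centers
}

## Neural gas (simplified): also move the second-closest center, more weakly.
update_neuralgas <- function(centers, xi, lr = 0.1, lr2 = 0.05) {
  d <- colSums((t(centers) - xi)^2)
  o <- order(d)                       # centers ranked by distance to xi
  centers[o[1], ] <- centers[o[1], ] + lr  * (xi - centers[o[1], ])
  centers[o[2], ] <- centers[o[2], ] + lr2 * (xi - centers[o[2], ])
  centers
}
```

A full run of either algorithm repeatedly draws observations from x at random and applies the corresponding update until the centers stabilize.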
Value
An object of class "kcca".
Author(s)
Evgenia Dimitriadou and Friedrich Leisch
References
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, eds L. M. Le Cam & J. Neyman, 1, pp. 281–297. Berkeley, CA: University of California Press.
Martinetz, T., Berkovich, S., and Schulten, K. (1993). 'Neural-Gas' Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4(4), pp. 558–569.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.
Examples
## a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl <- cclust(x, 2)
plot(x, col = predict(cl))
points(cl@centers, pch = "x", cex = 2, col = 3)
## a 3-dimensional example
x <- rbind(matrix(rnorm(150, sd = 0.3), ncol = 3),
           matrix(rnorm(150, mean = 2, sd = 0.3), ncol = 3),
           matrix(rnorm(150, mean = 4, sd = 0.3), ncol = 3))
cl <- cclust(x, 6, method = "neuralgas", save.data = TRUE)
pairs(x, col=predict(cl))
plot(cl)