justClusters {ClassDiscovery}  R Documentation 
Description:

     Unsupervised clustering algorithms, such as partitioning around
     medoids (pam), Kmeans (kmeans), or hierarchical clustering
     (hclust) after cutting the tree, produce a list of class
     assignments along with other structure. To simplify the interface
     for the BootstrapClusterTest and PerturbationClusterTest, we have
     written these routines that simply extract these cluster
     assignments.
Usage:

     cutHclust(data, k, method = "average", metric = "pearson")
     cutPam(data, k)
     cutKmeans(data, k)
     cutRepeatedKmeans(data, k, nTimes)
     repeatedKmeans(data, k, nTimes)
Arguments:

    data: A numerical data matrix.

       k: The number of classes desired from the algorithm.

  method: Any valid linkage method that can be passed to the hclust
          function.

  metric: Any valid distance metric that can be passed to the
          distanceMatrix function.

  nTimes: An integer; the number of times to repeat the Kmeans
          algorithm with a different random starting point.
Details:

     Each of the clustering routines used here has a different
     structure for storing cluster assignments. The kmeans function
     stores the assignments in a 'cluster' component of its result.
     The pam function uses a 'clustering' component. For hclust, the
     assignments are produced by a call to the cutree function.
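As a sketch of the three extraction patterns described above, using only base R and the recommended cluster package on a toy matrix (the variable names here are illustrative, not part of the package):

```r
# Illustrative sketch of where each algorithm stores its assignments.
library(cluster)  # provides pam()

set.seed(42)
x <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)

# kmeans: assignments live in the 'cluster' component of the result
km <- kmeans(x, centers = 3)
kmAssign <- km$cluster

# pam: assignments live in the 'clustering' component of the result
pm <- pam(x, k = 3)
pamAssign <- pm$clustering

# hclust: assignments come from cutting the tree with cutree()
hc <- hclust(dist(x), method = "average")
hcAssign <- cutree(hc, k = 3)
```

In each case the extracted value is a plain integer vector of length equal to the number of clustered items, which is exactly the common interface the cut... functions expose.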
     It has been observed that the Kmeans algorithm can converge to
     different solutions depending on the starting values of the group
     centers. We also include a routine (repeatedKmeans) that runs the
     Kmeans algorithm repeatedly, using different randomly generated
     starting points each time, and saves the best result.
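A minimal sketch of the restart idea, not the package's own implementation: run kmeans from a fresh random start each time and keep the fit with the smallest total within-group sum of squares.

```r
# Sketch of repeated k-means with random restarts. The fit with the
# smallest total within-group sum of squares is kept as the best one.
set.seed(7)
x <- matrix(rnorm(60 * 5), nrow = 60, ncol = 5)
k <- 3
nTimes <- 10

best <- NULL
for (i in seq_len(nTimes)) {
  fit <- kmeans(x, centers = k)  # picks k random rows as starting centers
  if (is.null(best) || fit$tot.withinss < best$tot.withinss) {
    best <- fit  # keep the fit with the smallest within-group SS
  }
}
```

Note that base R's kmeans also offers an nstart argument that performs the same kind of multiple-restart search internally.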
Value:

     Each of the cut... functions returns a vector of integer values
     representing the cluster assignments found by the algorithm.

     The repeatedKmeans function returns a list x with three
     components. The component x$kmeans is the result of the call to
     the kmeans function that produced the best fit to the data. The
     component x$centers is a matrix containing the group centers that
     were used in the best call to kmeans. The component x$withinss
     contains the sum of the within-group sums of squares, which is
     used as the measure of fitness.
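Assuming the best fit is an ordinary kmeans object, the three components described above can be assembled as in this sketch (the names mirror the documented return value; this is not the package's own code):

```r
# Sketch: packaging a kmeans fit into the three documented components.
set.seed(1)
m <- matrix(rnorm(40 * 3), nrow = 40, ncol = 3)
fit <- kmeans(m, centers = 2)

result <- list(
  kmeans   = fit,              # the best-fitting kmeans object
  centers  = fit$centers,      # matrix of group centers
  withinss = fit$tot.withinss  # total within-group sum of squares
)
```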
Author(s):

     Kevin R. Coombes <krc@silicovore.com>
Examples:

     # simulate data from three different groups
     d1 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
     d2 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
     d3 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
     dd <- cbind(d1, d2, d3)

     cutKmeans(dd, k=3)
     cutKmeans(dd, k=4)

     cutHclust(dd, k=3)
     cutHclust(dd, k=4)

     cutPam(dd, k=3)
     cutPam(dd, k=4)

     cutRepeatedKmeans(dd, k=3, nTimes=10)
     cutRepeatedKmeans(dd, k=4, nTimes=10)

     # cleanup
     rm(d1, d2, d3, dd)