kmeans.run {bios2mds} | R Documentation |
Multiple runs of K-means analysis
Description
Performs multiple runs of K-means clustering and analyzes data.
Usage
kmeans.run(mat, nb.clus = 2, nb.run = 1000, iter.max = 10000,
method = "euclidean")
Arguments
mat |
a numeric matrix representing the coordinates of the elements after metric MDS analysis. |
nb.clus |
a numeric value indicating the number of clusters. Default is 2. |
nb.run |
a numeric value indicating the number of runs. Default is 1000. |
iter.max |
a numeric value indicating the maximum number of iterations for K-means. Default is 10000. |
method |
a string of characters to determine the distance to be used. This should be one of "euclidean", "maximum", "manhattan", "canberra", "binary", "pearson", "correlation", "spearman" or "kendall". Default is "euclidean". |
Details
The aim of K-means clustering is the partition of elements into a user-provided number of clusters. Several runs of K-means analysis on the same data may return different cluster assignments because the K-means procedure attributes random initial centroids for each run. The robustness of an assignment depends on its reproducibility.
The function matchClasses
from the e1071
package is used to compare the cluster assignments of the different runs and returns a score of agreement between them. The most frequent clustering solution is selected and used as a reference to assess the reproducibility of the analysis.
kmeans.run
returns two lists. In either list, the clusters refer to those observed in the most frequent solution. The first list provides, for each element, the relative ratio of its assignment to each cluster in the different runs. The second list provides, for each cluster, the list of the assigned elements along with the relative assignment to this cluster in the different runs.
Value
A object of class 'kmean', which is a named list of two elements
elements |
a named list of elements with the relative assignment of each element to each cluster. |
clusters |
a named list of clusters with the elements assigned to this cluster in the most frequent solution and their relative assignment to this cluster in multiple runs. |
Note
During the K-means procedure, an empty cluster can be obtained if no objects are allocated to the
cluster. In kmeans.run
, runs with empty clusters are discarded.
kmeans.run
requires Kmeans
and matchClasses
functions from amap
and e1071
packages, respectively.
Author(s)
Julien Pele
Examples
# Clustering human GPCRs in 4 groups with 100 runs of K-means
data(gpcr)
coord <- gpcr$mmds$sapiens.active$coord
kmeans.run1 <- kmeans.run(coord, nb.clus = 4, nb.run = 100)
kmeans.run1$clusters
kmeans.run1$elements