kmeans {clustlearn}   R Documentation
K-Means Clustering
Description
Perform K-Means clustering on a data matrix.
Usage
kmeans(
data,
centers,
max_iterations = 10,
initialization = "kmeans++",
details = FALSE,
waiting = TRUE,
...
)
Arguments
data
a set of observations, presented as a matrix-like object where every row is a new observation.

centers
either the number of clusters or a set of initial cluster centers. If a number, the centers are chosen according to the initialization method.

max_iterations
the maximum number of iterations allowed.

initialization
the initialization method to be used. This should be one of "random" or "kmeans++".

details
a Boolean determining whether intermediate logs explaining how the algorithm works should be printed or not.

waiting
a Boolean determining whether the intermediate logs should be printed in chunks, waiting for user input before printing the next, or not.

...
additional arguments passed to …
Details
The data given by data is clustered by the k-means method, which aims to
partition the points into k groups such that the
sum of squares from points to the assigned cluster centers is minimized. At
the minimum, all cluster centers are at the mean of their Voronoi sets (the
set of data points which are nearest to the cluster center).
The k-means method follows a 2 to n step process:
The first step can be subdivided into 3 sub-steps:

1. Selection of the number k of clusters into which the data is going to be grouped, and of which the centers will be the representatives. This is determined through the use of the centers parameter.
2. Computation of the distance from each data point to each center.
3. Assignment of each observation to a cluster. The observation is assigned to the cluster represented by the nearest center.

Every subsequent step repeats sub-steps 2 and 3, but with sub-step 1 replaced by:

1. Computation of the new centers. The center of each cluster is computed as the mean of the observations assigned to said cluster.
The algorithm stops once the centers in step n+1 are the same as the ones in
step n. However, this convergence does not always take place. For this
reason, the algorithm also stops once a maximum number of iterations,
max_iterations, is reached.
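The steps above can be sketched in plain R. This is an illustrative, hypothetical helper (not the package's actual implementation), assuming numeric matrix input and random initialization:

```r
# Minimal k-means sketch (illustrative only, not clustlearn's implementation).
kmeans_sketch <- function(data, k, max_iterations = 10) {
  # Sub-step 1 (first iteration): pick k observations at random as centers
  centers <- data[sample(nrow(data), k), , drop = FALSE]
  cluster <- integer(nrow(data))
  for (i in seq_len(max_iterations)) {
    # Sub-step 2: squared distance from every point to every center
    d2 <- sapply(seq_len(k), function(j) {
      rowSums(sweep(data, 2, centers[j, ])^2)
    })
    # Sub-step 3: assign each observation to the nearest center
    cluster <- max.col(-d2)
    # Sub-step 1 (later iterations): new center = mean of assigned points
    new_centers <- t(sapply(seq_len(k), function(j) {
      colMeans(data[cluster == j, , drop = FALSE])
    }))
    # Stop once the centers no longer move
    if (all(new_centers == centers)) break
    centers <- new_centers
  }
  list(cluster = cluster, centers = centers)
}
```

Note the sketch does not handle clusters that end up empty; the package's `details = TRUE` mode prints logs for each of these steps.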
The initialization methods provided by this function are:

random
A set of centers observations is chosen at random from the data as the initial centers.

kmeans++
The centers observations are chosen using the kmeans++ algorithm. This algorithm chooses the first center at random and then chooses each subsequent center from the remaining observations with probability proportional to the squared distance to the closest center. This process is repeated until centers centers have been chosen.
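The kmeans++ seeding procedure described above can be sketched as follows. This is a hypothetical illustration of the technique, not the package's exact code:

```r
# kmeans++ seeding sketch (illustrative only).
kmeanspp_sketch <- function(data, k) {
  n <- nrow(data)
  idx <- sample(n, 1)  # first center: chosen uniformly at random
  for (i in seq_len(k - 1)) {
    # squared distance from each observation to its closest chosen center
    d2 <- apply(data, 1, function(p) {
      min(rowSums(sweep(data[idx, , drop = FALSE], 2, p)^2))
    })
    # next center: drawn with probability proportional to that distance
    # (already-chosen points have distance 0, so they are never re-drawn)
    idx <- c(idx, sample(n, 1, prob = d2))
  }
  data[idx, , drop = FALSE]
}
```

The distance-proportional draw tends to spread the initial centers apart, which is why kmeans++ usually converges in fewer iterations than random seeding.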
Value
A stats::kmeans()
object.
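Because the return value shares the class of stats::kmeans(), its components can be inspected the usual way. A small illustration using stats::kmeans itself, whose fields match:

```r
# Inspecting a stats::kmeans-class object (same components as the return value)
set.seed(42)
m <- rbind(matrix(rnorm(20, 0), ncol = 2), matrix(rnorm(20, 5), ncol = 2))
cl <- stats::kmeans(m, 2)
cl$cluster       # integer vector: cluster assignment of each observation
cl$centers       # matrix of final cluster centers (one row per cluster)
cl$tot.withinss  # total within-cluster sum of squares being minimized
```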
Author(s)
Eduardo Ruiz Sabajanes, eduardo.ruizs@edu.uah.es
Examples
### Voronoi tessellation
voronoi <- suppressMessages(suppressWarnings(require(deldir)))
cols <- c(
"#00000019",
"#DF536B19",
"#61D04F19",
"#2297E619",
"#28E2E519",
"#CD0BBC19",
"#F5C71019",
"#9E9E9E19"
)
### Helper function
test <- function(db, k) {
print(cl <- clustlearn::kmeans(db, k, 100))
plot(db, col = cl$cluster, asp = 1, pch = 20)
points(cl$centers, col = seq_len(k), pch = 13, cex = 2, lwd = 2)
if (voronoi) {
x <- c(min(db[, 1]), max(db[, 1]))
dx <- c(x[1] - x[2], x[2] - x[1])
y <- c(min(db[, 2]), max(db[, 2]))
dy <- c(y[1] - y[2], y[2] - y[1])
tessellation <- deldir(
cl$centers[, 1],
cl$centers[, 2],
rw = c(x + dx, y + dy)
)
tiles <- tile.list(tessellation)
plot(
tiles,
asp = 1,
add = TRUE,
showpoints = FALSE,
border = "#00000000",
fillcol = cols
)
}
}
### Example 1
test(clustlearn::db1, 2)
### Example 2
test(clustlearn::db2, 2)
### Example 3
test(clustlearn::db3, 3)
### Example 4
test(clustlearn::db4, 3)
### Example 5
test(clustlearn::db5, 3)
### Example 6
test(clustlearn::db6, 3)
### Example 7 (with explanations, no plots)
cl <- clustlearn::kmeans(
clustlearn::db5[1:20, ],
3,
details = TRUE,
waiting = FALSE
)