piv_KMeans {pivmet} | R Documentation |
k-means Clustering Using Pivotal Algorithms For Seeding
Description
Perform classical k-means clustering on a data matrix using pivots as initial centers.
Usage
piv_KMeans(
x,
centers,
alg.type = c("kmeans", "hclust"),
method = "average",
piv.criterion = c("MUS", "maxsumint", "minsumnoint", "maxsumdiff"),
H = 1000,
iter.max = 10,
nstart = 10,
prec_par = 10
)
Arguments
x |
A |
centers |
The number of groups for the |
alg.type |
The clustering algorithm for the initial partition of the
|
method |
If |
piv.criterion |
The pivotal criterion used for identifying one pivot
for each group. Possible choices are: |
H |
The number of distinct |
iter.max |
If |
nstart |
If |
prec_par |
If |
Details
The function implements a modified version of k-means that aims to
improve the clustering solution through careful seeding.
It performs a pivot-based initialization step, using pivotal methods
to find the initial centers for the clustering procedure.
The starting point consists of multiple runs of classical k-means,
obtained by selecting nstart > 1 in the kmeans function with a fixed
number of clusters, in order to build the co-association matrix of
data units.
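The ensemble step described above can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes that entry (i, j) of the co-association matrix is the proportion of k-means runs in which units i and j fall in the same cluster. The toy data and the names n_runs and coass_sketch are hypothetical and chosen only for this illustration.

```r
# Sketch only: build a co-association matrix from repeated stats::kmeans() runs.
# This illustrates the idea; it is not the pivmet source code.
set.seed(1)

# Toy data (assumed for the demo): two well-separated groups of 10 points
x_toy <- rbind(
  matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2)
)
n      <- nrow(x_toy)
k      <- 2    # fixed number of clusters
n_runs <- 50   # hypothetical name for the number of ensemble runs

coass_sketch <- matrix(0, n, n)
for (h in seq_len(n_runs)) {
  cl <- kmeans(x_toy, centers = k, nstart = 2)$cluster
  # outer(cl, cl, "==") is TRUE where units i and j share a cluster in this run
  coass_sketch <- coass_sketch + outer(cl, cl, "==")
}
# Proportion of runs in which each pair of units is clustered together
coass_sketch <- coass_sketch / n_runs
```

Under this definition the matrix is symmetric with ones on the diagonal, and entries close to 1 mark pairs of units that are almost always grouped together.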
Value
A list with components
cluster |
A vector of integers indicating the cluster to which each point is allocated. |
centers |
A matrix of cluster centers (centroids). |
coass |
The co-association matrix built from ensemble clustering. |
pivots |
The pivotal units identified by the selected pivotal criterion. |
totss |
The total sum of squares. |
withinss |
The within-cluster sum of squares for each cluster. |
tot.withinss |
The within-cluster sum of squares summed across clusters. |
betweenss |
The between-cluster sum of squared distances. |
size |
The number of points in each cluster. |
iter |
The number of (outer) iterations. |
ifault |
integer: indicator of a possible algorithm problem (for experts). |
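The sum-of-squares components share their names with those returned by stats::kmeans. Assuming they carry the same meaning here (this page does not state it explicitly), the usual identities can be checked as in the sketch below, which uses a plain kmeans fit on simulated data rather than a piv_KMeans result.

```r
# Sketch only: relations among the sum-of-squares components of a kmeans fit.
# Assumption: the components listed above follow the stats::kmeans() definitions.
set.seed(2)
x_demo <- matrix(rnorm(60), ncol = 2)   # 30 simulated bivariate points
fit <- kmeans(x_demo, centers = 3, nstart = 5)

# The total within-cluster SS is the sum of the per-cluster values
ok_within <- isTRUE(all.equal(fit$tot.withinss, sum(fit$withinss)))

# The total SS splits into within-cluster and between-cluster parts
ok_total <- isTRUE(all.equal(fit$totss, fit$tot.withinss + fit$betweenss))

# Cluster sizes add up to the number of points
ok_size <- sum(fit$size) == nrow(x_demo)
```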
Author(s)
Leonardo Egidi legidi@units.it, Roberta Pappadà
References
Egidi, L., Pappadà, R., Pauli, F., Torelli, N. (2018). K-means seeding via MUS algorithm. Conference Paper, Book of Short Papers, SIS2018, ISBN: 9788891910233.
Examples
# Data generated from a mixture of three bivariate Gaussian distributions
## Not run:
library(pivmet)
library(mvtnorm)  # provides rmvnorm()
N <- 620
k <- 3
n1 <- 20
n2 <- 100
n3 <- 500
x <- matrix(NA, N, 2)
truegroup <- c(rep(1, n1), rep(2, n2), rep(3, n3))
x[1:n1, ] <- rmvnorm(n1, c(1, 5), sigma = diag(2))
x[(n1 + 1):(n1 + n2), ] <- rmvnorm(n2, c(4, 0), sigma = diag(2))
x[(n1 + n2 + 1):(n1 + n2 + n3), ] <- rmvnorm(n3, c(6, 6), sigma = diag(2))
# Apply piv_KMeans with MUS as pivotal criterion
res <- piv_KMeans(x, k)
# Apply piv_KMeans with maxsumdiff as pivotal criterion
res2 <- piv_KMeans(x, k, piv.criterion ="maxsumdiff")
# Plot the data and the clustering solution
par(mfrow=c(1,2), pty="s")
colors_cluster <- c("grey", "darkolivegreen3", "coral")
colors_centers <- c("black", "darkgreen", "firebrick")
graphics::plot(x, col = colors_cluster[truegroup],
               bg = colors_cluster[truegroup], pch = 21, xlab = "x[,1]",
               ylab = "x[,2]", cex.lab = 1.5,
               main = "True data", cex.main = 1.5)
graphics::plot(x, col = colors_cluster[res$cluster],
               bg = colors_cluster[res$cluster], pch = 21, xlab = "x[,1]",
               ylab = "x[,2]", cex.lab = 1.5,
               main = "piv_KMeans", cex.main = 1.5)
points(x[res$pivots, 1], x[res$pivots, 2],
       pch = 24, col = colors_centers, bg = colors_centers,
       cex = 1.5)
points(res$centers, col = colors_centers[1:k],
       pch = 8, cex = 2)
## End(Not run)