ktaucentersfast {ktaucenters} | R Documentation
ktaucentersfast
Description
A robust and efficient version of the k-means algorithm for center-based clustering.
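Per the reference below, the method looks for centers that minimize a tau-scale of the distances from each observation to its closest center, instead of the sum of squared distances that k-means minimizes. A tau-scale combines an M-scale s, which solves mean(rho1(d_i / s)) = b, with a second bounded function: tau^2 = s^2 * mean(rho2(d_i / s)). The R sketch below computes such a scale under stated assumptions and is not the package's implementation. Tukey's bisquare rho is swapped in for the rho functions ktaucenters actually uses, the constants c1, c2 and b are illustrative and uncalibrated, and rho_bisquare, m_scale and tau_scale are hypothetical names.

```r
# Tukey bisquare rho, scaled so its maximum is 1.
# ASSUMPTION: swapped in for illustration; ktaucenters may use other rho functions.
rho_bisquare <- function(u, c) {
  v <- pmin(abs(u) / c, 1)
  1 - (1 - v^2)^3
}

# Hypothetical M-scale: solves mean(rho(d / s)) = b by the usual
# fixed-point iteration s^2 <- s^2 * mean(rho(d / s)) / b.
m_scale <- function(d, c = 1.547, b = 0.5, tol = 1e-10, max_iter = 200L) {
  s <- median(abs(d)) / 0.6745          # robust starting value
  for (i in seq_len(max_iter)) {
    s_new <- s * sqrt(mean(rho_bisquare(d / s, c)) / b)
    if (abs(s_new / s - 1) < tol) break
    s <- s_new
  }
  s_new
}

# Hypothetical tau-scale: tau^2 = s^2 * mean(rho2(d / s)), with s the M-scale above.
# ASSUMPTION: c1, c2 and b are illustrative constants, not the package's values.
tau_scale <- function(d, c1 = 1.547, c2 = 3, b = 0.5) {
  s <- m_scale(d, c1, b)
  s * sqrt(mean(rho_bisquare(d / s, c2)))
}
```

Because each rho is bounded by 1, one very distant observation can raise either mean by at most 1/n, so the scale barely moves where a standard deviation of the distances would blow up.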
Usage
ktaucentersfast(
x,
centers,
nstart = 1L,
use_kmeans = TRUE,
use_robin = TRUE,
max_iter = 100L,
max_tol = 1e-06,
cutoff = 0.999
)
Arguments
x
numeric matrix of size n x p, or an object that can be coerced to a matrix (such as a numeric vector or a data frame with all numeric columns).
centers
either the number of clusters, say k, or a matrix of initial (distinct) cluster centers. If a number, a random set of distinct rows in x is chosen as the initial centers.
nstart
if centers is a number, how many random sets should be chosen?
use_kmeans
logical; should the centers found by kmeans be used as a starting point?
use_robin
logical; should the centers found by the ROBIN algorithm be used as a starting point?
max_iter
the maximum number of iterations allowed.
max_tol
tolerance used in the stopping rule of the algorithm.
cutoff
probability level of the chi-square quantile used as the threshold for outlier detection; defaults to 0.999.
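Two of the arguments above describe control flow that a short R sketch can make concrete. When centers is a number, the documentation says a random set of distinct rows of x serves as the starting centers; the hypothetical helper pick_initial_centers does exactly that. The exact stopping rule behind max_iter and max_tol is not spelled out here, so iterate_until_converged is an assumption: it repeats a caller-supplied update step until the largest absolute change between successive iterates falls below max_tol, or until max_iter iterations have run, and reports the count in the same spirit as the iter component of the result. Both names are hypothetical and are not part of the ktaucenters API.

```r
# Hypothetical helper: pick k distinct rows of x as initial centers.
pick_initial_centers <- function(x, k) {
  ux <- unique(as.matrix(x))            # drop duplicated rows first
  if (nrow(ux) < k) stop("more cluster centers than distinct data points")
  ux[sample.int(nrow(ux), k), , drop = FALSE]
}

# Hypothetical driver for a max_iter / max_tol stopping rule.
# ASSUMPTION: convergence is measured as the largest absolute change
# between successive iterates; the package's actual criterion may differ.
iterate_until_converged <- function(state, step, max_iter = 100L, max_tol = 1e-6) {
  iter <- 0L
  repeat {
    iter <- iter + 1L
    new_state <- step(state)
    change <- max(abs(new_state - state))
    state <- new_state
    if (change < max_tol || iter >= max_iter) break
  }
  list(state = state, iter = iter)
}
```

For example, halving a value starting from 1 stops after 20 iterations with the default max_tol, because 1/2^20 is the first change smaller than 1e-6.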
Value
A list with the following components:
centers
a matrix of cluster centers.
cluster
a vector of integers (from 1:k) indicating the cluster to which each point is allocated.
tau
the tau-scale value of the returned clustering.
iter
the number of iterations performed, until either convergence was achieved or the maximum number of iterations was reached.
di
the distance of each observation to its assigned cluster center.
outliers
a vector of integers with the indices of the observations flagged as outliers.
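The centers, cluster and di components are tied together by nearest-center assignment, and the outliers component relates di to the cutoff argument. The R sketch below shows one way these pieces could fit, under stated assumptions: distances are Euclidean, and an observation is flagged when its squared standardized distance (di / scale)^2 exceeds qchisq(cutoff, df = p). Which robust scale ktaucentersfast actually uses for standardization is not documented here, so the sketch takes it as an argument. assign_to_centers and flag_outliers are hypothetical helpers, not package functions.

```r
# Hypothetical helper: assign each row of x to its nearest center
# (Euclidean distance) and return the cluster labels and distances.
assign_to_centers <- function(x, centers) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  d2 <- matrix(NA_real_, nrow(x), nrow(centers))   # squared distances, n x k
  for (j in seq_len(nrow(centers))) {
    d2[, j] <- rowSums(sweep(x, 2, centers[j, ])^2)
  }
  cluster <- max.col(-d2, ties.method = "first")   # column of smallest distance
  di <- sqrt(d2[cbind(seq_len(nrow(x)), cluster)])
  list(cluster = cluster, di = di)
}

# Hypothetical outlier rule: flag points whose squared standardized
# distance exceeds the chi-square quantile at level `cutoff` with p df.
# ASSUMPTION: `scale` is a robust scale supplied by the caller.
flag_outliers <- function(di, scale, p, cutoff = 0.999) {
  which((di / scale)^2 > qchisq(cutoff, df = p))
}
```

With p = 2 and the default cutoff, the threshold on the squared standardized distance is qchisq(0.999, 2) = -2 log(0.001), about 13.8, because a chi-square variable with 2 degrees of freedom is exponential with mean 2.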
References
Gonzalez, J. D., Yohai, V. J., & Zamar, R. H. (2019). Robust Clustering Using Tau-Scales. arXiv preprint arXiv:1906.08198.
Examples
library(ktaucenters)
set.seed(1)  # for reproducibility
# Generate synthetic data (three well-separated clusters)
Z <- rnorm(600)
mues <- rep(c(-3, 0, 3), 200)
X <- matrix(Z + mues, ncol = 2)
# Replace 60 observations with uniform outliers (contamination level 20%);
# a 60 x 2 matrix needs 120 values
X[sample(1:300, 60), ] <- matrix(runif(120, 3 * min(X), 3 * max(X)),
                                 ncol = 2, nrow = 60)
robust <- ktaucentersfast(
X, centers = X[sample(1:300, 3), ],
max_tol = 1e-3, max_iter = 100)
oldpar <- par(mfrow = c(1, 2))
plot(X, type = "n", main = "ktaucenters (Robust) \n outliers: solid black dots")
points(X[robust$cluster == 1, ], col = 2)
points(X[robust$cluster == 2, ], col = 3)
points(X[robust$cluster == 3, ], col = 4)
points(X[robust$outliers, 1], X[robust$outliers, 2], pch = 19)
# Classical (non-robust) algorithm
non_robust <- kmeans(X, centers = 3, nstart = 100)
plot(X, type = "n", main = "kmeans (Classical)")
points(X[non_robust$cluster == 1, ], col = 2)
points(X[non_robust$cluster == 2, ], col = 3)
points(X[non_robust$cluster == 3, ], col = 4)
par(oldpar)