ktaucenters {ktaucenters} | R Documentation |
ktaucenters
Description
A robust and efficient version of the k-means algorithm for center-based clustering.
Usage
ktaucenters(
X,
K,
centers = NULL,
tolmin = 1e-06,
NiterMax = 100,
nstart = 1,
startWithKmeans = TRUE,
startWithROBINPD = TRUE,
cutoff = 0.999
)
Arguments
X |
numeric matrix of size n x p. |
K |
number of clusters. |
centers |
a matrix of size K x p containing the K initial centers,
one in each row. If centers is NULL, a random set of (distinct) rows in
X is chosen as the initial centers. |
tolmin |
a tolerance parameter used for the algorithm stopping rule. |
NiterMax |
a maximum number of iterations used for the algorithm stopping rule. |
nstart |
the number of times the base algorithm is run.
If it is greater than 1 and centers is not NULL, a random set of (distinct)
rows in X is chosen as the initial centers. |
startWithKmeans |
if TRUE (or positive), the centers estimated by kmeans are included as a starting point. |
startWithROBINPD |
if TRUE (or positive), the centers estimated by ROBINDEN are included as a starting point. |
cutoff |
optional argument for outlier detection: the chi-square quantile used as the threshold for flagging outliers. Defaults to 0.999. |
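The centers argument expects a K x p matrix with one center per row. The following is a minimal base R sketch of building such a matrix from distinct random rows of X. The helper name random_centers and the toy matrix X_demo are hypothetical and are not part of the ktaucenters package; how the package draws its own random starts internally is not shown on this page.

```r
# Hypothetical helper (not part of ktaucenters): draw K distinct rows of X
# to use as the K x p matrix of initial centers.
random_centers <- function(X, K) {
  stopifnot(is.matrix(X), K >= 1, K <= nrow(X))
  idx <- sample(nrow(X), K)     # K distinct row indices (sampling without replacement)
  X[idx, , drop = FALSE]        # drop = FALSE keeps the matrix shape even when K = 1
}

set.seed(1)                               # arbitrary seed, for illustration only
X_demo <- matrix(rnorm(40), ncol = 2)     # toy data: n = 20, p = 2
init   <- random_centers(X_demo, K = 3)
dim(init)                                 # K rows, p columns
```

A matrix built this way could then be passed as centers = init in a call to ktaucenters.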
Value
A list with the following components:
centers |
: Matrix of size K x p with the estimated K centers. |
cluster |
: A vector of integers (from 1:K) indicating the cluster to which each point is allocated. |
iter |
: Number of iterations run until convergence was achieved or the maximum number of iterations was reached. |
di |
: Distance of each observation to its assigned cluster-center. |
outliers |
: A vector of integers with the indices of the observations considered outliers. |
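To make the di and outliers components more concrete, here is a small base R sketch on hand-made data. It assumes Euclidean distance and applies the chi-square quantile directly to squared distances on a unit scale. Both are assumptions made for illustration: the package may standardize distances with a robust scale estimate before thresholding, and that step is not described on this page. The objects X_demo, centers, and cluster are toy stand-ins, not output of ktaucenters.

```r
# Toy stand-ins for the data and for the `centers` and `cluster` components.
X_demo  <- rbind(c(0, 0), c(1, 0), c(10, 10), c(10, 13), c(0, 5))
centers <- rbind(c(0, 0), c(10, 10))   # K = 2 centers, one per row
cluster <- c(1, 1, 2, 2, 1)            # cluster label of each observation

# Euclidean distance from each observation to its assigned center.
di <- sqrt(rowSums((X_demo - centers[cluster, , drop = FALSE])^2))

# Chi-square quantile with p degrees of freedom, used as a threshold on
# squared distances. Assumption: distances are already on a unit scale.
cutoff    <- 0.999
threshold <- qchisq(cutoff, df = ncol(X_demo))
outliers  <- which(di^2 > threshold)   # indices of flagged observations
```

With p = 2 and cutoff = 0.999 the threshold is roughly 13.8, so in this toy data only the fifth observation (squared distance 25) is flagged.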
References
Gonzalez, J. D., Yohai, V. J., & Zamar, R. H. (2019). Robust Clustering Using Tau-Scales. arXiv preprint arXiv:1906.08198.
Examples
# Generate synthetic data (three clusters well separated)
Z <- rnorm(600)
mues <- rep(c(-3, 0, 3), 200)
X <- matrix(Z + mues, ncol = 2)
# Generate 60 synthetic outliers (contamination level 20%)
# 60 rows x 2 columns need 120 uniform draws; fewer would be recycled
X[sample(1:300, 60), ] <- matrix(runif(120, 3 * min(X), 3 * max(X)),
ncol = 2, nrow = 60)
robust <- ktaucenters(
X, K = 3, centers = X[sample(1:300, 3), ],
tolmin = 1e-3, NiterMax = 100)
oldpar <- par(mfrow = c(1, 2))
plot(X,type = "n", main = "ktaucenters (Robust) \n outliers: solid black dots")
points(X[robust$cluster == 1, ], col = 2)
points(X[robust$cluster == 2, ], col = 3)
points(X[robust$cluster == 3, ], col = 4)
points(X[robust$outliers, 1], X[robust$outliers, 2], pch = 19)
# Classical (non-robust) algorithm
non_robust <- kmeans(X, centers = 3, nstart = 100)
plot(X, type = "n", main = "kmeans (Classical)")
points(X[non_robust$cluster == 1, ], col = 2)
points(X[non_robust$cluster == 2, ], col = 3)
points(X[non_robust$cluster == 3, ], col = 4)
par(oldpar)
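Because the example draws random numbers, the data and the resulting plots change from run to run. The sketch below generates the same kind of data with a fixed seed and checks the contamination level. The seed value and the names n_out, out_idx, and contamination are arbitrary choices for illustration, not part of the package.

```r
set.seed(123)                        # arbitrary seed, for reproducibility only
Z    <- rnorm(600)
mues <- rep(c(-3, 0, 3), 200)
X    <- matrix(Z + mues, ncol = 2)   # 300 observations in 2 dimensions

n_out   <- 60
out_idx <- sample(nrow(X), n_out)    # distinct rows to be replaced by outliers
X[out_idx, ] <- matrix(runif(2 * n_out, 3 * min(X), 3 * max(X)), ncol = 2)

contamination <- n_out / nrow(X)     # fraction of contaminated rows
```

Keeping out_idx makes it possible to compare the planted outliers with the outliers component returned by a fit.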