ktaucenters {ktaucenters}    R Documentation

ktaucenters

Description

Robust and efficient version of the K-means algorithm for center-based clustering.

Usage

ktaucenters(
  X,
  K,
  centers = NULL,
  tolmin = 1e-06,
  NiterMax = 100,
  nstart = 1,
  startWithKmeans = TRUE,
  startWithROBINPD = TRUE,
  cutoff = 0.999
)

Arguments

X

numeric matrix of size n x p.

K

number of clusters.

centers

a matrix of size K x p containing the K initial centers, one per row. If centers is NULL, a random set of (distinct) rows of X is chosen as the initial centers.

tolmin

a tolerance parameter used in the algorithm's stopping rule.

NiterMax

the maximum number of iterations, used in the algorithm's stopping rule.

nstart

the number of times the base algorithm is run. If it is greater than 1 and centers is not NULL, a random set of (distinct) rows of X will be chosen as the initial centers.

startWithKmeans

if positive (or TRUE), the centers estimated by kmeans are included as a starting point.

startWithROBINPD

if positive (or TRUE), the centers estimated by ROBINDEN are included as a starting point.

cutoff

optional argument for outlier detection: the quantile of the chi-square distribution used as the detection threshold. Defaults to 0.999.
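Note that cutoff is a probability, not a distance. The following base R sketch shows how such a probability translates into a threshold. It assumes (this page does not spell out the exact rule) that squared, suitably scaled distances are compared against the chi-square quantile with p degrees of freedom. The names d2 and flagged and the distance values are hypothetical, chosen only for illustration.

```r
# Hypothetical illustration; not the package's internal code.
p <- 2                                 # number of columns of X
cutoff <- 0.999                        # default value of the cutoff argument
threshold <- qchisq(cutoff, df = p)    # chi-square quantile used as threshold

# Made-up squared scaled distances for five observations
d2 <- c(0.5, 1.8, 3.2, 14.1, 25.0)
flagged <- which(d2 > threshold)       # indices of values above the threshold
```

Under these assumptions, raising cutoff towards 1 raises the threshold, so fewer observations would be flagged.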

Value

A list with the following components:

centers

Matrix of size K x p with the estimated K centers.

cluster

A vector of integers (from 1:K) indicating the cluster to which each point is allocated.

iter

Number of iterations run until convergence or until the maximum number of iterations is reached.

di

Distance of each observation to its assigned cluster center.

outliers

A vector of integers with the indices of the observations considered outliers.
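To make the relationship between centers, cluster and di concrete, here is a minimal base R sketch on made-up data. It assumes plain Euclidean distance and nearest-center assignment; the distance actually used inside ktaucenters is not stated on this page, so treat this as an illustration of the shapes and meaning of the components rather than a reimplementation.

```r
# Toy data: four points in the plane and two centers (all values made up)
X <- rbind(c(0, 0), c(0, 1), c(5, 5), c(6, 5))
centers <- rbind(c(0, 0.5), c(5.5, 5))

# n x K matrix of Euclidean distances from each row of X to each center
dist_to_centers <- sapply(seq_len(nrow(centers)), function(k) {
  sqrt(rowSums(sweep(X, 2, centers[k, ])^2))
})

# Index of the nearest center for each observation (values in 1:K)
cluster <- apply(dist_to_centers, 1, which.min)

# Distance of each observation to its assigned center
di <- dist_to_centers[cbind(seq_len(nrow(X)), cluster)]
```

Here cluster has one entry per row of X and di has the same length, which matches the shapes described for the returned components.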

References

Gonzalez, J. D., Yohai, V. J., & Zamar, R. H. (2019). Robust Clustering Using Tau-Scales. arXiv preprint arXiv:1906.08198.

Examples

# Generate synthetic data (three well-separated clusters)
Z <- rnorm(600)
mues <- rep(c(-3, 0, 3), 200)
X <- matrix(Z + mues, ncol = 2)

# Generate 60 synthetic outliers (contamination level 20%).
# 60 rows x 2 columns need 120 draws; fewer would be silently recycled.
X[sample(1:300, 60), ] <- matrix(runif(120, 3 * min(X), 3 * max(X)),
                                 ncol = 2, nrow = 60)

robust <- ktaucenters(
     X, K = 3, centers = X[sample(1:300, 3), ],
     tolmin = 1e-3, NiterMax = 100)

oldpar <- par(mfrow = c(1, 2))

plot(X, type = "n", main = "ktaucenters (Robust) \n outliers: solid black dots")
points(X[robust$cluster == 1, ], col = 2)
points(X[robust$cluster == 2, ], col = 3)
points(X[robust$cluster == 3, ], col = 4)
points(X[robust$outliers, 1], X[robust$outliers, 2], pch = 19)

# Classical (non-robust) algorithm
non_robust <- kmeans(X, centers = 3, nstart = 100)

plot(X, type = "n", main = "kmeans (Classical)")
points(X[non_robust$cluster == 1, ], col = 2)
points(X[non_robust$cluster == 2, ], col = 3)
points(X[non_robust$cluster == 3, ], col = 4)

par(oldpar)
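The example above builds its starting centers with X[sample(1:300, 3), ]. A slightly more general sketch of the same idea, which avoids hard-coding the number of rows, might look like the following. The names idx and init_centers are hypothetical, and the data are a stand-in generated only for this illustration.

```r
set.seed(1)                        # only to make the sketch reproducible
X <- matrix(rnorm(600), ncol = 2)  # stand-in data with 300 rows
K <- 3

# sample() without replacement returns K distinct row indices
idx <- sample(nrow(X), K)

# K x p matrix with one candidate center per row
init_centers <- X[idx, , drop = FALSE]
```

The resulting matrix has the K x p shape that the centers argument expects; drop = FALSE keeps it a matrix even when K is 1.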

[Package ktaucenters version 1.0.0 Index]