ktaucentersfast {ktaucenters}    R Documentation

ktaucentersfast

Description

Robust and efficient version of the K-means algorithm for center-based clustering.

Usage

ktaucentersfast(
  x,
  centers,
  nstart = 1L,
  use_kmeans = TRUE,
  use_robin = TRUE,
  max_iter = 100L,
  max_tol = 1e-06,
  cutoff = 0.999
)

Arguments

x

numeric matrix of size n x p, or an object that can be coerced to a matrix (such as a numeric vector or a data frame with all numeric columns).

centers

either the number of clusters, say k, or a matrix of initial (distinct) cluster centers. If a number, a random set of distinct rows in x is chosen as the initial centers.

nstart

if centers is a number, how many random sets should be chosen?

use_kmeans

logical; should kmeans centers be used as a starting point?

use_robin

logical; should centers from the robin algorithm be used as a starting point?

max_iter

the maximum number of iterations allowed.

max_tol

tolerance used as the stopping rule for the algorithm.

cutoff

quantile of the chi-square distribution used as the threshold for outlier detection; defaults to 0.999.
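The role of cutoff can be illustrated with a minimal base-R sketch. It assumes that an observation is flagged when its squared, scaled distance to the assigned center exceeds the chi-square quantile with p degrees of freedom. The helper flag_by_chisq and its scale argument are hypothetical and are not part of ktaucenters; the package's internal rule and its robust scale estimate may differ.

```r
# Hypothetical helper (not part of ktaucenters): flag observations whose
# squared scaled distance exceeds the chi-square quantile given by `cutoff`.
# `scale` stands in for whatever robust scale estimate is used (an assumption).
flag_by_chisq <- function(di, p, scale, cutoff = 0.999) {
  threshold <- qchisq(cutoff, df = p)  # quantile of the chi-square distribution, p d.f.
  which((di / scale)^2 > threshold)    # indices of flagged observations
}

# Toy distances in p = 2 dimensions with unit scale
di <- c(0.5, 1.2, 0.8, 6.0)
threshold_2d <- qchisq(0.999, df = 2)
flagged <- flag_by_chisq(di, p = 2, scale = 1)
```

A larger cutoff raises the threshold, so fewer observations are flagged; a smaller one flags more.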

Value

A list with the following components:

centers

A matrix of cluster centers.

cluster

A vector of integers (from 1:k) indicating the cluster to which each point is allocated.

tau

τ-scale value.

iter

Number of iterations until convergence is achieved or the maximum number of iterations is reached.

di

Distance of each observation to its assigned cluster center.

outliers

A vector of integers with the indices of the observations considered outliers.
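The components above can be combined after fitting, for example to summarise the clusters without the flagged observations. The sketch below uses a hand-built mock list with the documented component names so that it runs without the package; every value in it is made up for illustration and is not output of ktaucentersfast.

```r
# Mock object with the documented component names. All numbers are
# illustrative assumptions, not results produced by the package.
res <- list(
  centers  = matrix(c(-3, 0, 3, -3, 0, 3), ncol = 2),
  cluster  = c(1L, 1L, 2L, 2L, 3L, 3L, 2L),
  tau      = 1.0,
  iter     = 5L,
  di       = c(0.4, 0.7, 0.3, 0.9, 0.5, 0.6, 9.5),
  outliers = 7L
)

n <- length(res$cluster)
is_outlier <- seq_len(n) %in% res$outliers  # logical mask built from the index vector

# Cluster sizes with flagged observations excluded
clean_sizes <- table(res$cluster[!is_outlier])

# Largest distance to the assigned center among non-outliers
max_clean_di <- max(res$di[!is_outlier])
```

Because outliers holds indices rather than a logical vector, converting it to a mask first makes it straightforward to subset cluster and di consistently.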

References

Gonzalez, J. D., Yohai, V. J., & Zamar, R. H. (2019). Robust Clustering Using Tau-Scales. arXiv preprint arXiv:1906.08198.

Examples

# Generate synthetic data (three clusters well separated)
Z <- rnorm(600)
mues <- rep(c(-3, 0, 3), 200)
X <- matrix(Z + mues, ncol = 2)

# Generate 60 synthetic outliers (contamination level 20%)
# 60 rows x 2 columns need 120 uniform draws
X[sample(1:300, 60), ] <- matrix(runif(120, 3 * min(X), 3 * max(X)),
                                 ncol = 2, nrow = 60)

robust <- ktaucentersfast(
     X, centers = X[sample(1:300, 3), ],
     max_tol = 1e-3, max_iter = 100)

oldpar <- par(mfrow = c(1, 2))

plot(X, type = "n", main = "ktaucenters (Robust) \n outliers: solid black dots")
points(X[robust$cluster == 1, ], col = 2)
points(X[robust$cluster == 2, ], col = 3)
points(X[robust$cluster == 3, ], col = 4)
points(X[robust$outliers, 1], X[robust$outliers, 2], pch = 19)

# Classical (non Robust) algorithm
non_robust <- kmeans(X, centers = 3, nstart = 100)

plot(X, type = "n", main = "kmeans (Classical)")
points(X[non_robust$cluster == 1, ], col = 2)
points(X[non_robust$cluster == 2, ], col = 3)
points(X[non_robust$cluster == 3, ], col = 4)

par(oldpar)

[Package ktaucenters version 1.0.0]