tkmeans {tclust} | R Documentation |
TKMEANS method for robust K-means clustering
Description
This function searches for k (or fewer) spherical clusters in a data matrix x, while the ceiling(alpha n) most outlying observations are trimmed.
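As a small illustration of the trimming size mentioned above, the following base R sketch computes ceiling(alpha n) for a given sample size. The helper name n_trimmed is hypothetical and not part of tclust.

```r
# Hypothetical helper (not part of tclust): number of observations
# trimmed for sample size n at trimming level alpha.
n_trimmed <- function(n, alpha) {
  stopifnot(n >= 0, alpha >= 0, alpha < 1)
  ceiling(alpha * n)
}

n_trimmed(1000, 0.1)   # sample size used in Example 1 of this page
n_trimmed(272, 0.03)   # an arbitrary n, chosen only for illustration
```

Because of the ceiling, any fractional count is rounded up, so slightly more than a proportion alpha of the data may be trimmed.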
Usage
tkmeans(
x,
k,
alpha = 0.05,
nstart = 500,
niter1 = 3,
niter2 = 20,
nkeep = 5,
iter.max,
points = NULL,
center = FALSE,
scale = FALSE,
store_x = TRUE,
parallel = FALSE,
n.cores = -1,
zero_tol = 1e-16,
drop.empty.clust = TRUE,
trace = 0
)
Arguments
x |
A matrix or data.frame of dimension n x p, containing the observations (row-wise). |
k |
The number of clusters initially searched for. |
alpha |
The proportion of observations to be trimmed. |
nstart |
The number of random initializations to be performed. |
niter1 |
The number of concentration steps to be performed for the nstart initializations. |
niter2 |
The maximum number of concentration steps to be performed for the nkeep solutions kept for further iteration. |
nkeep |
The number of iterated initializations (after niter1 concentration steps) with the best values of the target function that are kept for further iterations. |
iter.max |
(deprecated, use the combination |
points |
Optional initial mean vectors, |
center |
Optional centering of the data: a function or a vector of length p which can optionally be specified for centering x before calculation |
scale |
Optional scaling of the data: a function or a vector of length p which can optionally be specified for scaling x before calculation |
store_x |
A logical value, specifying whether the data matrix x shall be included in the result object. |
parallel |
A logical value, specifying whether the nstart initializations should be done in parallel. |
n.cores |
The number of cores to use when parallelizing; only taken into account if parallel=TRUE. |
zero_tol |
The zero tolerance used. By default set to 1e-16. |
drop.empty.clust |
Logical value specifying whether empty clusters shall be omitted from the resulting object. If so, the result structure no longer contains center estimates of empty clusters, and cluster names are reassigned such that the first l clusters (l <= k) always have at least one observation. |
trace |
Defines the tracing level, which is set to 0 by default. Tracing level 1 gives additional information on the stage of the iterative process. |
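The concentration steps referred to by niter1 and niter2 can be pictured with the following base R sketch of a single trimmed k-means step: assign each observation to its nearest center, trim the ceiling(alpha n) observations farthest from their center, then recompute the centers from the remaining ones. This is an illustration under stated assumptions, not the tclust implementation. The function name tkmeans_cstep is hypothetical, centers are stored row-wise here (k x p), and the objective is assumed to be the trimmed within-cluster sum of squares, which may differ from the obj value that tkmeans reports.

```r
# Illustrative sketch only; NOT the tclust implementation.
# x: n x p data, centers: k x p matrix (one center per row), alpha: trimming level.
tkmeans_cstep <- function(x, centers, alpha) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- nrow(centers)
  # Squared Euclidean distance from every observation to every center (n x k)
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
  d2 <- matrix(d2, nrow = n)  # keep matrix shape even when k == 1
  # Index of the nearest center and the corresponding squared distance
  nearest <- max.col(-d2, ties.method = "first")
  dmin <- d2[cbind(seq_len(n), nearest)]
  # Trim the ceiling(alpha * n) observations farthest from their center
  n_trim <- ceiling(alpha * n)
  cluster <- nearest
  if (n_trim > 0) {
    cluster[order(dmin, decreasing = TRUE)[seq_len(n_trim)]] <- 0L
  }
  # Update each center as the mean of its non-trimmed members
  for (j in seq_len(k)) {
    members <- cluster == j
    if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
  }
  # Assumed objective: sum of squared distances of the retained observations,
  # measured against the centers used for the assignment
  list(cluster = cluster, centers = centers, obj = sum(dmin[cluster > 0]))
}

# Toy data: two tight pairs of points plus one far outlier
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11), c(100, 100))
res <- tkmeans_cstep(x, centers = rbind(c(0, 0), c(10, 10)), alpha = 0.2)
res$cluster
```

Repeating such a step until the partition stops changing gives one run from one initialization; the nstart, niter1, nkeep and niter2 arguments control how many such runs are started and how far each is iterated.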
Value
The function returns the following values:
cluster - A numerical vector of size n containing the cluster assignment for each observation. Cluster names are integer numbers from 1 to k; 0 indicates trimmed observations. Note that there could be empty clusters with no observations when equal.weights=FALSE.
obj - The value of the objective function of the best (returned) solution.
size - An integer vector of size k, returning the number of observations contained by each cluster.
centers - A matrix of size p x k containing the centers (column-wise) of each cluster.
code - A numerical value indicating if the concentration steps have converged for the returned solution (2).
cluster.ini - A matrix with nstart rows and a number of columns equal to the number of observations, where each row shows the final clustering assignments (0 for trimmed observations) obtained after the niter1 iteration of the nstart random initializations.
obj.ini - A numerical vector of length nstart containing the values of the target function obtained after the niter1 iteration of the nstart random initializations.
x - The input data set.
k - The input number of clusters.
alpha - The input trimming level.
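The coding of the returned components can be illustrated without running the function. The sketch below uses made-up values that only mimic the documented layout: a cluster vector with 0 for trimmed observations and a p x k centers matrix with one center per column. All values and object names are hypothetical.

```r
# Hypothetical values in the documented format (not real tkmeans output)
cluster <- c(1, 2, 0, 1, 2, 2, 0, 1)       # 0 marks trimmed observations
centers <- cbind(c(0, 0), c(5, 5))         # p x k: one center per column

trimmed_idx <- which(cluster == 0)         # row indices of trimmed observations
sizes <- tabulate(cluster[cluster > 0], nbins = ncol(centers))  # retained counts
center_2 <- centers[, 2]                   # center of cluster 2 (a column)
```

With a real result object, the same expressions would be applied to its cluster and centers components.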
Author(s)
Valentin Todorov, Luis Angel Garcia Escudero, Agustin Mayo Iscar.
References
Cuesta-Albertos, J. A.; Gordaliza, A. and Matrán, C. (1997), "Trimmed k-means: an attempt to robustify quantizers". Annals of Statistics, Vol. 25 (2), 553-576.
Examples
##--- EXAMPLE 1 ------------------------------------------
sig <- diag(2)
cen <- rep(1,2)
x <- rbind(MASS::mvrnorm(360, cen * 0, sig),
           MASS::mvrnorm(540, cen * 5, sig),
           MASS::mvrnorm(100, cen * 2.5, sig))
## Two groups and 10% trimming level
(clus <- tkmeans(x, k = 2, alpha = 0.1))
plot(clus)
plot(clus, labels = "observation")
plot(clus, labels = "cluster")
##--- EXAMPLE 2 ------------------------------------------
data(geyser2)
(clus <- tkmeans(geyser2, k = 3, alpha = 0.03))
plot(clus)