redkm {drclust}    R Documentation

k-means on a reduced subspace

Description

Simultaneously performs k-means partitioning of the units and principal component analysis of the variables.
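To make the "simultaneous" idea concrete, the following is a minimal base-R sketch of one common alternating least-squares scheme for the reduced k-means criterion of De Soete and Carroll (1994). It is not the drclust implementation: the function name rkm_sketch, its return fields, and the simplified handling of empty clusters (the run simply stops) are assumptions made for illustration only.

```r
# Hypothetical sketch, NOT the drclust code: alternates between updating the
# loadings A (given the partition) and the partition (given the loadings).
rkm_sketch <- function(X, K, Q, Rndstart = 20, maxiter = 100, tol = 1e-6) {
  X <- as.matrix(X)
  n <- nrow(X)
  best <- list(loss = Inf)
  for (run in seq_len(Rndstart)) {
    cl <- sample(rep_len(seq_len(K), n))        # random start, no empty cluster
    loss_old <- Inf
    for (it in seq_len(maxiter)) {
      U <- diag(K)[cl, , drop = FALSE]          # n x K binary membership matrix
      Xbar <- crossprod(U, X) / colSums(U)      # K x J cluster means
      # Loadings: leading Q eigenvectors of the between-cluster cross-product
      E <- eigen(crossprod(U %*% Xbar), symmetric = TRUE)
      A <- E$vectors[, seq_len(Q), drop = FALSE]
      centers <- Xbar %*% A                     # K x Q centers in reduced space
      loss <- sum((X - U %*% centers %*% t(A))^2)
      if (loss_old - loss < tol) break          # convergence check on the objective
      loss_old <- loss
      # Reassign each unit to the nearest center in the reduced space
      Y <- X %*% A
      d <- sapply(seq_len(K), function(k) rowSums(sweep(Y, 2, centers[k, ])^2))
      cl_new <- max.col(-d, ties.method = "first")
      if (length(unique(cl_new)) < K) break     # simplification: stop if a cluster empties
      cl <- cl_new
    }
    if (loss < best$loss) {
      best <- list(U = U, A = A, centers = centers, loss = loss, loop = run, it = it)
    }
  }
  best
}

set.seed(123)
X <- scale(as.matrix(iris[, -5]))
fit <- rkm_sketch(X, K = 3, Q = 2, Rndstart = 5)
```

Keeping the best of several random starts mirrors the role of the Rndstart argument, since each run may stop at a different local optimum.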

Usage

redkm(X, K, Q, Rndstart, verbose, maxiter, tol, rot, prep, print)

Arguments

X

Units x variables numeric data matrix.

K

Number of clusters for the units.

Q

Number of principal components to extract from the variables.

Rndstart

Number of runs to be performed (default is 20).

verbose

Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option).

maxiter

Maximum number of iterations allowed if convergence has not yet been reached (default is 100).

tol

Tolerance threshold: the maximum difference between the values of the objective function in two consecutive iterations below which convergence is assumed (default is 1e-6).

rot

Performs varimax rotation of the axes obtained via PCA (1 = enabled; 0 = disabled, default option).

prep

Pre-processing of the data. 1 performs the z-score transform (default choice); 2 performs the min-max transform; 0 leaves the data as they are.

print

Prints summary statistics of the performed method (1 = enabled; 0 = disabled, default option).
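As an illustration of the three prep options, the sketch below reproduces the z-score and min-max transforms in base R. The helper name preprocess is hypothetical and not part of drclust, and the package may use a different divisor for the standard deviation (scale() divides by n - 1).

```r
# Hypothetical helper, not part of drclust: mimics the three `prep` options.
preprocess <- function(X, prep = 1) {
  X <- as.matrix(X)
  if (prep == 1) {
    # z-score: centre each column and divide by its standard deviation
    return(scale(X, center = TRUE, scale = TRUE))
  }
  if (prep == 2) {
    # min-max: rescale each column to the [0, 1] interval
    mins <- apply(X, 2, min)
    ranges <- apply(X, 2, max) - mins
    return(sweep(sweep(X, 2, mins, "-"), 2, ranges, "/"))
  }
  X  # prep == 0: data returned unchanged
}

X <- as.matrix(iris[, -5])
Z <- preprocess(X, prep = 1)   # columns with mean 0 and unit standard deviation
M <- preprocess(X, prep = 2)   # columns ranging from 0 to 1
```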

Value

Returns a list of estimates and some descriptive quantities of the final results.

U

Units x clusters membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the cluster to which the corresponding unit has been assigned.

A

Variables x components loading matrix (orthonormal).

centers

K x Q matrix of centers containing the cluster means of the units, expressed in the reduced space of the Q principal components.

totss

The total sum of squares (scalar).

withinss

Vector of within-cluster sum of squares, one component per cluster.

betweenss

Amount of deviance captured by the model (scalar).

size

Number of units assigned to each cluster (vector).

pseudoF

Calinski-Harabasz index of the resulting partition (scalar).

loop

The index of the (best) run from which the results have been chosen.

it

The number of iterations performed during the (best) run.
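The relationships among the returned quantities can be sketched in base R. The partition and loadings below are a stand-in obtained by running PCA first and k-means afterwards (tandem analysis), not by redkm itself, and all sums of squares are computed in the reduced space, which is an assumption about how the package defines them.

```r
# Sketch only: stand-in partition and loadings from PCA followed by k-means,
# NOT the output of redkm.
X <- scale(as.matrix(iris[, -5]))      # z-scored data
n <- nrow(X); K <- 3; Q <- 2

A <- prcomp(X)$rotation[, 1:Q]         # variables x components, orthonormal columns
Y <- X %*% A                           # unit scores in the reduced space

set.seed(1)
km <- kmeans(Y, centers = K, nstart = 20)
cl <- km$cluster

U <- diag(K)[cl, ]                     # binary, row-stochastic membership matrix
size <- colSums(U)                     # number of units per cluster
centers <- crossprod(U, Y) / size      # K x Q cluster means of the scores

resid <- Y - U %*% centers
withinss <- as.vector(crossprod(U, rowSums(resid^2)))  # one value per cluster
totss <- sum(scale(Y, scale = FALSE)^2)                # total sum of squares
betweenss <- totss - sum(withinss)                     # deviance captured by the partition

# Calinski-Harabasz index: between/within ratio adjusted for degrees of freedom
pseudoF <- (betweenss / (K - 1)) / (sum(withinss) / (n - K))
```

Since each row of U contains a single 1, U %*% centers simply repeats the center of the assigned cluster for every unit.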

Author(s)

Ionel Prunila, Maurizio Vichi

References

de Soete G., Carroll J. (1994) "K-means clustering in a low-dimensional Euclidean space" <doi:10.1007/978-3-642-51175-2_24>

Kaiser H.F. (1958) "The varimax criterion for analytic rotation in factor analysis" <doi:10.1007/BF02289233>

Examples

# Iris data
# Load the package and keep only the numeric variables of the iris data
library(drclust)
iris <- as.matrix(iris[, -5])

# Reduced k-means with 3 unit-clusters and 2 components for the variables
out <- redkm(iris, K = 3, Q = 2, Rndstart = 15, verbose = 0, maxiter = 100, tol = 1e-7, rot = 1)


[Package drclust version 0.1 Index]