R: k-means on a reduced subspace

redkm {drclust}

R Documentation

k-means on a reduced subspace

Description

Simultaneously performs k-means partitioning of the units and principal component analysis of the variables.
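For orientation, reduced k-means is usually written in the literature (see De Soete and Carroll in the References) as the least-squares problem below. Here X is the n x J data matrix, U the n x K membership matrix, A the J x Q loading matrix, and \bar{Y} the K x Q matrix of centers in the reduced space. The symbols n, J and \bar{Y} are notation introduced here for illustration. This is the textbook formulation; it is assumed, not confirmed by this page, that redkm optimizes exactly this criterion.

```latex
\min_{U,\,A,\,\bar{Y}} \; \lVert X - U \bar{Y} A^{\top} \rVert_F^{2}
\quad \text{subject to} \quad
A^{\top} A = I_Q, \qquad
u_{ik} \in \{0, 1\}, \qquad
\sum_{k=1}^{K} u_{ik} = 1 \;\; (i = 1, \dots, n)
```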

Usage

redkm(X, K, Q, Rndstart, verbose, maxiter, tol, rot, prep, print)

Arguments

`X`	Units x variables numeric data matrix.
`K`	Number of clusters for the units.
`Q`	Number of principal components w.r.t. variables.
`Rndstart`	Number of runs to be performed (default is 20).
`verbose`	Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option).
`maxiter`	Maximum number of iterations allowed if convergence is not reached earlier (default is 100).
`tol`	Tolerance threshold: the maximum difference between the objective function values of two consecutive iterations at which convergence is assumed (default is 1e-6).
`rot`	Performs varimax rotation of the axes obtained via PCA (1 = enabled; 0 = disabled, default option).
`prep`	Pre-processing of the data: 1 applies the z-score transform (default); 2 applies the min-max transform; 0 leaves the data unprocessed.
`print`	Prints summary statistics of the performed method (1 = enabled; 0 = disabled, default option).
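The two transforms named under `prep` can be sketched in base R as follows. This is only an illustration of what z-score and min-max scaling conventionally mean; the toy matrix and the variable names are made up here, and the package's own implementation may differ in detail (for example in the denominator used for the standard deviation).

```r
# Toy data: 5 units x 2 variables (illustrative values only)
X <- matrix(c(1, 2, 3, 4, 5,
              10, 20, 30, 40, 50), ncol = 2)

# z-score (what prep = 1 is assumed to do): centre each column and divide
# by its standard deviation. scale() uses the sample standard deviation
# (n - 1 denominator); the package may use a different convention.
Z <- scale(X, center = TRUE, scale = TRUE)

# min-max (what prep = 2 is assumed to do): rescale each column to [0, 1]
mins   <- apply(X, 2, min)
ranges <- apply(X, 2, max) - mins
M <- sweep(sweep(X, 2, mins, "-"), 2, ranges, "/")
```

After either transform the variables are on a comparable scale, which matters because both k-means and PCA are sensitive to the units of measurement.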

Value

Returns a list of estimates and some descriptive quantities of the final results.

`U`	Units x clusters membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the cluster to which the corresponding unit has been assigned.
`A`	Variables x components loading matrix (orthonormal).
`centers`	K x Q matrix of centers containing the row means expressed in the reduced space of Q principal components.
`totss`	The total sum of squares (scalar).
`withinss`	Vector of within-cluster sum of squares, one component per cluster.
`betweenss`	Amount of deviance captured by the model (scalar).
`size`	Number of units assigned to each cluster (vector).
`pseudoF`	Calinski-Harabasz index of the resulting partition (scalar).
`loop`	The index of the (best) run from which the results have been chosen.
`it`	The number of iterations performed during the (best) run.
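To show how the descriptive quantities above relate to one another, the sketch below rebuilds them in base R from an ordinary `stats::kmeans` partition, used here purely as a stand-in for the redkm solution. It works in the full variable space; whether redkm computes its sums of squares in the full or in the reduced space is not stated on this page, so the numbers are not expected to match its output.

```r
X <- scale(as.matrix(datasets::iris[, 1:4]))  # z-scored numeric variables
n <- nrow(X)
K <- 3

set.seed(1)
cl <- kmeans(X, centers = K, nstart = 10)$cluster  # stand-in partition, not redkm

# Membership matrix: one row per unit, a single 1 in the column of its cluster
U <- diag(K)[cl, ]
size <- colSums(U)

# Cluster centroids in the original variable space: (U'U)^{-1} U'X
centroids <- solve(crossprod(U), crossprod(U, X))

# Total, within-cluster and between-cluster sums of squares
totss <- sum(scale(X, scale = FALSE)^2)
withinss <- sapply(seq_len(K), function(k) {
  sum(sweep(X[cl == k, , drop = FALSE], 2, centroids[k, ])^2)
})
betweenss <- totss - sum(withinss)

# Calinski-Harabasz pseudo-F: between-cluster variance over within-cluster variance
pseudoF <- (betweenss / (K - 1)) / (sum(withinss) / (n - K))
```

A larger pseudo-F indicates clusters that are better separated relative to their internal spread, which is why the index is commonly used to compare partitions with different K.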

Author(s)

Ionel Prunila, Maurizio Vichi

References

de Soete G., Carroll J. (1994) "K-means clustering in a low-dimensional Euclidean space" <doi:10.1007/978-3-642-51175-2_24>

Kaiser H.F. (1958) "The varimax criterion for analytic rotation in factor analysis" <doi:10.1007/BF02289233>

Examples

# Iris data 
# Loading the numeric variables of iris data
iris <- as.matrix(iris[,-5]) 

# reduced k-means with 3 unit-clusters and 2 components for the variables
out <- redkm(iris, K = 3, Q = 2, Rndstart = 15, verbose = 0, maxiter = 100, tol = 1e-7, rot = 1)
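One possible way to inspect the result is sketched below, using only the components listed under Value. The guard with `requireNamespace` and the names `X` and `cluster` are additions made here so that the snippet can be run on its own; `datasets::iris` is referenced explicitly because the example above overwrites `iris` with a numeric matrix.

```r
if (requireNamespace("drclust", quietly = TRUE)) {
  X <- as.matrix(datasets::iris[, -5])
  out <- drclust::redkm(X, K = 3, Q = 2, Rndstart = 15, verbose = 0,
                        maxiter = 100, tol = 1e-7, rot = 1)

  # Turn the binary membership matrix into a vector of cluster labels
  cluster <- apply(out$U, 1, which.max)

  # Cross-tabulate the recovered clusters against the known species
  print(table(Species = datasets::iris$Species, Cluster = cluster))

  # Loadings of the variables on the two components, and the cluster centers
  print(round(out$A, 2))
  print(out$centers)
}
```

Cluster labels are arbitrary, so the cross-tabulation should be read up to a permutation of the columns.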

[Package drclust version 0.1 Index]