redkm {drclust} | R Documentation |
k-means on a reduced subspace
Description
Simultaneously performs k-means partitioning of the units and principal component analysis of the variables.
Usage
redkm(X, K, Q, Rndstart, verbose, maxiter, tol, rot, prep, print)
Arguments
X |
Units x variables numeric data matrix. |
K |
Number of clusters for the units. |
Q |
Number of principal components w.r.t. variables. |
Rndstart |
Number of runs to be performed (default is 20). |
verbose |
Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option). |
maxiter |
Maximum number of iterations allowed if convergence is not reached earlier (default is 100). |
tol |
Tolerance threshold: the maximum difference between the objective function values of two consecutive iterations at which convergence is assumed (default is 1e-6). |
rot |
Performs varimax rotation of the axes obtained via PCA (1 = enabled; 0 = disabled, default option). |
prep |
Pre-processing of the data. 1 performs the z-score transform (default choice); 2 performs the min-max transform; 0 leaves the data un-pre-processed. |
print |
Prints summary statistics of the performed method (1 = enabled; 0 = disabled, default option). |
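The two transforms named under prep can be illustrated in base R. This is a minimal sketch that assumes the usual textbook definitions; the package's internal implementation (for example, the divisor used for the standard deviation) is not stated here and may differ. The helper names zscore and minmax are hypothetical.

```r
# Hypothetical helpers illustrating the two transforms named under `prep`.
# Assumption: the z-score uses the sample standard deviation (divisor n - 1),
# which is what base::scale() does.
zscore <- function(X) scale(X, center = TRUE, scale = TRUE)

# Min-max rescales every column to the interval [0, 1].
# (A constant column would divide by zero; not handled in this sketch.)
minmax <- function(X) {
  apply(X, 2, function(x) (x - min(x)) / (max(x) - min(x)))
}

X <- as.matrix(iris[, -5])
Z <- zscore(X)
M <- minmax(X)

round(colMeans(Z), 10)   # column means are numerically zero
apply(Z, 2, sd)          # column standard deviations are one
range(M)                 # all values lie in [0, 1]
```

Because the two transforms weight the variables differently, the same data can yield different partitions under prep = 1 and prep = 2.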
Value
Returns a list of estimates and descriptive quantities of the final results.
U |
Units x clusters membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the cluster to which the unit has been assigned. |
A |
Variables x components loading matrix (orthonormal). |
centers |
K x Q matrix of centers containing the row means expressed in the reduced space of Q principal components. |
totss |
The total sum of squares (scalar). |
withinss |
Vector of within-cluster sum of squares, one component per cluster. |
betweenss |
Amount of deviance captured by the model (scalar). |
size |
Number of units assigned to each cluster (vector). |
pseudoF |
Calinski-Harabasz index of the resulting partition (scalar). |
loop |
The index of the (best) run from which the results have been chosen. |
it |
The number of iterations performed during the (best) run. |
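How the returned quantities relate to one another can be sketched with base R. The fit below uses stats::kmeans and stats::prcomp as stand-ins, not redkm, so the numbers will not match a redkm run; whether redkm computes the sums of squares and pseudoF in the full or in the reduced space is not stated on this page and is left open here.

```r
# Stand-in fit: plain k-means from base R, NOT redkm.
set.seed(1)
X  <- scale(as.matrix(iris[, -5]))
K  <- 3
n  <- nrow(X)
km <- kmeans(X, centers = K, nstart = 20)

# Binary, row-stochastic membership matrix U (units x clusters):
# row i is the indicator vector of the cluster assigned to unit i.
U <- diag(K)[km$cluster, ]
stopifnot(all(rowSums(U) == 1))
colSums(U)                       # cluster sizes, same values as km$size

# Deviance decomposition: total = within + between.
all.equal(km$totss, sum(km$withinss) + km$betweenss)

# Calinski-Harabasz pseudo-F: (between / (K - 1)) / (within / (n - K)).
pseudoF <- (km$betweenss / (K - 1)) / (sum(km$withinss) / (n - K))
pseudoF

# An orthonormal loading matrix satisfies A'A = I (here taken from PCA).
A <- prcomp(X)$rotation[, 1:2]
all.equal(crossprod(A), diag(2), check.attributes = FALSE)
```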
Author(s)
Ionel Prunila, Maurizio Vichi
References
de Soete G., Carroll J. (1994) "K-means clustering in a low-dimensional Euclidean space" <doi:10.1007/978-3-642-51175-2_24>
Kaiser H.F. (1958) "The varimax criterion for analytic rotation in factor analysis" <doi:10.1007/BF02289233>
Examples
# Iris data
library(drclust)
# Loading the numeric variables of iris data
iris <- as.matrix(iris[,-5])
# reduced k-means with 3 unit-clusters and 2 components for the variables
out <- redkm(iris, K = 3, Q = 2, Rndstart = 15, verbose = 0, maxiter = 100, tol = 1e-7, rot = 1)
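For readers who want to see what "simultaneous" clustering and reduction means, here is a bare-bones alternating scheme for the reduced k-means criterion of De Soete and Carroll, written in base R. It is a sketch under stated assumptions, not the drclust implementation: rkm_sketch is a hypothetical helper, it uses a single random start, and it omits rotation, pre-processing options, and recovery from empty clusters.

```r
# Hypothetical helper: minimal reduced k-means, for illustration only.
# Minimises ||X - U C A'||^2 over the partition U, centers C, and
# orthonormal loadings A by alternating two steps.
rkm_sketch <- function(X, K, Q, maxiter = 100, tol = 1e-6) {
  n     <- nrow(X)
  cl    <- sample(rep_len(seq_len(K), n))   # random start, no empty cluster
  f_old <- Inf
  for (it in seq_len(maxiter)) {
    # Step 1: given the partition, update loadings and centers.
    size <- tabulate(cl, K)
    Xbar <- rowsum(X, cl) / size             # K x J cluster means
    S    <- crossprod(Xbar * sqrt(size))     # size-weighted cross-product of means
    A    <- eigen(S, symmetric = TRUE)$vectors[, seq_len(Q), drop = FALSE]
    Y    <- X %*% A                          # unit scores in the reduced space
    C    <- Xbar %*% A                       # K x Q centers in the reduced space

    # Step 2: given loadings and centers, assign each unit to the
    # nearest center in the reduced space.
    D2 <- sapply(seq_len(K), function(k) rowSums(sweep(Y, 2, C[k, ])^2))
    cl <- max.col(-D2, ties.method = "first")
    if (any(tabulate(cl, K) == 0)) stop("empty cluster; try another start")

    # Objective value and convergence check.
    f <- sum((X - C[cl, , drop = FALSE] %*% t(A))^2)
    if (f_old - f < tol) break
    f_old <- f
  }
  list(cluster = cl, A = A,
       centers = rowsum(Y, cl) / tabulate(cl, K),
       loss = f, it = it)
}

set.seed(123)
X   <- scale(as.matrix(datasets::iris[, -5]))
fit <- rkm_sketch(X, K = 3, Q = 2)
table(fit$cluster)               # cluster sizes
round(crossprod(fit$A), 10)      # identity matrix: loadings are orthonormal
```

A single random start can end in a poor local optimum, which is the reason for running several starts (the Rndstart argument) and keeping the best one.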