factkm {drclust} | R Documentation |
Factorial k-means
Description
Simultaneously performs k-means partitioning of the units and principal component analysis of the variables. Identifies the best partition, in the least-squares sense, in the best reduced space of the data. Both the data and the centroids are used to identify the best least-squares reduced subspace, in which their distances are also measured.
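The criterion described above can be sketched in base R. Following Vichi and Kiers (2001), the loss is the squared distance between the projected units X A and their cluster centroids in the reduced space. This is an illustration only, not the package's internal code: the helper name fkm_loss is hypothetical, and the partition and loadings used to evaluate it come from plain kmeans() and an eigendecomposition rather than from factkm().

```r
# Sketch of the factorial k-means loss for a given membership matrix U (n x K)
# and orthonormal loading matrix A (J x Q). fkm_loss is a hypothetical helper.
fkm_loss <- function(X, U, A) {
  Y <- X %*% A                                  # unit scores in the reduced space (n x Q)
  Ybar <- solve(crossprod(U), crossprod(U, Y))  # cluster centroids in the reduced space (K x Q)
  sum((Y - U %*% Ybar)^2)                       # within-cluster deviance in the reduced space
}

set.seed(1)
X <- scale(as.matrix(datasets::iris[, -5]))

# An arbitrary starting point, used only to evaluate the loss:
labels <- kmeans(X, centers = 3, nstart = 5)$cluster
U <- diag(3)[labels, ]                          # binary, row-stochastic membership matrix
A <- eigen(crossprod(X))$vectors[, 1:2]         # orthonormal loadings (first two PCA axes)

loss <- fkm_loss(X, U, A)
```

The function alternates between updating U and A to reduce a loss of this kind; the starting values above are not a factorial k-means solution.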
Usage
factkm(X, K, Q, Rndstart, verbose, maxiter, tol, rot, prep, print)
Arguments
X |
Units x variables numeric data matrix. |
K |
Number of clusters for the units. |
Q |
Number of principal components for the variables. |
Rndstart |
Number of runs to be performed (default is 20). |
verbose |
Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option). |
maxiter |
Maximum number of iterations allowed if convergence has not yet been reached (default is 100). |
tol |
Tolerance threshold: the maximum difference between the objective function values of two consecutive iterations at which convergence is assumed (default is 1e-6). |
rot |
Performs varimax rotation of the axes obtained via PCA (1 = enabled; 0 = disabled, default option). |
prep |
Pre-processing of the data. 1 performs the z-score transform (default choice); 2 performs the min-max transform; 0 leaves the data un-pre-processed. |
print |
Prints summary statistics of the results (1 = enabled; 0 = disabled, default option). |
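The two transforms named under prep can be sketched in base R as follows. The exact conventions used inside the package (for example, the divisor of the standard deviation) are not stated here, so the code below is an assumption-based illustration rather than the package's implementation.

```r
X <- as.matrix(datasets::iris[, -5])

# Analogue of prep = 1: column-wise z-scores.
# scale() uses the sample standard deviation (divisor n - 1); assumed here.
Z <- scale(X, center = TRUE, scale = TRUE)

# Analogue of prep = 2: min-max rescaling of each column to [0, 1].
mins <- apply(X, 2, min)
ranges <- apply(X, 2, max) - mins
M <- sweep(sweep(X, 2, mins, "-"), 2, ranges, "/")
```

Passing already transformed data together with prep = 0 should avoid transforming the data twice.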
Value
Returns a list of estimates and some descriptive quantities of the final results.
U |
Units x clusters membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the cluster to which the corresponding unit has been assigned. |
A |
Variables x components loading matrix (orthonormal). |
centers |
K x Q matrix of centers containing the row means expressed in the reduced space of Q principal components. |
totss |
The total sum of squares. |
withinss |
Vector of within-cluster sum of squares, one component per cluster. |
betweenss |
Amount of deviance captured by the model. |
size |
Number of units assigned to each cluster. |
pseudoF |
Calinski-Harabasz index of the resulting partition. |
loop |
The index of the (best) run from which the results have been chosen. |
it |
The number of iterations performed during the (best) run. |
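Some of the returned quantities are related to one another, as the following base R sketch shows on a toy membership matrix (hypothetical data, not package output). The Calinski-Harabasz formula is the standard one; whether the package evaluates the sums of squares in the reduced or in the full space is not stated here, and the helper name ch_index is hypothetical.

```r
# Toy membership matrix: 6 units, 3 clusters (hypothetical, not factkm output)
U <- diag(3)[c(1, 1, 2, 3, 3, 3), ]

# Hard cluster labels: position of the single 1 in each row of U
labels <- max.col(U, ties.method = "first")

# Cluster sizes: column sums of U
size <- colSums(U)

# Standard Calinski-Harabasz index from between and within sums of squares
ch_index <- function(betweenss, withinss, n, K) {
  (betweenss / (K - 1)) / (sum(withinss) / (n - K))
}

# Made-up sums of squares, used only to show the arithmetic
ch <- ch_index(betweenss = 30, withinss = c(2, 1, 3), n = 6, K = 3)
```

With the fitted object from the Examples, the same label recovery would apply to out$U.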
Author(s)
Ionel Prunila, Maurizio Vichi
References
Vichi M., Kiers H.A.L. (2001) "Factorial k-means analysis for two-way data" <doi:10.1016/S0167-9473(00)00064-5>
Kaiser H.F. (1958) "The varimax criterion for analytic rotation in factor analysis" <doi:10.1007/BF02289233>
Examples
# Iris data
# Loading the numeric variables of iris data
iris <- as.matrix(iris[,-5])
# factorial k-means with 3 unit-clusters and 2 components for the variables
out <- factkm(iris, K = 3, Q = 2, Rndstart = 15, verbose = 0, maxiter = 100, tol = 1e-7, rot = 1)