doublekm {drclust} | R Documentation |
Double k-means Clustering
Description
Performs simultaneous k-means partitioning on units and variables (rows and columns of the data matrix).
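In the formulation of Vichi (2001), cited under References, the two partitions are fitted jointly by least squares. As a sketch, write X for the (pre-processed) n x J data matrix and \bar{Y} for the K x Q matrix of centers; the symbols n, J and \bar{Y} are notation chosen here, not names used by the package:

```latex
\min_{U,\,V,\,\bar{Y}} \; \lVert X - U \bar{Y} V^{\top} \rVert_F^2
\quad \text{subject to} \quad
u_{ik} \in \{0,1\},\; \sum_{k=1}^{K} u_{ik} = 1,\qquad
v_{jq} \in \{0,1\},\; \sum_{q=1}^{Q} v_{jq} = 1 .
```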
Usage
doublekm(Xs, K, Q, Rndstart, verbose, maxiter, tol, prep, print)
Arguments
Xs |
Units x variables numeric data matrix. |
K |
Number of clusters for the units. |
Q |
Number of clusters for the variables. |
Rndstart |
Number of runs to be performed (default is 20). |
verbose |
Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option). |
maxiter |
Maximum number of iterations allowed if convergence has not been reached (default is 100). |
tol |
Tolerance threshold: convergence is assumed once the difference between the objective function values of two consecutive iterations falls below this value (default is 1e-6). |
prep |
Pre-processing of the data. 1 performs the z-score transform (default choice); 2 performs the min-max transform; 0 leaves the data unchanged. |
print |
Prints summary statistics of the results (1 = enabled; 0 = disabled, default option). |
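The three prep options can be illustrated in base R. This is a minimal sketch, not the package's code: preprocess is a hypothetical helper, and the use of the sample standard deviation (denominator n - 1) for the z-score is an assumption.

```r
# Hypothetical helper, not part of drclust: mimics the three `prep` options.
preprocess <- function(X, prep = 1) {
  X <- as.matrix(X)
  if (prep == 1) {
    # z-score: centre each column, then divide by its standard deviation
    # (sample SD with denominator n - 1 is an assumption made here)
    centred <- sweep(X, 2, colMeans(X), "-")
    sweep(centred, 2, apply(X, 2, sd), "/")
  } else if (prep == 2) {
    # min-max: rescale each column to the interval [0, 1]
    mins <- apply(X, 2, min)
    ranges <- apply(X, 2, max) - mins
    sweep(sweep(X, 2, mins, "-"), 2, ranges, "/")
  } else {
    # prep = 0: data returned as supplied
    X
  }
}

X <- as.matrix(datasets::iris[, -5])
Z <- preprocess(X, prep = 1)  # columns with mean 0 and SD 1
M <- preprocess(X, prep = 2)  # columns ranging from 0 to 1
```

Both transforms act column by column, so variables measured on different scales contribute comparably to the distances used for clustering.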
Value
Returns a list of estimates and some descriptive quantities of the final results.
U |
Units x clusters membership matrix (binary and row-stochastic). Each row is an indicator vector with a single 1 marking the unit-cluster to which the unit has been assigned. |
V |
Variables x clusters membership matrix (binary and row-stochastic). Each row is an indicator vector with a single 1 marking the variable-cluster to which the variable has been assigned. |
centers |
K x Q matrix of centers: entry (k, q) summarizes the units in row-cluster k over the variables in column-cluster q (row-cluster means expressed in terms of column-cluster means). |
totss |
The total sum of squares (scalar). |
withinss |
Vector of within-row-cluster sum of squares, one component per cluster. |
columnwise_withinss |
Vector of within-column-cluster sum of squares, one component per cluster. |
betweenss |
Amount of deviance captured by the model (scalar). |
K-size |
Number of units assigned to each row-cluster (vector). |
Q-size |
Number of variables assigned to each column-cluster (vector). |
pseudoF |
Calinski-Harabasz index of the resulting (row-) partition (scalar). |
loop |
The index of the (best) run from which the results have been chosen. |
it |
The number of iterations performed during the (best) run. |
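How U, V and centers relate to one another can be illustrated in base R. The sketch below uses a toy matrix and hypothetical label vectors (row_lab and col_lab are illustrative names). It assumes that centers holds the mean of each row-cluster by column-cluster block, which is the usual double k-means centroid but has not been checked against the package source.

```r
# Toy data: 6 units x 4 variables with an exact block structure
X <- rbind(
  c(1, 1, 5, 5),
  c(1, 1, 5, 5),
  c(3, 3, 9, 9),
  c(3, 3, 9, 9),
  c(7, 7, 2, 2),
  c(7, 7, 2, 2)
)
row_lab <- c(1, 1, 2, 2, 3, 3)  # hypothetical unit-cluster labels (K = 3)
col_lab <- c(1, 1, 2, 2)        # hypothetical variable-cluster labels (Q = 2)

# Binary, row-stochastic membership matrices: exactly one 1 per row
U <- diag(3)[row_lab, ]
V <- diag(2)[col_lab, ]

# Block means (assumed form of `centers`): (U'U)^-1 U' X V (V'V)^-1
centers <- solve(crossprod(U)) %*% t(U) %*% X %*% V %*% solve(crossprod(V))

# Cluster sizes are the column sums of the membership matrices
k_size <- colSums(U)
q_size <- colSums(V)

# Model reconstruction of the data: each cell replaced by its block mean
fitted <- U %*% centers %*% t(V)
```

Because the toy data are constant within each block, fitted reproduces X exactly; on real data the difference X - fitted is the part of the deviance the model leaves unexplained.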
Author(s)
Ionel Prunila, Maurizio Vichi
References
Vichi M. (2001) "Double k-means Clustering for Simultaneous Classification of Objects and Variables" <doi:10.1007/978-3-642-59471-7_6>
Examples
# Iris data
# Loading the numeric variables of iris data
iris <- as.matrix(iris[,-5])
# double k-means with 3 unit-clusters and 2 variable-clusters
out <- doublekm(iris, K = 3, Q = 2)
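The roles of Rndstart, maxiter and tol can be made concrete with a small alternating least-squares sketch in base R. It is an illustration only and not the drclust implementation: double_kmeans_sketch and its return fields are hypothetical names, the update order is one plausible choice, and clusters that become empty are simply left empty.

```r
# Illustrative sketch only; NOT the code used by doublekm().
double_kmeans_sketch <- function(X, K, Q, maxiter = 100, tol = 1e-6) {
  X <- as.matrix(X)
  n <- nrow(X)
  J <- ncol(X)
  # Random labels with every cluster non-empty (assumes m >= G)
  init <- function(m, G) {
    lab <- c(seq_len(G), sample.int(G, m - G, replace = TRUE))
    lab[sample.int(m)]
  }
  # K x Q matrix of block means; NaN marks an empty block
  block_means <- function(u, v) {
    Y <- matrix(NaN, K, Q)
    for (k in seq_len(K)) for (q in seq_len(Q)) {
      Y[k, q] <- mean(X[u == k, v == q])
    }
    Y
  }
  u <- init(n, K)
  v <- init(J, Q)
  f_old <- Inf
  f_hist <- numeric(0)
  for (it in seq_len(maxiter)) {
    Y <- block_means(u, v)
    # Reassign each unit to the row-cluster with the closest centroid profile
    Yv <- Y[, v, drop = FALSE]  # K x J
    d_u <- matrix(sapply(seq_len(K), function(k) {
      rowSums(sweep(X, 2, Yv[k, ])^2)
    }), nrow = n)
    u <- apply(d_u, 1, which.min)  # which.min skips NaN (empty clusters)
    # Reassign each variable to the closest column-cluster
    Yu <- Y[u, , drop = FALSE]  # n x Q
    d_v <- matrix(sapply(seq_len(Q), function(q) {
      colSums((X - Yu[, q])^2)
    }), nrow = J)
    v <- apply(d_v, 1, which.min)
    # Least-squares objective for the current centroids and labels
    f <- sum((X - Y[u, v, drop = FALSE])^2)
    f_hist <- c(f_hist, f)
    if (f_old - f < tol) break  # improvement below tol: stop
    f_old <- f
  }
  list(u = u, v = v, centers = block_means(u, v),
       objective = f, it = it, f_hist = f_hist)
}

# Several random starts; keep the run with the smallest objective
set.seed(1)
X <- scale(as.matrix(datasets::iris[, -5]))
runs <- lapply(1:5, function(r) double_kmeans_sketch(X, K = 3, Q = 2))
best <- runs[[which.min(sapply(runs, function(r) r$objective))]]
```

Each run can stop in a different local minimum, which is why several random starts are performed and only the best one is kept.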