R: Double k-means Clustering

doublekm {drclust}

R Documentation

Double k-means Clustering

Description

Performs simultaneous k-means partitioning on units and variables (rows and columns of the data matrix).
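The idea can be made concrete with a minimal base-R sketch. It assumes the usual least-squares formulation of double k-means (Vichi, 2001), where the data matrix is approximated by the block means of a unit partition and a variable partition. Units and variables are reassigned alternately, with the K x Q block means recomputed after each step. The helper `dkm_sketch` is hypothetical and is not drclust's implementation. Its random initialisation, empty-cluster handling and stopping rule are assumptions, chosen to loosely mirror the `Rndstart`, `maxiter` and `tol` arguments.

```r
# Hypothetical helper, NOT part of drclust: one random start of double k-means
# by alternating reassignment of units and variables to K x Q block means.
dkm_sketch <- function(X, K, Q, maxiter = 100, tol = 1e-6) {
  n <- nrow(X); J <- ncol(X)
  # Assumed empty-cluster rule: move one member of the largest group into each empty group
  fill_empty <- function(lab, G) {
    for (g in setdiff(seq_len(G), lab)) {
      big <- which.max(tabulate(lab, G))
      lab[which(lab == big)[1]] <- g
    }
    lab
  }
  # K x Q matrix of block means: mean of X over (unit-cluster k, variable-cluster q)
  block_means <- function(u, v) {
    Y <- matrix(0, K, Q)
    for (k in seq_len(K)) for (q in seq_len(Q)) Y[k, q] <- mean(X[u == k, v == q])
    Y
  }
  u <- fill_empty(sample.int(K, n, replace = TRUE), K)
  v <- fill_empty(sample.int(Q, J, replace = TRUE), Q)
  loss_old <- Inf
  for (it in seq_len(maxiter)) {
    Y <- block_means(u, v)
    # Units: row i goes to the k minimising sum_j (x_ij - Y[k, v_j])^2
    u <- fill_empty(apply(X, 1, function(x)
      which.min(colSums((t(Y[, v, drop = FALSE]) - x)^2))), K)
    Y <- block_means(u, v)
    # Variables: column j goes to the q minimising sum_i (x_ij - Y[u_i, q])^2
    v <- fill_empty(apply(X, 2, function(x)
      which.min(colSums((Y[u, , drop = FALSE] - x)^2))), Q)
    Y <- block_means(u, v)
    loss <- sum((X - Y[u, v])^2)  # Y[u, v] equals U %*% Y %*% t(V)
    if (loss_old - loss < tol) break
    loss_old <- loss
  }
  list(u = u, v = v, centers = Y, loss = loss, it = it)
}

set.seed(123)
X <- scale(as.matrix(iris[, -5]))
# Mimic Rndstart = 20: keep the run with the smallest loss
fits <- lapply(1:20, function(r) dkm_sketch(X, K = 3, Q = 2))
fit <- fits[[which.min(sapply(fits, function(f) f$loss))]]
table(fit$u)
fit$v
```

Keeping the best of several random starts matters because each run converges only to a local minimum of the loss.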

Usage

doublekm(Xs, K, Q, Rndstart, verbose, maxiter, tol, prep, print)

Arguments

`Xs`	Units x variables numeric data matrix.
`K`	Number of clusters for the units.
`Q`	Number of clusters for the variables.
`Rndstart`	Number of random starts (runs) to be performed; the best run is retained (default is 20).
`verbose`	Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option).
`maxiter`	Maximum number of iterations allowed if convergence has not yet been reached (default is 100).
`tol`	Tolerance threshold: convergence is assumed when the difference between the values of the objective function at two consecutive iterations falls below this value (default is 1e-6).
`prep`	Pre-processing of the data: 1 applies the z-score transform (default choice); 2 applies the min-max transform; 0 leaves the data unprocessed.
`print`	Prints summary statistics of the results (1 = enabled; 0 = disabled, default option).
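The two `prep` transforms can be sketched in base R as shown here. The helper `prep_sketch` is hypothetical and is not drclust's code. It also assumes the z-score uses `scale()`, which divides by the sample standard deviation (denominator n - 1). drclust may use a different denominator.

```r
# Hypothetical helper, NOT part of drclust: column-wise pre-processing
prep_sketch <- function(X, prep = 1) {
  X <- as.matrix(X)
  if (prep == 1) {
    # z-score: centre each column and divide by its sample standard deviation
    scale(X, center = TRUE, scale = TRUE)
  } else if (prep == 2) {
    # min-max: map each column onto [0, 1]
    apply(X, 2, function(x) (x - min(x)) / (max(x) - min(x)))
  } else {
    # prep = 0: data left unprocessed
    X
  }
}

Z <- prep_sketch(iris[, -5], prep = 1)
M <- prep_sketch(iris[, -5], prep = 2)
round(colMeans(Z), 10)
apply(M, 2, range)
```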

Value

Returns a list of estimates and some descriptive quantities of the final results.

`U`	Units x clusters membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the unit-cluster to which the corresponding unit has been assigned.
`V`	Variables x clusters membership matrix (binary and row-stochastic). Each row is a dummy vector indicating the variable-cluster to which the corresponding variable has been assigned.
`centers`	K x Q matrix of centers: entry (k, q) is the mean of the data values of the units in unit-cluster k on the variables in variable-cluster q.
`totss`	The total sum of squares (scalar).
`withinss`	Vector of within-row-cluster sums of squares, one component per unit-cluster.
`columnwise_withinss`	Vector of within-column-cluster sums of squares, one component per variable-cluster.
`betweenss`	Between-cluster sum of squares, i.e. the amount of deviance captured by the model (scalar).
`K-size`	Number of units assigned to each row-cluster (vector).
`Q-size`	Number of variables assigned to each column-cluster (vector).
`pseudoF`	Calinski-Harabasz index of the resulting (row-) partition (scalar).
`loop`	The index of the (best) run from which the results have been chosen.
`it`	The number of iterations performed during the (best) run.
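The main returned quantities can be reproduced from a pair of label vectors with a short base-R sketch. It assumes the standard double k-means centroid formula, centers = (U'U)^-1 U' X V (V'V)^-1. It also assumes that the sums of squares are taken around zero, which for z-scored data coincides with deviations around the column means. `pseudoF` uses the usual Calinski-Harabasz ratio (B / (K - 1)) / (W / (n - K)). The helper `dkm_summary` is hypothetical, and drclust's exact definitions may differ.

```r
# Hypothetical helper, NOT part of drclust: quantities derived from labels u (units) and v (variables)
dkm_summary <- function(X, u, v) {
  K <- max(u); Q <- max(v); n <- nrow(X)
  U <- diag(K)[u, , drop = FALSE]  # n x K binary, row-stochastic membership matrix
  V <- diag(Q)[v, , drop = FALSE]  # J x Q binary, row-stochastic membership matrix
  centers <- solve(crossprod(U)) %*% t(U) %*% X %*% V %*% solve(crossprod(V))
  fitted <- U %*% centers %*% t(V)  # each entry replaced by its block mean
  resid <- X - fitted
  withinss <- as.vector(t(U) %*% rowSums(resid^2))             # one value per unit-cluster
  columnwise_withinss <- as.vector(t(V) %*% colSums(resid^2))  # one value per variable-cluster
  totss <- sum(X^2)
  betweenss <- sum(fitted^2)
  pseudoF <- (betweenss / (K - 1)) / (sum(withinss) / (n - K))
  list(U = U, V = V, centers = centers, totss = totss, betweenss = betweenss,
       withinss = withinss, columnwise_withinss = columnwise_withinss,
       pseudoF = pseudoF, `K-size` = colSums(U), `Q-size` = colSums(V))
}

X <- scale(as.matrix(iris[, -5]))
s <- dkm_summary(X, u = as.integer(iris$Species), v = c(1, 1, 2, 2))
s$centers
c(totss = s$totss, betweenss = s$betweenss, within = sum(s$withinss))
```

Because the fitted matrix is an orthogonal projection of X, totss splits exactly into betweenss plus the total within sum of squares.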

Author(s)

Ionel Prunila, Maurizio Vichi

References

Vichi M. (2001) "Double k-means Clustering for Simultaneous Classification of Objects and Variables" <doi:10.1007/978-3-642-59471-7_6>

Examples

# Iris data
library(drclust)
# Load the numeric variables of the iris data
iris <- as.matrix(iris[,-5])

# double k-means with 3 unit-clusters and 2 variable-clusters
out <- doublekm(iris, K = 3, Q = 2)

[Package drclust version 0.1 Index]