
factkm {drclust}

R Documentation

Factorial k-means

Description

Simultaneously performs k-means partitioning of the units and principal component analysis of the variables. It identifies the best partition, in a least-squares sense, within the best reduced space of the data. Both the data and the centroids are used to identify the best least-squares reduced subspace. The distances between the units and the centroids are also measured in that subspace.
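For reference, the least-squares problem described above can be written as follows. This is our reading of Vichi and Kiers (2001). The notation is ours: X is the pre-processed, column-centred units x variables matrix, U is the membership matrix, A is the loading matrix and \bar{Y} holds the centroids in the reduced space. The exact scaling used inside factkm is not documented on this page.

```latex
\min_{U,\,A}\; F(U, A) = \lVert XA - U\bar{Y} \rVert^2
\quad \text{s.t.}\quad u_{ik} \in \{0,1\},\ \sum_{k=1}^{K} u_{ik} = 1,\ A^\top A = I_Q,
\qquad \bar{Y} = (U^\top U)^{-1} U^\top X A .

% Substituting \bar{Y} and writing P_U = U (U^\top U)^{-1} U^\top
% (symmetric and idempotent, so I_n - P_U is too):
F(U, A) = \lVert (I_n - P_U) X A \rVert^2
        = \operatorname{tr}\!\left( A^\top X^\top (I_n - P_U)\, X A \right).
```

For fixed U, F is minimised by taking the columns of A to be the Q leading eigenvectors of X^T (P_U - I_n) X. For fixed A, F does not increase when each unit is reassigned to its nearest centroid in the space spanned by XA. Alternating the two steps therefore never increases F.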

Usage

factkm(X, K, Q, Rndstart, verbose, maxiter, tol, rot, prep, print)

Arguments

`X`	Units x variables numeric data matrix.
`K`	Number of clusters for the units.
`Q`	Number of principal components for the variables.
`Rndstart`	Number of runs to be performed (default is 20).
`verbose`	Outputs basic summary statistics for each run (1 = enabled; 0 = disabled, default option).
`maxiter`	Maximum number of iterations allowed if convergence has not yet been reached (default is 100).
`tol`	Tolerance threshold: the difference between the objective-function values of two consecutive iterations below which convergence is assumed (default is 1e-6).
`rot`	Performs varimax rotation of the axes obtained via PCA (1 = enabled; 0 = disabled, default option).
`prep`	Pre-processing of the data: 1 performs the z-score transform (default option); 2 performs the min-max transform; 0 leaves the data unprocessed.
`print`	Prints summary statistics of the results (1 = enabled; 0 = disabled, default option).
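The roles of Rndstart, maxiter, tol, prep and rot can be seen in the following minimal base-R sketch. It implements an alternating least-squares scheme for the Vichi and Kiers (2001) objective. The function fkm_sketch and its internals are hypothetical and are not drclust's implementation. Several choices are our own simplifying assumptions: centring the columns after pre-processing, stopping a run when a cluster becomes empty, and keeping the run with the lowest loss.

```r
# Hypothetical sketch of factorial k-means by alternating least squares.
# NOT drclust's implementation; argument names only mirror the documented ones.
fkm_sketch <- function(X, K, Q, Rndstart = 20, maxiter = 100, tol = 1e-6,
                       rot = FALSE, prep = 1) {
  X <- as.matrix(X)
  # Pre-processing codes as documented: 1 = z-score, 2 = min-max, 0 = none
  if (prep == 1) X <- scale(X)
  if (prep == 2) X <- apply(X, 2, function(v) (v - min(v)) / (max(v) - min(v)))
  X <- scale(X, scale = FALSE)  # assumption: centre columns (constant columns under prep = 2 give NaN)
  n <- nrow(X)

  # Given a partition `cl`, return the optimal loadings A and the loss F(U, A)
  fit_A <- function(cl) {
    U <- matrix(0, n, K)
    U[cbind(seq_len(n), cl)] <- 1                       # binary, row-stochastic
    Xbar <- solve(crossprod(U), crossprod(U, X))        # K x p cluster means
    S <- crossprod(X, U %*% Xbar) - crossprod(X)        # X'(P_U - I)X
    A <- eigen(S, symmetric = TRUE)$vectors[, seq_len(Q), drop = FALSE]  # Q leading eigenvectors
    Y <- X %*% A                                        # unit scores
    Yb <- Xbar %*% A                                    # centroids in the reduced space
    list(A = A, Y = Y, Yb = Yb, loss = sum((Y - U %*% Yb)^2))
  }

  best <- NULL
  for (r in seq_len(Rndstart)) {
    cl <- sample(rep_len(seq_len(K), n))                # random start with no empty cluster
    f_old <- Inf
    for (it in seq_len(maxiter)) {
      fit <- fit_A(cl)
      if (f_old - fit$loss < tol) break                 # converged
      f_old <- fit$loss
      # Reassign each unit to its nearest centroid in the reduced space
      D <- sapply(seq_len(K), function(k) colSums((t(fit$Y) - fit$Yb[k, ])^2))
      cl_new <- max.col(-D, ties.method = "first")
      if (any(tabulate(cl_new, K) == 0)) break          # simplification: stop if a cluster empties
      cl <- cl_new
    }
    fit <- fit_A(cl)                                    # refit so A and loss match the final partition
    if (is.null(best) || fit$loss < best$loss)
      best <- c(fit, list(cluster = cl, loop = r, it = it))
  }

  # Optional varimax rotation; an orthogonal rotation leaves the loss unchanged
  if (rot && Q > 1) {
    vm <- stats::varimax(best$A)
    best$A <- unclass(vm$loadings)
    best$Yb <- best$Yb %*% vm$rotmat
  }
  best[c("A", "Yb", "cluster", "loss", "loop", "it")]
}
```

For example, `set.seed(1); res <- fkm_sketch(iris[, -5], K = 3, Q = 2, Rndstart = 5)` followed by `table(res$cluster, iris$Species)` compares the sketch's partition with the species labels. The sketch deliberately uses an explicit K x K `solve()` and an n x K distance matrix for readability rather than speed.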

Value

Returns a list of estimates and some descriptive quantities of the final results:

`U`	Units x clusters membership matrix (binary and row-stochastic). Each row is a dummy variable indicating to which cluster each unit has been assigned.
`A`	Variables x components loading matrix (orthonormal).
`centers`	K x Q matrix of cluster centers, i.e. the means of the units (rows) in each cluster, expressed in the reduced space of the Q principal components.
`totss`	The total sum of squares.
`withinss`	Vector of within-cluster sum of squares, one component per cluster.
`betweenss`	Between-cluster sum of squares, i.e. the amount of deviance captured by the model.
`size`	Number of units assigned to each cluster.
`pseudoF`	Calinski-Harabasz index of the resulting partition.
`loop`	The index of the (best) run from which the results have been chosen.
`it`	The number of iterations performed during the (best) run.
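The summary quantities listed above follow the usual k-means definitions. The hypothetical helper partition_stats below computes them in base R for any data matrix Y and label vector cl. This page does not say which matrix factkm uses for totss, withinss and betweenss (the pre-processed data or the component scores), so applying the helper to either one is an assumption. The Calinski-Harabasz index is computed as pseudoF = (betweenss / (K - 1)) / (sum(withinss) / (n - K)).

```r
# Hypothetical helper: k-means style summaries of a partition `cl` of the rows of Y.
# Assumes every cluster 1..K is non-empty.
partition_stats <- function(Y, cl) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  K <- max(cl)
  # Binary, row-stochastic membership matrix: exactly one 1 per row
  U <- matrix(0, n, K)
  U[cbind(seq_len(n), cl)] <- 1
  size <- colSums(U)                                    # units per cluster
  centers <- crossprod(U, Y) / size                     # K x ncol(Y) cluster means
  totss <- sum(scale(Y, scale = FALSE)^2)               # deviance around the grand mean
  resid2 <- rowSums((Y - U %*% centers)^2)              # squared distance to own centroid
  withinss <- colSums(U * resid2)                       # one value per cluster
  betweenss <- sum(size * rowSums(sweep(centers, 2, colMeans(Y))^2))
  pseudoF <- (betweenss / (K - 1)) / (sum(withinss) / (n - K))  # Calinski-Harabasz
  list(U = U, size = size, centers = centers, totss = totss,
       withinss = withinss, betweenss = betweenss, pseudoF = pseudoF)
}

# Usage: check the helper against stats::kmeans on the standardised iris data
set.seed(1)
Y <- scale(iris[, -5])
km <- kmeans(Y, centers = 3, nstart = 10)
s <- partition_stats(Y, km$cluster)
all.equal(s$withinss, km$withinss)
```

Because the centers are cluster means, totss equals sum(withinss) + betweenss, which is the decomposition behind the statement that betweenss is the deviance captured by the model.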

Author(s)

Ionel Prunila, Maurizio Vichi

References

Vichi M., Kiers H.A.L. (2001) "Factorial k-means analysis for two-way data" <doi:10.1016/S0167-9473(00)00064-5>

Kaiser H.F. (1958) "The varimax criterion for analytic rotation in factor analysis" <doi:10.1007/BF02289233>

Examples

# Iris data
# Load the package and the numeric variables of the iris data
library(drclust)
X <- as.matrix(iris[, -5])

# Factorial k-means with 3 unit-clusters and 2 components for the variables
out <- factkm(X, K = 3, Q = 2, Rndstart = 15, verbose = 0, maxiter = 100, tol = 1e-7, rot = 1)

[Package drclust version 0.1]