cluspca {clustrd}    R Documentation
Joint dimension reduction and clustering of continuous data.
Description
This function implements Factorial K-means (Vichi and Kiers, 2001) and Reduced K-means (De Soete and Carroll, 1994), as well as a compromise version of these two methods. The methods combine Principal Component Analysis for dimension reduction with K-means for clustering.
Usage
cluspca(data, nclus, ndim, alpha = NULL, method = c("RKM","FKM"),
center = TRUE, scale = TRUE, rotation = "none", nstart = 100,
smartStart = NULL, seed = NULL)
## S3 method for class 'cluspca'
print(x, ...)
## S3 method for class 'cluspca'
summary(object, ...)
## S3 method for class 'cluspca'
fitted(object, mth = c("centers", "classes"), ...)
Arguments

data: Dataset with metric variables.

nclus: Number of clusters (nclus = 1 returns the PCA solution).

ndim: Dimensionality of the solution.

method: Specifies the method. Options are "RKM" for reduced K-means and "FKM" for factorial K-means (default = "RKM").

alpha: Adjusts for the relative importance of RKM and FKM in the objective function; alpha = 0.5 leads to reduced K-means, alpha = 0 to factorial K-means, and alpha = 1 reduces to the tandem approach (PCA followed by K-means).

center: A logical value indicating whether the variables should be shifted to be zero centered (default = TRUE).

scale: A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (default = TRUE).

rotation: Specifies the method used to rotate the factors. Options are "none" for no rotation, "varimax" for varimax rotation with Kaiser normalization and "promax" for promax rotation (default = "none").

nstart: Number of random starts (default = 100).

smartStart: If NULL then a random cluster membership vector is generated. Alternatively, a cluster membership vector can be provided as a starting solution.

seed: An integer that is used as argument by set.seed() for offsetting the random number generator when smartStart = NULL (default = NULL).

x: For the print method, an object of class cluspca.

object: For the summary and fitted methods, an object of class cluspca.

mth: For the fitted method, a character string specifying whether cluster centers ("centers") or cluster memberships ("classes") should be returned.

...: Not used.
Details
For the K-means part, the algorithm of Hartigan-Wong is used by default.
The hidden print and summary methods print out some key components of an object of class cluspca.
The hidden fitted method returns cluster fitted values. If mth is "classes", this is a vector of cluster memberships (the cluster component of the "cluspca" object). If mth is "centers", this is a matrix where each row is the cluster center for the corresponding observation. The row names of the matrix are the cluster membership values.
When nclus = 1 the function returns the PCA solution, and plot(object) shows the corresponding biplot.
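The "centers"/"classes" convention above matches the one used by base R's fitted method for kmeans objects, so it can be illustrated without clustrd; a minimal sketch on simulated data (using stats::kmeans as a stand-in for a cluspca fit):

```r
# A base-R illustration of the "centers"/"classes" convention, using
# stats::fitted.kmeans, which follows the same pattern as fitted.cluspca.
set.seed(1)
X  <- matrix(rnorm(60), ncol = 2)          # 30 observations, 2 variables
km <- kmeans(X, centers = 3, nstart = 10)

cls <- fitted(km, method = "classes")      # one cluster label per observation
ctr <- fitted(km, method = "centers")      # one cluster-center row per observation
stopifnot(length(cls) == nrow(X), nrow(ctr) == nrow(X))
```

Each row of ctr repeats the center of the cluster that the corresponding observation belongs to, which is why the matrix has one row per observation rather than one per cluster.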
Value

obscoord: Object scores.

attcoord: Variable scores.

centroid: Cluster centroids.

cluster: Cluster membership.

criterion: Optimal value of the objective function.

size: The number of objects in each cluster.

scale: A copy of the scale argument in the return object.

center: A copy of the center argument in the return object.

nstart: A copy of the nstart argument in the return object.

odata: A copy of the data argument in the return object.
References
De Soete, G., and Carroll, J. D. (1994). K-means clustering in a low-dimensional Euclidean space. In Diday E. et al. (Eds.), New Approaches in Classification and Data Analysis, Heidelberg: Springer, 212-219.
Vichi, M., and Kiers, H.A.L. (2001). Factorial K-means analysis for two-way data. Computational Statistics and Data Analysis, 37, 49-64.
See Also
plot.cluspca, clusmca
Examples
#Reduced K-means with 3 clusters in 2 dimensions using 10 random starts
data(macro)
outRKM = cluspca(macro, 3, 2, method = "RKM", rotation = "varimax", scale = FALSE, nstart = 10)
summary(outRKM)
#Scatterplot (dimensions 1 and 2) and cluster description plot
plot(outRKM, cludesc = TRUE)
#Factorial K-means with 3 clusters in 2 dimensions
#with a Reduced K-means starting solution
data(macro)
outFKM = cluspca(macro, 3, 2, method = "FKM", rotation = "varimax",
scale = FALSE, smartStart = outRKM$cluster)
outFKM
#Scatterplot (dimensions 1 and 2) and cluster description plot
plot(outFKM, cludesc = TRUE)
#To get the Tandem approach (PCA(SVD) + K-means)
outTandem = cluspca(macro, 3, 2, alpha = 1, seed = 1234)
plot(outTandem)
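The tandem result above (alpha = 1) can also be reproduced step by step with base R alone; a sketch on simulated data (so it runs without the clustrd package or the macro dataset):

```r
# Tandem approach sketch in base R: PCA for dimension reduction first,
# then K-means on the component scores. cluspca(..., alpha = 1) computes
# the same two-step solution within a single call.
set.seed(1234)
X  <- matrix(rnorm(200), ncol = 4)         # 50 observations, 4 variables
pc <- prcomp(X, center = TRUE, rank. = 2)  # keep the first 2 components
km <- kmeans(pc$x, centers = 3, nstart = 10)
km$size                                    # number of objects per cluster
```

Lowering alpha from 1 moves the joint objective away from this sequential scheme, toward reduced K-means (alpha = 0.5) and factorial K-means (alpha = 0).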
#nclus = 1 just gives the PCA solution
#outPCA = cluspca(macro, 1, 2)
#outPCA
#Scatterplot (dimensions 1 and 2)
#plot(outPCA)