cluspcamix {clustrd} | R Documentation |
Joint dimension reduction and clustering of mixed-type data.
Description
This function implements joint clustering and dimension reduction for mixed-type data, i.e., data with both categorical and metric variables (see Yamamoto & Hwang, 2014; van de Velden, Iodice D'Enza, & Markos, 2019; Vichi, Vicari, & Kiers, 2019). The framework includes Mixed Reduced K-means and Mixed Factorial K-means, as well as a compromise between the two. The methods combine Principal Component Analysis of mixed data (PCAMIX) for dimension reduction with K-means for clustering.
Usage
cluspcamix(data, nclus, ndim, method = c("mixedRKM", "mixedFKM"),
           center = TRUE, scale = TRUE, alpha = NULL, rotation = "none",
           nstart = 100, smartStart = NULL, seed = NULL, inboot = FALSE)
## S3 method for class 'cluspcamix'
print(x, ...)
## S3 method for class 'cluspcamix'
summary(object, ...)
## S3 method for class 'cluspcamix'
fitted(object, mth = c("centers", "classes"), ...)
Arguments
data |
Dataset with categorical and metric variables |
nclus |
Number of clusters (nclus = 1 returns the PCAMIX solution) |
ndim |
Dimensionality of the solution |
method |
Specifies the method. Options are mixedRKM for mixed reduced K-means and mixedFKM for mixed factorial K-means (default = "mixedRKM") |
center |
A logical value indicating whether the variables should be shifted to be zero centered (default = TRUE) |
scale |
A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (default = TRUE) |
alpha |
Adjusts for the relative importance of Mixed RKM and Mixed FKM in the objective function; |
rotation |
Specifies the method used to rotate the factors. Options are |
nstart |
Number of random starts (default = 100) |
smartStart |
If |
seed |
An integer that is used as argument by |
inboot |
Used internally in the bootstrap functions to perform bootstrapping on the indicator matrix. |
x |
For the |
object |
For the |
mth |
For the |
... |
Not used |
Details
For the K-means part, the algorithm of Hartigan-Wong is used by default.
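As a rough illustration of the K-means step only, the sketch below runs base R's kmeans with algorithm = "Hartigan-Wong" on the first two principal component scores of the numeric iris columns. The data set, the use of prcomp, and all object names are assumptions made for this example. It is a sequential (tandem-style) sketch and not the clustrd implementation, which estimates the reduction and the clustering jointly.

```r
# Illustrative sketch in base R; NOT the code used inside cluspcamix.
set.seed(1234)
X <- scale(iris[, 1:4])        # center and scale the metric variables
scores <- prcomp(X)$x[, 1:2]   # object scores on the first 2 dimensions

# K-means on the reduced scores, explicitly requesting Hartigan-Wong
km <- kmeans(scores, centers = 3, nstart = 10, algorithm = "Hartigan-Wong")

table(km$cluster)              # cluster sizes
km$tot.withinss                # within-cluster sum of squares
```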
The hidden print and summary methods print out some key components of an object of class cluspcamix.
The hidden fitted method returns cluster fitted values. If mth is "classes", this is a vector of cluster membership (the cluster component of the "cluspcamix" object). If mth is "centers", this is a matrix where each row is the cluster center for the observation. The rownames of the matrix are the cluster membership values.
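A minimal sketch of the two mth options, assuming the clustrd package is installed and reusing the diamond data and settings from the Examples section; the object names out, cls and ctr are chosen here for illustration only.

```r
library(clustrd)
data(diamond)

# Mixed Reduced K-means, 3 clusters in 2 dimensions (settings as in Examples)
out <- cluspcamix(diamond, 3, 2, method = "mixedRKM", nstart = 10, seed = 1234)

cls <- fitted(out, mth = "classes")  # cluster membership, one entry per observation
ctr <- fitted(out, mth = "centers")  # one row per observation: its cluster center

head(cls)
head(ctr)
```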
When nclus = 1 the function returns the PCAMIX solution and plot(object) shows the corresponding biplot.
Value
obscoord |
Object scores |
attcoord |
Variable scores |
centroid |
Cluster centroids |
cluster |
Cluster membership |
criterion |
Optimal value of the objective criterion |
size |
The number of objects in each cluster |
scale |
A copy of |
center |
A copy of |
nstart |
A copy of |
odata |
A copy of |
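The components listed above can be read directly from the fitted object with $. The following is a short sketch, assuming clustrd is installed and using the same diamond fit as in the Examples section; the object name out is arbitrary.

```r
library(clustrd)
data(diamond)

out <- cluspcamix(diamond, 3, 2, method = "mixedRKM", nstart = 10, seed = 1234)

out$size            # number of objects in each cluster
out$criterion       # optimal value of the objective criterion
out$centroid        # cluster centroids
head(out$obscoord)  # object scores for the first few observations
table(out$cluster)  # cluster membership counts
```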
References
van de Velden, M., Iodice D'Enza, A., & Markos, A. (2019). Distance-based clustering of mixed data. Wiley Interdisciplinary Reviews: Computational Statistics, e1456.
Vichi, M., Vicari, D., & Kiers, H.A.L. (2019). Clustering and dimension reduction for mixed variables. Behaviormetrika. doi:10.1007/s41237-018-0068-6.
Yamamoto, M., & Hwang, H. (2014). A general formulation of cluster analysis with dimension reduction and subspace separation. Behaviormetrika, 41, 115-129.
See Also
Examples
data(diamond)
# Mixed Reduced K-means solution with 3 clusters in 2 dimensions
# after 10 random starts
outmixedRKM <- cluspcamix(diamond, 3, 2, method = "mixedRKM", nstart = 10, seed = 1234)
outmixedRKM
# A graph with the categories and a biplot of the continuous variables (dimensions 1 and 2)
plot(outmixedRKM)
# Tandem analysis: PCAMIX or FAMD followed by K-means solution
# with 3 clusters in 2 dimensions after 10 random starts
outTandem <- cluspcamix(diamond, 3, 2, alpha = 1, nstart = 10, seed = 1234)
outTandem
# Scatterplot (dimensions 1 and 2)
plot(outTandem)
# nclus = 1 just gives the PCAMIX or FAMD solution
#outPCAMIX <- cluspcamix(diamond, 1, 2)
#outPCAMIX
# Biplot (dimensions 1 and 2)
#plot(outPCAMIX)