| cluspcamix {clustrd} | R Documentation |
Joint dimension reduction and clustering of mixed-type data.
Description
This function implements clustering and dimension reduction for mixed-type variables, i.e., categorical and metric (see Yamamoto & Hwang, 2014; van de Velden, Iodice D'Enza, & Markos, 2019; Vichi, Vicari, & Kiers, 2019). The framework includes Mixed Reduced K-means and Mixed Factorial K-means, as well as a compromise between the two. The methods combine Principal Component Analysis of mixed data for dimension reduction with K-means for clustering.
Usage
cluspcamix(data, nclus, ndim, method=c("mixedRKM", "mixedFKM"),
center = TRUE, scale = TRUE, alpha=NULL, rotation="none",
nstart = 100, smartStart=NULL, seed=NULL, inboot = FALSE)
## S3 method for class 'cluspcamix'
print(x, ...)
## S3 method for class 'cluspcamix'
summary(object, ...)
## S3 method for class 'cluspcamix'
fitted(object, mth = c("centers", "classes"), ...)
Arguments
data |
Dataset with categorical and metric variables |
nclus |
Number of clusters (nclus = 1 returns the PCAMIX solution) |
ndim |
Dimensionality of the solution |
method |
Specifies the method. Options are mixedRKM for mixed reduced K-means and mixedFKM for mixed factorial K-means (default = "mixedRKM") |
center |
A logical value indicating whether the variables should be shifted to be zero centered (default = TRUE) |
scale |
A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place (default = TRUE) |
alpha |
Adjusts for the relative importance of Mixed RKM and Mixed FKM in the objective function; |
rotation |
Specifies the method used to rotate the factors. Options are |
nstart |
Number of random starts (default = 100) |
smartStart |
If |
seed |
An integer that is used as argument by |
inboot |
Used internally in the bootstrap functions to perform bootstrapping on the indicator matrix. |
x |
For the |
object |
For the |
mth |
For the |
... |
Not used |
Details
For the K-means part, the algorithm of Hartigan-Wong is used by default.
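To make the K-means step concrete, here is a minimal base R sketch that runs stats::kmeans with the Hartigan-Wong algorithm on principal component scores. It is a sequential (tandem) illustration on simulated numeric data, not the joint estimation that cluspcamix performs; the data and the object names X, pca, scores and km are made up for the example.

```r
# Toy numeric data: two well-separated groups of 20 rows in 4 variables
# (simulated for illustration; not part of the clustrd package)
set.seed(1234)
X <- rbind(matrix(rnorm(80, mean = 0), ncol = 4),
           matrix(rnorm(80, mean = 5), ncol = 4))

# Dimension reduction: keep the first 2 principal component scores
pca <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# K-means on the reduced scores with 10 random starts;
# "Hartigan-Wong" is also the default algorithm of stats::kmeans
km <- kmeans(scores, centers = 2, nstart = 10, algorithm = "Hartigan-Wong")

table(km$cluster)   # cluster sizes
km$tot.withinss     # total within-cluster sum of squares of the best start
```

In the joint methods the reduction and the clustering are estimated together, so this sketch only shows which K-means variant is meant and how several random starts are used.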
The hidden print and summary methods print out some key components of an object of class cluspcamix.
The hidden fitted method returns cluster fitted values. If mth is "classes", this is a vector of cluster memberships (the cluster component of the "cluspcamix" object). If mth is "centers", this is a matrix in which each row is the center of the cluster to which the observation belongs. The row names of the matrix are the cluster membership values.
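The two return shapes of the fitted method can be pictured with a small base R sketch. The centroid matrix and cluster vector below are hypothetical stand-ins for the corresponding components of a fitted object, and the indexing shown is an assumption about how the description reads, not the package's internal code.

```r
# Hypothetical stand-ins for the "centroid" and "cluster" components
centroid <- matrix(c(-1.0,  0.5,
                      0.2, -0.8,
                      1.3,  0.4),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(NULL, c("Dim.1", "Dim.2")))
cluster <- c(1, 3, 2, 1, 3)   # membership of 5 observations

# "classes": the membership vector itself
classes <- cluster

# "centers": one row per observation, holding the centroid of its cluster,
# with the membership values as row names
centers <- centroid[cluster, , drop = FALSE]
rownames(centers) <- cluster

centers
```

With a real fit, the equivalent calls would be fitted(object, mth = "classes") and fitted(object, mth = "centers").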
When nclus = 1 the function returns the solution of PCAMIX and plot(object) shows the corresponding biplot.
Value
obscoord |
Object scores |
attcoord |
Variable scores |
centroid |
Cluster centroids |
cluster |
Cluster membership |
criterion |
Optimal value of the objective criterion |
size |
The number of objects in each cluster |
scale |
A copy of |
center |
A copy of |
nstart |
A copy of |
odata |
A copy of |
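The returned components can be inspected with ordinary list access. The sketch below assumes the clustrd package is installed and reuses the diamond data and the call from the Examples section; the object name out is arbitrary, and the block does nothing if the package is not available.

```r
# Runs only when clustrd is installed
if (requireNamespace("clustrd", quietly = TRUE)) {
  library(clustrd)
  data(diamond)

  # Mixed Reduced K-means, 3 clusters in 2 dimensions, 10 random starts
  out <- cluspcamix(diamond, nclus = 3, ndim = 2,
                    method = "mixedRKM", nstart = 10, seed = 1234)

  head(out$obscoord)   # object scores in the reduced space
  out$centroid         # cluster centroids
  out$size             # number of objects in each cluster
  out$criterion        # value of the objective criterion for the retained solution
}
```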
References
van de Velden, M., Iodice D'Enza, A., & Markos, A. (2019). Distance-based clustering of mixed data. Wiley Interdisciplinary Reviews: Computational Statistics, e1456.
Vichi, M., Vicari, D., & Kiers, H.A.L. (2019). Clustering and dimension reduction for mixed variables. Behaviormetrika. doi:10.1007/s41237-018-0068-6.
Yamamoto, M., & Hwang, H. (2014). A general formulation of cluster analysis with dimension reduction and subspace separation. Behaviormetrika, 41, 115-129.
Examples
library(clustrd)
data(diamond)
#Mixed Reduced K-means solution with 3 clusters in 2 dimensions
#after 10 random starts
outmixedRKM = cluspcamix(diamond, 3, 2, method = "mixedRKM", nstart = 10, seed = 1234)
outmixedRKM
#A graph with the categories and a biplot of the continuous variables (dimensions 1 and 2)
plot(outmixedRKM)
#Tandem analysis: PCAMIX or FAMD followed by K-means solution
#with 3 clusters in 2 dimensions after 10 random starts
outTandem = cluspcamix(diamond, 3, 2, alpha = 1, nstart = 10, seed = 1234)
outTandem
#Scatterplot (dimensions 1 and 2)
plot(outTandem)
#nclus = 1 just gives the PCAMIX or FAMD solution
#outPCAMIX = cluspcamix(diamond, 1, 2)
#outPCAMIX
#Biplot (dimensions 1 and 2)
#plot(outPCAMIX)