hcm {ppclust}    R Documentation
Hard C-Means Clustering
Description
Partitions a numeric data set by using the Hard C-Means (HCM) clustering algorithm (or K-means), which was proposed by MacQueen (1967). The function hcm is an extension of the basic kmeans function, with more input arguments and output values so that its clustering results can be compared with those of fuzzy and possibilistic clustering algorithms. For instance, besides the Euclidean distance metric, a number of other distance metrics, such as the squared Euclidean distance and the squared Chord distance, can be employed with hcm.
Usage
hcm(x, centers, dmetric="euclidean", pw=2, alginitv="kmpp",
nstart=1, iter.max=1000, con.val=1e-9, stand=FALSE, numseed)
Arguments
x | a numeric vector, data frame or matrix.
centers | an integer specifying the number of clusters, or a numeric matrix containing the initial cluster centers.
dmetric | a string for the distance metric. The default is euclidean for the squared Euclidean distances. See
pw | a number for the power of Minkowski distance calculation. The default is 2 if the
alginitv | a string for the initialization of the cluster prototypes matrix. The default is kmpp for the K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see
nstart | an integer for the number of starts for clustering. The default is 1.
iter.max | an integer for the maximum number of iterations allowed. The default is 1000.
con.val | a number for the convergence value between the iterations. The default is 1e-09.
stand | a logical flag to standardize data. Its default value is FALSE.
numseed | a seeding number to set the seed of R's random number generator.
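The dmetric and pw arguments choose how the distance d in the objective function is computed. As a rough illustration only, the base-R sketch below computes a Minkowski distance of power p (p = 2 gives the Euclidean distance, p = 1 the Manhattan distance) and the squared Euclidean distance. minkowski_dist and sq_euclidean_dist are hypothetical helpers, not functions of ppclust; the metric names that hcm actually accepts should be checked in the package itself.

```r
# Hypothetical helpers for illustration; not part of ppclust.
# Minkowski distance of power p between two numeric vectors:
# p = 2 is the Euclidean distance, p = 1 the Manhattan distance.
minkowski_dist <- function(a, b, p = 2) {
  stopifnot(length(a) == length(b), p >= 1)
  sum(abs(a - b)^p)^(1 / p)
}

# Squared Euclidean distance (no square root taken)
sq_euclidean_dist <- function(a, b) {
  sum((a - b)^2)
}

a <- c(0, 0)
b <- c(3, 4)
minkowski_dist(a, b, p = 2)   # 5
minkowski_dist(a, b, p = 1)   # 7
sq_euclidean_dist(a, b)       # 25

# Cross-check against stats::dist, which also supports the Minkowski metric
as.numeric(dist(rbind(a, b), method = "minkowski", p = 3))
minkowski_dist(a, b, p = 3)
```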
Details
The Hard C-Means (HCM) clustering algorithm (or K-means) partitions a data set into k groups, so-called clusters, by minimizing the objective function:
J_{HCM}(\mathbf{X}; \mathbf{U}, \mathbf{V})=\sum\limits_{i=1}^n \sum\limits_{j=1}^k u_{ij} \, d^2(\vec{x}_i, \vec{v}_j)
See ppclust-package for details about the terms in the above equation of J_{HCM}.
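To make the objective concrete, the sketch below evaluates J_{HCM} for a data matrix and a prototype matrix in base R, assuming squared Euclidean distances and hard memberships that assign each object to its nearest prototype. hcm_objective is a hypothetical helper, not a ppclust function. When the prototypes are the means of their clusters, the value equals the total within-cluster sum of squares, which stats::kmeans reports as tot.withinss.

```r
# Hypothetical helper: value of the HCM objective for data x and
# prototypes v (one prototype per row), using squared Euclidean distances.
hcm_objective <- function(x, v) {
  x <- as.matrix(x)
  v <- as.matrix(v)
  # n-by-k matrix of squared distances d^2(x_i, v_j)
  d2 <- matrix(0, nrow(x), nrow(v))
  for (j in seq_len(nrow(v))) {
    d2[, j] <- rowSums(sweep(x, 2, v[j, ])^2)
  }
  # hard memberships: u_ij = 1 for the nearest prototype, 0 otherwise
  u <- matrix(0, nrow(x), nrow(v))
  u[cbind(seq_len(nrow(x)), max.col(-d2, ties.method = "first"))] <- 1
  # J = sum over i and j of u_ij * d^2(x_i, v_j)
  sum(u * d2)
}

x <- matrix(c(0, 0,  1, 0,  10, 10,  11, 10), ncol = 2, byrow = TRUE)
v <- matrix(c(0.5, 0,  10.5, 10), ncol = 2, byrow = TRUE)
hcm_objective(x, v)   # 1: each object lies 0.5 from its prototype
```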
The update equation for membership degrees is:
u_{ij} = \begin{cases}
1 & \text{if } d^2(\vec{x}_i, \vec{v}_j) = \min_{1\leq l\leq k} d^2(\vec{x}_i, \vec{v}_l) \\
0 & \text{otherwise}
\end{cases}
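In code, the membership update amounts to a row-wise arg-min over an n-by-k matrix of squared distances. The base-R sketch below is one way to write it; update_memberships is a hypothetical helper, not part of ppclust. The equation above would set more than one u_{ij} to 1 when an object is equidistant from several prototypes; the sketch assumes ties are broken in favour of the lowest prototype index, so that each row sums to 1.

```r
# Hypothetical helper: hard membership update from an n-by-k matrix d2
# of squared distances d^2(x_i, v_j). Assumption: ties go to the first
# (lowest-index) prototype, so every row of u contains exactly one 1.
update_memberships <- function(d2) {
  d2 <- as.matrix(d2)
  u <- matrix(0, nrow(d2), ncol(d2))
  nearest <- max.col(-d2, ties.method = "first")  # row-wise arg-min
  u[cbind(seq_len(nrow(d2)), nearest)] <- 1
  u
}

d2 <- matrix(c(1, 4,
               9, 2,
               3, 3), ncol = 2, byrow = TRUE)
update_memberships(d2)
# rows of the result: (1, 0), (0, 1), (1, 0); the third object is a tie
```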
The update equation for cluster prototypes is:
\vec{v}_{j} = \frac{\sum\limits_{i=1}^n u_{ij} \vec{x}_i}{\sum\limits_{i=1}^n u_{ij}}, \quad 1\leq j\leq k
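Alternating the two update equations until the objective stops changing gives the usual Lloyd-style K-means iteration. The sketch below is a minimal base-R version under stated assumptions: squared Euclidean distances, a user-supplied initial prototype matrix, ties broken toward the lowest prototype index, an empty cluster keeping its previous prototype, and a stopping rule based on the change in the objective value. hcm_sketch is hypothetical and is not the code of ppclust::hcm; its iter.max and con.val arguments only mimic the roles of the arguments of the same name.

```r
# Hypothetical sketch of the HCM (K-means) iteration; not ppclust::hcm.
# Assumes squared Euclidean distances and an initial prototype matrix v.
hcm_sketch <- function(x, v, iter.max = 1000, con.val = 1e-9) {
  x <- as.matrix(x)
  v <- as.matrix(v)
  n <- nrow(x)
  k <- nrow(v)
  prev_obj <- Inf
  for (iter in seq_len(iter.max)) {
    # squared Euclidean distances d^2(x_i, v_j), an n-by-k matrix
    d2 <- matrix(0, n, k)
    for (j in seq_len(k)) {
      d2[, j] <- rowSums(sweep(x, 2, v[j, ])^2)
    }
    # membership update: 1 for the nearest prototype, 0 otherwise
    cluster <- max.col(-d2, ties.method = "first")
    u <- matrix(0, n, k)
    u[cbind(seq_len(n), cluster)] <- 1
    obj <- sum(u * d2)
    # prototype update: v_j = sum_i u_ij x_i / sum_i u_ij
    # (assumption: an empty cluster keeps its previous prototype)
    sizes <- colSums(u)
    for (j in which(sizes > 0)) {
      v[j, ] <- colSums(u[, j] * x) / sizes[j]
    }
    # stop once the objective changes by less than con.val
    if (abs(prev_obj - obj) < con.val) break
    prev_obj <- obj
  }
  list(v = v, u = u, cluster = cluster, func.val = obj, iter = iter)
}

x <- matrix(c(0, 0,  1, 0,  10, 10,  11, 10), ncol = 2, byrow = TRUE)
res <- hcm_sketch(x, v = x[c(1, 3), ])
res$cluster   # 1 1 2 2
res$v         # rows (0.5, 0) and (10.5, 10)
```

Starting from the same prototypes, stats::kmeans(x, centers = v, algorithm = "Lloyd") runs the same alternating scheme, so the two are expected to agree apart from floating-point rounding.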
Value
an object of class ‘ppclust’, which is a list consisting of the following items:
x | a numeric matrix containing the processed data set.
v | a numeric matrix containing the final cluster prototypes (centers of clusters).
u | a numeric matrix containing the hard membership degrees of the data objects.
d | a numeric matrix containing the distances of objects to the final cluster prototypes.
k | an integer for the number of clusters.
cluster | a numeric vector containing the cluster labels of the data objects.
csize | a numeric vector containing the number of objects in each cluster.
best.start | an integer for the index of the start with the minimum objective function value.
iter | an integer vector for the number of iterations in each start of the algorithm.
func.val | a numeric vector for the objective function values of each start of the algorithm.
comp.time | a numeric vector for the execution time of each start of the algorithm.
wss | a numeric vector containing the within-cluster sum of squares for each cluster.
bwss | a number for the between-cluster sum of squares.
tss | a number for the total sum of squares.
twss | a number for the total within-cluster sum of squares.
stand | a logical value indicating whether the data set x was standardized.
algorithm | a string for the name of the partitioning algorithm. It is ‘HCM’ with this function.
call | a string for the matched function call generating this ‘ppclust’ object.
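For a hard partition whose prototypes are the cluster means, the sums of squares listed above satisfy tss = twss + bwss, where twss is the sum of the per-cluster wss values. The base-R sketch below computes all four from a data matrix and a vector of cluster labels; ss_summary is a hypothetical helper, and whether hcm computes these items in exactly this way is an assumption. stats::kmeans reports the same quantities as totss, withinss, tot.withinss and betweenss, which gives a convenient cross-check.

```r
# Hypothetical helper: sums of squares for a hard partition, named after
# the wss, bwss, tss and twss items of a 'ppclust' object.
ss_summary <- function(x, cluster) {
  x <- as.matrix(x)
  grand_mean <- colMeans(x)
  labels <- sort(unique(cluster))
  # within-cluster sum of squares, one value per cluster
  wss <- sapply(labels, function(j) {
    xj <- x[cluster == j, , drop = FALSE]
    sum(sweep(xj, 2, colMeans(xj))^2)
  })
  # between-cluster sum of squares: size-weighted spread of cluster means
  bwss <- sum(sapply(labels, function(j) {
    xj <- x[cluster == j, , drop = FALSE]
    nrow(xj) * sum((colMeans(xj) - grand_mean)^2)
  }))
  tss <- sum(sweep(x, 2, grand_mean)^2)   # total sum of squares
  twss <- sum(wss)                        # total within-cluster SS
  list(wss = wss, bwss = bwss, tss = tss, twss = twss)
}

set.seed(1)
km <- kmeans(iris[, -5], centers = 3, nstart = 10)
s <- ss_summary(iris[, -5], km$cluster)
all.equal(s$tss, s$twss + s$bwss)   # TRUE
```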
Author(s)
Zeynel Cebeci & Figen Yildiz
References
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding. In Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, Univ. of California Press, 1: 281-297. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.8619&rep=rep1&type=pdf>
See Also
kmeans, ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, pca, pcm, pcmr, pfcm, upfc
Examples
## Not run:
# Load the ppclust package and the iris data set
library(ppclust)
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using K-means++ (requires the inaparc package)
v <- inaparc::kmpp(x, k=3)$v
# Run HCM with the initial prototypes
res.hcm <- hcm(x, centers=v)
# Print the cluster labels, tabulate the cluster sizes and plot the result
res.hcm$cluster
table(res.hcm$cluster)
plot(x, col=res.hcm$cluster, pch=16)
## End(Not run)
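The example above takes its initial prototypes from inaparc::kmpp. Without that package, the K-means++ seeding of Arthur & Vassilvitskii (2007) can be sketched in base R as below, and the resulting prototypes passed to stats::kmeans (the base function that hcm extends) or, assuming ppclust is installed, to hcm(x, centers = v) in the same way. kmpp_sketch is a hypothetical helper and may differ in detail from inaparc::kmpp.

```r
# Hypothetical K-means++ seeding sketch (Arthur & Vassilvitskii, 2007);
# not inaparc::kmpp. Returns k rows of x to use as initial prototypes.
# Assumes x has at least k distinct rows.
kmpp_sketch <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 1, k <= n)
  # first prototype: one object chosen uniformly at random
  idx <- sample.int(n, 1)
  # squared distance from every object to its nearest chosen prototype
  d2 <- rowSums(sweep(x, 2, x[idx, ])^2)
  while (length(idx) < k) {
    # next prototype: drawn with probability proportional to d2
    nxt <- sample.int(n, 1, prob = d2)
    idx <- c(idx, nxt)
    d2 <- pmin(d2, rowSums(sweep(x, 2, x[nxt, ])^2))
  }
  x[idx, , drop = FALSE]
}

set.seed(123)
x <- iris[, -5]
v <- kmpp_sketch(x, k = 3)
# Base-R counterpart of the example above: K-means from the seeded prototypes
res.km <- kmeans(x, centers = v)
table(res.km$cluster)
```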