gkpfcm {ppclust} | R Documentation |
Gustafson-Kessel Clustering Using PFCM
Description
Partitions a numeric data set by using the Gustafson-Kessel (GK) algorithm within the PFCM (Possibilistic Fuzzy C-Means) clustering algorithm (Ojeda-Magaina et al, 2006).
Usage
gkpfcm(x, centers, memberships, m=2, eta=2, K=1, omega, a, b,
dmetric="sqeuclidean", pw = 2, alginitv="kmpp",
alginitu="imembrand", nstart=1, iter.max=1000, con.val=1e-09,
fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
Arguments
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent or fuzzifier. The default is 2. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 2. |
a |
a number for the relative importance of the fuzzy part of the objective function. The default is 1. |
b |
a number for the relative importance of the possibilistic part of the objective function. The default is 1. |
K |
a number greater than 0 to be used as the weight of penalty term. The default is 1. |
omega |
a numeric vector of reference distances. If missing, it is internally generated. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see |
alginitu |
a string for the initialization of memberships degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to make the initial cluster centers not changed along the different starts of the algorithm. The default is |
fixmemb |
a logical flag to make the initial membership degrees not changed along the different starts of the algorithm. The default is |
stand |
a logical flag to standardize data. Its default value is |
numseed |
a seeding number to set the seed of R's random number generator. |
Details
Gustafson-Kessel clustering within Possibilistic Fuzzy C-Means (GKPFCM) algorithm is an improvement for PFCM algorithm that consists of the modification of the distance metric for . The original PFCM uses the Euclidean distance whereas GKPFCM uses the Mahalanobis distance with GK algorithm. Babuska et al (2002) have proposed an improvement for calculating the covariance matrix
as follows:
In the above equation, involves a weighted identity matrix. The eigenvalues
and the eigenvectors
of the resulting matrix are calculated, and the maximum eigenvalue
is determined. With the obtained results,
, which satisfies
is calculated. Finally,
is recomputed with the following equation:
The objective function of GKPFCM is:
is the fuzzifier to specify the amount of fuzziness for the clustering;
. It is usually chosen as 2.
is the typicality exponent to specify the amount of typicality for the clustering;
. It is usually chosen as 2.
The objective function is minimized by using the following update equations:
Value
an object of class ‘ppclust’, which is a list consists of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the fuzzy memberships degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
m |
a number for the fuzzifier. |
eta |
a number greater than 1 to be used as the typicality exponent. |
a |
a number for the relative importance of the fuzzy part of the objective function. |
b |
a number for the relative importance of the possibilistic part of the objective function. |
omega |
a numeric vector of reference distances. |
cluster |
a numeric vector containing the cluster labels found by defuzzying the fuzzy membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of start that produced the minimum objective functional. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a number for the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total within-cluster sum of squares. |
twss |
a number for the total sum of squares. |
algorithm |
a string for the name of partitioning algorithm. It is ‘FCM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
Author(s)
Zeynel Cebeci & Cagatay Cebeci
References
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Gustafson, D. E. & Kessel, W. C. (1979). Fuzzy clustering with a fuzzy covariance matrix. In Proc. of IEEE Conf. on Decision and Control including the 17th Symposium on Adaptive Processes, San Diego. pp. 761-766. <doi:10.1109/CDC.1978.268028>
Babuska, R., van der Veen, P. J. & Kaymak, U. (2002). Improved covariance estimation for Gustafson-Kessel clustering. In Proc. of Int. Conf. on Fuzzy Systems, Hawaii, 2002, pp. 1081-1085. <https://tr.scribd.com/document/209211977/Fuzzy-and-Neural-Control>.
Ojeda-Magaina, B., Ruelas, R., Corona-Nakamura, M. A. & Andina, D. (2006). An improvement to the possibilistic fuzzy c-means clustering algorithm. In Proc. of IEEE World Automation Congress, 2006 (WAC'06). pp. 1-8. <doi:10.1109/WAC.2006.376056>
See Also
ekm
,
fcm
,
fcm2
,
fpcm
,
fpppcm
,
gg
,
gk
,
hcm
,
pca
,
pcm
,
pcmr
,
pfcm
,
upfc
Examples
## Not run:
# Load dataset iris
data(iris)
x <- iris[,-5]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=3)$v
# Initialize the memberships degrees matrix
u <- inaparc::imembrand(nrow(x), k=3)$u
# Run FCM with the initial prototypes and memberships
gkpfcm.res <- gkpfcm(x, centers=v, memberships=u, m=2)
# Show the fuzzy membership degrees for the top 5 objects
head(gkpfcm.res$u, 5)
## End(Not run)