pca {ppclust} | R Documentation |
Possibilistic Clustering Algorithm
Description
Partitions a numeric data set using the Possibilistic Clustering Algorithm (PCA) proposed by Yang & Wu (2006).
Usage
pca(x, centers, memberships, m=2, eta=2,
dmetric="sqeuclidean", pw=2, alginitv="kmpp", alginitu="imembrand",
nstart=1, iter.max=1000, con.val=1e-09,
fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)
Arguments
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
memberships |
a numeric matrix containing the initial membership degrees. If missing, it is internally generated. |
m |
a number greater than 1 to be used as the fuzziness exponent. The default is 2. |
eta |
a number greater than 1 to be used as the typicality exponent. The default is 2. |
dmetric |
a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See |
pw |
a number for the power of Minkowski distance calculation. The default is 2 if the |
alginitv |
a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see |
alginitu |
a string for the initialization of the membership degrees matrix. The default is imembrand for random sampling of initial membership degrees. |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
con.val |
a number for the convergence value between the iterations. The default is 1e-09. |
fixcent |
a logical flag to fix the initial cluster centers. The default is FALSE. |
fixmemb |
a logical flag to fix the initial membership degrees. The default is FALSE. |
stand |
a logical flag to standardize data. The default is FALSE. |
numseed |
a seeding number to set the seed of R's random number generator. |
Details
Unlike the Possibilistic C-Means (PCM) algorithm, which requires the results of a previous Fuzzy C-Means (FCM) run in order to calculate the parameter \Omega, the Possibilistic Clustering Algorithm (PCA) is based on the FCM objective function and on the partition coefficient (PC) and partition entropy (PE) validity indexes. PCA therefore computes the typicality values directly and does not need a prior FCM run to compute this parameter. The resulting membership function is exponential, which is why the algorithm is reported to be robust to noise and outliers (Yang & Wu, 2006). However, Wu et al. (2010) reported that PCA is very sensitive to initialization and sometimes generates coincident clusters.
The objective function of PCA is:
J_{PCA}(\mathbf{X}; \mathbf{V}, \mathbf{T})=\sum\limits_{j=1}^k \sum\limits_{i=1}^n t_{ij}^m \; d^2(\vec{x}_i, \vec{v}_j) + \frac{\beta}{m^2\sqrt{k}} \sum\limits_{j=1}^k \sum\limits_{i=1}^n (t_{ij}^m \; \log t_{ij}^m - t_{ij}^m)
Where:
t_{ij} = \exp\Big(- \frac{m \sqrt{k} \; d^2(\vec{x}_i, \vec{v}_j)}{\beta}\Big) \;\;; {1\leq i\leq n},\; {1\leq j\leq k}
The update equation for cluster prototypes:
\vec{v}_{j} =\frac{\sum\limits_{i=1}^n t_{ij}^m \; \vec{x}_i}{\sum\limits_{i=1}^n t_{ij}^m} \;\;; {1\leq j\leq k}
Where:
\beta = \frac{\sum\limits_{i=1}^n \; d^2(\vec{x}_i, \overline{x})}{n}
with \overline{x}=\frac{\sum\limits_{i=1}^n \vec{x}_i}{n}
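The equations above can be traced by hand in base R. The following is a minimal sketch, not the ppclust implementation: the iris data, the random choice of initial prototypes, the stopping rule and all helper names (sqdist, tt, w, v.new) are assumptions made for illustration. It computes \beta, the typicality degrees t_{ij}, the prototype update and the objective value exactly as written in the formulas, using only the exponent m.

```r
# Illustrative base-R sketch of the PCA update equations (not the ppclust source).
set.seed(1)
x <- as.matrix(iris[, 1:4])   # n x p numeric data matrix (assumed example data)
n <- nrow(x)
k <- 2                        # number of clusters
m <- 2                        # exponent used in the formulas

# Squared Euclidean distances between the rows of x and the rows of v (n x k)
sqdist <- function(x, v) {
  sapply(seq_len(nrow(v)), function(j) rowSums(sweep(x, 2, v[j, ])^2))
}

# beta: mean squared distance of the objects to the grand mean
xbar <- colMeans(x)
beta <- sum(sweep(x, 2, xbar)^2) / n

# Assumed initialization: k randomly chosen objects serve as prototypes
v <- x[sample(n, k), , drop = FALSE]

for (iter in 1:100) {
  d2 <- sqdist(x, v)
  tt <- exp(-m * sqrt(k) * d2 / beta)   # typicality degrees t_ij
  w <- tt^m
  v.new <- t(w) %*% x / colSums(w)      # weighted-mean prototype update
  delta <- max(abs(v.new - v))
  v <- v.new
  if (delta < 1e-9) break               # assumed convergence rule
}

# Objective function value at the final prototypes
d2 <- sqdist(x, v)
tt <- exp(-m * sqrt(k) * d2 / beta)
w <- tt^m
wlogw <- ifelse(w > 0, w * log(w), 0)   # guard against 0 * log(0)
J <- sum(w * d2) + beta / (m^2 * sqrt(k)) * sum(wlogw - w)
```

Because t_{ij} decays exponentially with the squared distance, objects far from every prototype receive typicalities close to zero and contribute little to the weighted means. Printing v after the loop also shows whether the two prototypes have drifted towards the same location, which is the coincident-cluster behavior mentioned above.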
Value
an object of class ‘ppclust’, which is a list consisting of the following items:
v |
a numeric matrix containing the final cluster prototypes. |
u |
a numeric matrix containing the fuzzy membership degrees of the data objects. |
t |
a numeric matrix containing the typicality degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
x |
a numeric matrix containing the processed data set. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the typicality degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
k |
an integer for the number of clusters. |
m |
a number for the fuzziness exponent. |
eta |
a number for the typicality exponent. |
omega |
a numeric vector of reference distances. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of the start that produced the minimum objective function value. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a numeric vector containing the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total sum of squares. |
twss |
a number for the total within-cluster sum of squares. |
algorithm |
a string for the name of the partitioning algorithm. It is ‘PCA’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
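To illustrate how the cluster and csize items relate to the typicality matrix t, here is a small base-R sketch. It assumes that defuzzification assigns each object to the cluster with its highest typicality degree. The matrix tmat is hypothetical data, and the way ppclust resolves ties may differ.

```r
# Hypothetical 5 x 2 typicality matrix (rows: objects, columns: clusters)
tmat <- matrix(c(0.90, 0.05,
                 0.80, 0.10,
                 0.20, 0.70,
                 0.10, 0.95,
                 0.60, 0.30), ncol = 2, byrow = TRUE)

# Assumed defuzzification rule: label = cluster with the highest typicality
cluster <- apply(tmat, 1, which.max)

# Cluster sizes: number of objects assigned to each label
csize <- tabulate(cluster, nbins = ncol(tmat))

print(cluster)   # 1 1 2 2 1
print(csize)     # 3 2
```

Note that the rows of a typicality matrix need not sum to 1, so an object can have low typicality for every cluster and still receive a label under this rule.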
Author(s)
Zeynel Cebeci, Alper Tuna Kavlak
References
Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>
Yang, M. S. & Wu, K. L. (2006). Unsupervised possibilistic clustering. Pattern Recognition, 39(1): 5-21. <doi:10.1016/j.patcog.2005.07.005>
See Also
ekm, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pcm, pcmr, pfcm, upfc
Examples
# Load dataset X16
data(x16)
x <- x16[,-3]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=2)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=2)$u
# Run PCA
pca.res <- pca(x, centers=v, memberships=u, m=2, eta=2)
# Display the fuzzy membership degrees
print(round(pca.res$u,2))
# Display the typicality degrees
print(round(pca.res$t,2))