
fpppcm {ppclust}

R Documentation

Fuzzy Possibilistic Product Partition C-Means Clustering

Description

Partitions a numeric data set using the Fuzzy Possibilistic Product Partition C-Means (FPPPCM) clustering algorithm, which was proposed by Szilagyi & Szilagyi (2014).

Usage

fpppcm(x, centers, memberships, m=2, eta=2, K=1, omega, 
    dmetric="sqeuclidean", pw=2, alginitv="kmpp", alginitu="imembrand", 
    nstart=1, iter.max=1000, con.val=1e-09, 
    fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)

Arguments

`x`	a numeric vector, data frame or matrix.
`centers`	an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers.
`memberships`	a numeric matrix containing the initial membership degrees. If missing, it is internally generated.
`m`	a number greater than 1 to be used as the fuzziness exponent. The default is 2.
`eta`	a number greater than 1 to be used as the typicality exponent. The default is 2.
`K`	a number greater than 0 to be used as the weight of the penalty term. The default is 1.
`omega`	a numeric vector of reference distances. If missing, it is internally generated.
`dmetric`	a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See `get.dmetrics` for the alternative options.
`pw`	a number for the power of the Minkowski distance, used when `dmetric` is minkowski. The default is 2.
`alginitv`	a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see `get.algorithms`.
`alginitu`	a string for the initialization of the membership degrees matrix. The default is imembrand for random sampling of initial membership degrees.
`nstart`	an integer for the number of starts for clustering. The default is 1.
`iter.max`	an integer for the maximum number of iterations allowed. The default is 1000.
`con.val`	a number for the convergence threshold: the algorithm is considered to have converged when the change between successive iterations falls below this value. The default is 1e-09.
`fixcent`	a logical flag to fix the initial cluster centers. The default is `FALSE`. If it is `TRUE`, the initial centers are not changed in the successive starts of the algorithm when the `nstart` is greater than 1.
`fixmemb`	a logical flag to fix the initial membership degrees. The default is `FALSE`. If it is `TRUE`, the initial memberships are not changed in the successive starts of the algorithm when the `nstart` is greater than 1.
`stand`	a logical flag to standardize data. Its default value is `FALSE`. If its value is `TRUE`, the data matrix `x` is standardized.
`numseed`	a seeding number to set the seed of R's random number generator.
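
The `dmetric` and `pw` arguments determine how the distance between each object and each prototype is measured. The sketch below builds the n x k distance matrix for two of the options, sqeuclidean and minkowski, directly from their standard definitions. The helper name `dist_to_centers` is hypothetical and is not a ppclust function; use `get.dmetrics` for the package's own list of metrics.

```r
# Hypothetical helper (not part of ppclust): distance from every row of x
# to every row of v, for the "sqeuclidean" and "minkowski" metrics.
dist_to_centers <- function(x, v, dmetric = "sqeuclidean", pw = 2) {
  x <- as.matrix(x); v <- as.matrix(v)
  d <- matrix(0, nrow(x), nrow(v))
  for (j in seq_len(nrow(v))) {
    diff <- sweep(x, 2, v[j, ])            # x_i - v_j for all rows i
    d[, j] <- switch(dmetric,
      sqeuclidean = rowSums(diff^2),
      minkowski   = rowSums(abs(diff)^pw)^(1 / pw),
      stop("dmetric not supported in this sketch"))
  }
  d
}

x <- matrix(c(0, 0,
              3, 4,
              1, 1), ncol = 2, byrow = TRUE)
v <- matrix(c(0, 0,
              3, 4), ncol = 2, byrow = TRUE)
dist_to_centers(x, v)                         # squared Euclidean distances
dist_to_centers(x, v, "minkowski", pw = 2)    # ordinary Euclidean distances
```

With pw = 2 the Minkowski distance is the Euclidean distance, i.e. the square root of the sqeuclidean values.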

Details

The Fuzzy Possibilistic Product Partition C-Means (FPPPCM) clustering algorithm was designed to eliminate the effect of outliers, to which other fuzzy and possibilistic clustering algorithms are sensitive. It combines a probabilistic term and a possibilistic term multiplicatively rather than additively (Gosztolya & Szilagyi, 2015). The objective function of the algorithm is as follows:

J_{FPPPCM}(\mathbf{X}; \mathbf{V}, \mathbf{U}, \mathbf{T})=\sum\limits_{j=1}^k \sum\limits_{i=1}^n u_{ij}^m \big[ t_{ij}^\eta \; d^2(\vec{x}_i, \vec{v}_j) + \Omega_j (1-t_{ij})^\eta \big]
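
To make the objective concrete, the following is a minimal sketch that evaluates J_{FPPPCM} for given matrices. It is an illustration, not ppclust's internal code; the function name `fpppcm_objective` is hypothetical, and d^2 is assumed to be the squared Euclidean distance (the sqeuclidean default).

```r
# Sketch (not ppclust's internal code): evaluate the FPPPCM objective.
# X: n x p data, V: k x p prototypes, U and Tp: n x k memberships and
# typicalities, omega: length-k reference distances.
fpppcm_objective <- function(X, V, U, Tp, omega, m = 2, eta = 2) {
  X <- as.matrix(X); V <- as.matrix(V)
  # D2[i, j] = squared Euclidean distance between x_i and v_j
  D2 <- vapply(seq_len(nrow(V)),
               function(j) rowSums(sweep(X, 2, V[j, ])^2),
               numeric(nrow(X)))
  # Bracketed term: t^eta * d^2 + Omega_j * (1 - t)^eta (column j scaled by omega[j])
  inner <- Tp^eta * D2 + sweep((1 - Tp)^eta, 2, omega, "*")
  sum(U^m * inner)
}

# Toy check: two points, prototypes placed on them, all degrees set to 0.5
X <- matrix(c(0, 0,
              2, 0), ncol = 2, byrow = TRUE)
U <- Tp <- matrix(0.5, nrow = 2, ncol = 2)
fpppcm_objective(X, V = X, U = U, Tp = Tp, omega = c(1, 1))   # 0.75
```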

The fuzzy membership degrees in the probabilistic part of the objective function J_{FPPPCM} are updated as follows:

u_{ij} = \frac{\Big[t_{ij}^\eta \; d^2(\vec{x}_i, \vec{v}_j) \; + \; \Omega_j (1-t_{ij})^\eta \Big]^{-1/(m-1)}}{\sum\limits_{l=1}^k \Big[ t_{il}^\eta \; d^2(\vec{x}_i, \vec{v}_l) \; + \; \Omega_l (1-t_{il})^\eta \Big]^{-1/(m-1)}} \;;\; 1 \leq i \leq n, \; 1 \leq j \leq k
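
The membership update has the same form as the fuzzy c-means update, with the bracketed term of the objective taking the place of the distance. A minimal sketch, assuming the squared distances, typicalities and reference distances are already available; `update_memberships` is a hypothetical name, and the sketch does not guard against a bracketed term of exactly 0.

```r
# Sketch of the membership update (not ppclust's internal code).
# D2: n x k squared distances, Tp: n x k typicalities, omega: length k.
update_memberships <- function(D2, Tp, omega, m = 2, eta = 2) {
  # A[i, j] = t_ij^eta * d^2(x_i, v_j) + Omega_j * (1 - t_ij)^eta
  A <- Tp^eta * D2 + sweep((1 - Tp)^eta, 2, omega, "*")
  # Raise to -1/(m - 1); an A of exactly 0 would give Inf and is not handled here
  W <- A^(-1 / (m - 1))
  W / rowSums(W)   # divide each row by its sum so the memberships sum to 1
}

D2 <- matrix(c(1, 4,
               4, 1), nrow = 2, byrow = TRUE)
Tp <- matrix(0.5, nrow = 2, ncol = 2)
U  <- update_memberships(D2, Tp, omega = c(1, 1))
U   # row 1 is c(5/7, 2/7)
```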

The typicality degrees in the possibilistic part of the objective function J_{FPPPCM} are calculated as follows:

t_{ij} =\Bigg[1 + \Big(\frac{d^2(\vec{x}_i, \vec{v}_j)}{\Omega_j}\Big)^{1/(\eta -1)}\Bigg]^{-1} \;;\; 1 \leq i \leq n, \; 1 \leq j \leq k
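
Because u_{ij}^m multiplies both terms of the bracket, it cancels when the objective is minimized over t_{ij}, so the typicality depends only on the ratio d^2(\vec{x}_i, \vec{v}_j)/\Omega_j and on \eta. The sketch below (hypothetical name `update_typicalities`) computes the formula and shows that raising \eta from 2 to 3, as in the second run of the Examples, pulls the typicalities toward 0.5.

```r
# Sketch of the typicality update (not ppclust's internal code).
# D2: n x k squared distances, omega: length-k reference distances.
update_typicalities <- function(D2, omega, eta = 2) {
  ratio <- sweep(D2, 2, omega, "/")   # d^2(x_i, v_j) / Omega_j
  1 / (1 + ratio^(1 / (eta - 1)))
}

D2 <- matrix(c(0.25, 4), nrow = 1)    # one point, two clusters
update_typicalities(D2, omega = c(1, 1), eta = 2)   # 0.8 and 0.2
update_typicalities(D2, omega = c(1, 1), eta = 3)   # about 0.667 and 0.333
```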

m is the fuzzifier that specifies the amount of fuzziness of the clustering; 1 < m < \infty. It is usually chosen as 2.

\eta is the typicality exponent that specifies the amount of typicality of the clustering; 1 < \eta < \infty. It is usually chosen as 2.

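\Omega_j is the reference distance of cluster j (the `omega` argument). It weights the possibilistic penalty term \Omega_j (1-t_{ij})^\eta and thereby controls the variance (spread) of the cluster.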
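
This page does not state how the reference distances are generated when `omega` is missing. One common choice, used with the possibilistic c-means (PCM) algorithm, is the fuzzy-weighted mean squared distance scaled by K. The sketch below assumes that rule, with `K` playing the role of the argument of the same name; ppclust's own rule may differ, and `estimate_omega` is a hypothetical name.

```r
# Assumption: PCM-style estimate of the reference distances,
#   Omega_j = K * sum_i u_ij^m d^2(x_i, v_j) / sum_i u_ij^m
# ppclust's own rule for a missing omega may differ.
estimate_omega <- function(D2, U, m = 2, K = 1) {
  Um <- U^m
  K * colSums(Um * D2) / colSums(Um)   # weighted mean squared distance per cluster, times K
}

D2 <- matrix(c(1, 3,
               2, 2), nrow = 2, byrow = TRUE)
U  <- matrix(0.5, nrow = 2, ncol = 2)
estimate_omega(D2, U)          # c(1.5, 2.5)
estimate_omega(D2, U, K = 2)   # c(3, 5)
```

A larger K gives larger reference distances and, through the typicality update, higher typicalities for distant objects.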

The cluster prototypes are updated as follows:

\vec{v}_j =\frac{\sum\limits_{i=1}^n u_{ij}^m \; t_{ij}^\eta \; \vec{x}_i}{\sum\limits_{i=1}^n u_{ij}^m \; t_{ij}^\eta} \;;\; 1 \leq j \leq k
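
Each prototype is a weighted mean of the data with weights u_{ij}^m t_{ij}^\eta, so an object with near-zero typicality barely moves the prototype even if its membership is high. A minimal sketch (hypothetical name `update_prototypes`, not ppclust's internal code) makes this visible with one far-away point whose typicality is 0.

```r
# Sketch of the prototype update (not ppclust's internal code).
# X: n x p data, U and Tp: n x k memberships and typicalities.
update_prototypes <- function(X, U, Tp, m = 2, eta = 2) {
  X <- as.matrix(X)
  W <- U^m * Tp^eta            # weight of x_i in cluster j
  (t(W) %*% X) / colSums(W)    # k x p; row j divided by sum_i W[i, j]
}

X  <- matrix(c(0, 0,
               2, 0,
               10, 10), ncol = 2, byrow = TRUE)
U  <- matrix(1, nrow = 3, ncol = 1)   # a single cluster
Tp <- matrix(c(1, 1, 0), nrow = 3)    # third point has zero typicality
update_prototypes(X, U, Tp)           # prototype (1, 0): the outlier has no pull
```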

Value

an object of class ‘ppclust’, which is a list consisting of the following items:

`v`	a numeric matrix containing the final cluster prototypes.
`t`	a numeric matrix containing the typicality degrees of the data objects.
`d`	a numeric matrix containing the distances of objects to the final cluster prototypes.
`x`	a numeric matrix containing the processed data set.
`cluster`	a numeric vector containing the cluster labels found by defuzzifying the typicality degrees of the objects.
`csize`	a numeric vector for the number of objects in the clusters.
`k`	an integer for the number of clusters.
`m`	a number for the used fuzziness exponent.
`eta`	a number for the used typicality exponent.
`omega`	a numeric vector of reference distances.
`iter`	an integer vector for the number of iterations in each start of the algorithm.
`best.start`	an integer for the index of the start that produced the minimum objective function value.
`func.val`	a numeric vector for the objective function values in each start of the algorithm.
`comp.time`	a numeric vector for the execution time in each start of the algorithm.
`stand`	a logical value; `TRUE` indicates that the data set `x` contains the standardized values of the raw data.
`wss`	a numeric vector for the within-cluster sum of squares of each cluster.
`bwss`	a number for the between-cluster sum of squares.
`tss`	a number for the total sum of squares.
`twss`	a number for the total within-cluster sum of squares.
`algorithm`	a string for the name of the partitioning algorithm. It is ‘FPPPCM’ with this function.
`call`	a string for the matched function call generating this ‘ppclust’ object.
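
Several of these items are crisp summaries of the fuzzy result. The sketch below, using the hypothetical helper `summarise_partition`, shows one way to derive `cluster`, `csize`, `wss`, `twss`, `tss` and `bwss` from a typicality matrix. It assumes that labels come from the largest typicality in each row and that the sums of squares are taken around crisp cluster means, as base R's `kmeans()` reports them; ppclust may compute them differently, for example around the prototypes.

```r
# Hypothetical helper: crisp summaries of a typicality matrix Tp (n x k).
summarise_partition <- function(X, Tp) {
  X <- as.matrix(X)
  cluster <- max.col(Tp, ties.method = "first")   # defuzzify: largest typicality wins
  k <- ncol(Tp)
  csize <- tabulate(cluster, nbins = k)
  tss <- sum(sweep(X, 2, colMeans(X))^2)          # around the grand mean
  wss <- vapply(seq_len(k), function(j) {
    Xj <- X[cluster == j, , drop = FALSE]
    if (nrow(Xj) == 0) return(0)
    sum(sweep(Xj, 2, colMeans(Xj))^2)             # around the cluster mean
  }, numeric(1))
  twss <- sum(wss)
  list(cluster = cluster, csize = csize, wss = wss,
       twss = twss, tss = tss, bwss = tss - twss)
}

X  <- matrix(c(0, 0,
               0, 2,
               10, 0,
               10, 2), ncol = 2, byrow = TRUE)
Tp <- matrix(c(0.9, 0.8, 0.1, 0.2,
               0.1, 0.2, 0.9, 0.8), ncol = 2)
s <- summarise_partition(X, Tp)
s$cluster                # 1 1 2 2
c(s$tss, s$twss, s$bwss) # 104 4 100
```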

Author(s)

Zeynel Cebeci, Alper Tuna Kavlak & Figen Yildiz

References

Arthur, D. & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>

Szilagyi, L. & Szilagyi, S. M. (2014). Generalization rules for the suppressed fuzzy c-means clustering algorithm. Neurocomputing, 139:298-309. <doi:10.1016/j.neucom.2014.02.027>

Gosztolya, G. & Szilagyi, L. (2015). Application of fuzzy and possibilistic c-means clustering models in blind speaker clustering. Acta Polytechnica Hungarica, 12(7):41-56. <http://publicatio.bibl.u-szeged.hu/6151/1/2015-acta-polytechnica.pdf>

Examples

# Load the package and the x16 data set
library(ppclust)
data(x16)
x <- x16[,-3]
# Initialize the prototype matrix using K-means++
v <- inaparc::kmpp(x, k=2)$v
# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=2)$u

# Run FPPPCM 
res.fpppcm <- fpppcm(x, centers=v, memberships=u, m=2, eta=2)

# Display typicality degrees 
res.fpppcm$t

# Run FPPPCM for eta=3
res.fpppcm <- fpppcm(x, centers=v, memberships=u, m=2, eta=3)

# Display typicality degrees 
res.fpppcm$t

[Package ppclust version 1.1.0.1 Index]