R: Fuzzy C-Means Clustering

fcm {ppclust}

R Documentation

Fuzzy C-Means Clustering

Description

Partitions a numeric data set by using the Fuzzy C-Means (FCM) clustering algorithm (Bezdek, 1974; 1981).

Usage

fcm(x, centers, memberships, m=2, dmetric="sqeuclidean", pw = 2, 
    alginitv="kmpp", alginitu="imembrand", 
    nstart=1, iter.max=1000, con.val=1e-09, 
    fixcent=FALSE, fixmemb=FALSE, stand=FALSE, numseed)

Arguments

`x`	a numeric vector, data frame or matrix.
`centers`	an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers.
`memberships`	a numeric matrix containing the initial membership degrees. If missing, it is internally generated.
`m`	a number greater than 1 to be used as the fuzziness exponent or fuzzifier. The default is 2.
`dmetric`	a string for the distance metric. The default is sqeuclidean for the squared Euclidean distances. See `get.dmetrics` for the alternative options.
`pw`	a number for the power of the Minkowski distance calculation. The default is 2. It is used when `dmetric` is minkowski.
`alginitv`	a string for the initialization of cluster prototypes matrix. The default is kmpp for K-means++ initialization method (Arthur & Vassilvitskii, 2007). For the list of alternative options see `get.algorithms`.
`alginitu`	a string for the initialization of the membership degrees matrix. The default is imembrand for random sampling of initial membership degrees.
`nstart`	an integer for the number of starts for clustering. The default is 1.
`iter.max`	an integer for the maximum number of iterations allowed. The default is 1000.
`con.val`	a number for the convergence value between the iterations. The default is 1e-09.
`fixcent`	a logical flag to keep the initial cluster centers fixed across the different starts of the algorithm. The default is `FALSE`. If it is `TRUE`, the same initial centers are reused in the successive starts of the algorithm when `nstart` is greater than 1.
`fixmemb`	a logical flag to keep the initial membership degrees fixed across the different starts of the algorithm. The default is `FALSE`. If it is `TRUE`, the same initial memberships are reused in the successive starts of the algorithm when `nstart` is greater than 1.
`stand`	a logical flag to standardize data. Its default value is `FALSE`. If its value is `TRUE`, the data matrix `x` is standardized.
`numseed`	an optional seeding number to set the seed of R's random number generator.

Details

The Fuzzy C-Means (FCM) clustering algorithm was first studied by Dunn (1973) and generalized by Bezdek in 1974 (Bezdek, 1981). Unlike in the K-means algorithm, a data object is not a member of only one cluster but a member of all clusters, with degrees of membership varying between 0 and 1. FCM is an iterative clustering algorithm that partitions the data set into a predefined number of k clusters by minimizing the weighted within-group sum of squared errors. The objective function of FCM is:

J_{FCM}(\mathbf{X}; \mathbf{V}, \mathbf{U})=\sum\limits_{i=1}^n \sum\limits_{j=1}^k u_{ij}^m d^2(\vec{x}_i, \vec{v}_j)

In the objective function, m is the fuzzifier, which specifies the amount of 'fuzziness' of the clustering result; 1 < m < \infty. It is usually chosen as 2. Higher values of m result in fuzzier clusters, while lower values give harder clusters. As m approaches 1, FCM behaves like a hard clustering algorithm and its results approach those of K-means.
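
As a rough illustration of the objective function, the following base R sketch evaluates J_{FCM} for a given data matrix, prototype matrix and membership matrix. The helper name `fcm_objective` and the toy matrices are hypothetical and not part of ppclust; squared Euclidean distances are assumed.

```r
# Hypothetical helper (not part of ppclust): evaluates the FCM objective
# function, assuming squared Euclidean distances.
fcm_objective <- function(x, v, u, m = 2) {
  x <- as.matrix(x)
  v <- as.matrix(v)
  # d2[i, j] = squared Euclidean distance between object i and prototype j
  d2 <- sapply(seq_len(nrow(v)), function(j) rowSums(sweep(x, 2, v[j, ])^2))
  # Sum over all objects i and all clusters j
  sum(u^m * d2)
}

# Toy data: three objects in two dimensions, two prototypes
x <- matrix(c(0, 0,
              1, 0,
              4, 0), ncol = 2, byrow = TRUE)
v <- matrix(c(0, 0,
              4, 0), ncol = 2, byrow = TRUE)
u <- matrix(c(1.0, 0.0,
              0.9, 0.1,
              0.0, 1.0), ncol = 2, byrow = TRUE)

J <- fcm_objective(x, v, u, m = 2)
J  # 0.9^2 * 1 + 0.1^2 * 9 = 0.9
```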

FCM must satisfy the following constraints:

u_{ij} \in [0,1] \;\;;\; 1 \leq i\leq n \;, 1 \leq j\leq k

0 \leq \sum\limits_{i=1}^n u_{ij} \leq n \;\;;\; 1 \leq j\leq k

\sum\limits_{j=1}^k u_{ij} = 1 \;\;;\; 1 \leq i\leq n
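
The constraints can be checked numerically on any candidate membership matrix. The sketch below uses a hypothetical helper, `is_fuzzy_partition`, which is not a ppclust function, and assumes an n x k matrix with objects in rows and clusters in columns.

```r
# Hypothetical helper (not part of ppclust): checks the three FCM constraints
# on an n x k membership matrix u, up to a numerical tolerance.
is_fuzzy_partition <- function(u, tol = 1e-8) {
  in_range  <- all(u >= 0 & u <= 1)                         # u_ij in [0, 1]
  col_total <- all(colSums(u) >= 0 & colSums(u) <= nrow(u)) # column sums in [0, n]
  row_total <- all(abs(rowSums(u) - 1) < tol)               # each row sums to 1
  in_range && col_total && row_total
}

u_ok  <- matrix(c(0.7, 0.3,
                  0.2, 0.8), ncol = 2, byrow = TRUE)
u_bad <- matrix(c(0.7, 0.7,
                  0.2, 0.8), ncol = 2, byrow = TRUE)

is_fuzzy_partition(u_ok)   # every row sums to 1
is_fuzzy_partition(u_bad)  # the first row sums to 1.4
```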

The objective function of FCM is minimized by using the following update equations:

u_{ij} =\Bigg[\sum\limits_{l=1}^k \Big(\frac{d^2(\vec{x}_i, \vec{v}_j)}{d^2(\vec{x}_i, \vec{v}_l)}\Big)^{1/(m-1)} \Bigg]^{-1} \;\;; {1\leq i\leq n},\; {1\leq j \leq k}

\vec{v}_{j} =\frac{\sum\limits_{i=1}^n u_{ij}^m \vec{x}_i}{\sum\limits_{i=1}^n u_{ij}^m} \;\;; {1\leq j\leq k}
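
The two update equations can be sketched in base R as an alternating loop. This is only an illustration under stated assumptions, not the ppclust implementation: the function names are hypothetical, squared Euclidean distances are assumed, the starting prototypes are picked arbitrarily, a fixed number of iterations replaces the convergence test, and zero distances are guarded with a small constant.

```r
# Hypothetical sketch of the FCM updates (not the ppclust implementation).
sq_dist <- function(x, v) {
  # d2[i, j] = squared Euclidean distance between object i and prototype j
  sapply(seq_len(nrow(v)), function(j) rowSums(sweep(x, 2, v[j, ])^2))
}

update_memberships <- function(d2, m = 2) {
  # u[i, j] = 1 / sum_l (d2[i, j] / d2[i, l])^(1 / (m - 1)),
  # written equivalently as normalized inverse distances.
  # pmax() guards against division by zero when an object sits on a prototype.
  w <- pmax(d2, .Machine$double.eps)^(-1 / (m - 1))
  w / rowSums(w)
}

update_centers <- function(x, u, m = 2) {
  # v[j, ] = sum_i u[i, j]^m * x[i, ] / sum_i u[i, j]^m
  um <- u^m
  crossprod(um, x) / colSums(um)
}

x <- as.matrix(iris[, -5])
v <- x[c(1, 51, 101), ]   # arbitrary starting prototypes (an assumption)
for (it in 1:20) {        # fixed iteration count instead of a convergence test
  u <- update_memberships(sq_dist(x, v), m = 2)
  v <- update_centers(x, u, m = 2)
}

round(head(u, 3), 3)
```

Writing the membership update as normalized inverse distances avoids an explicit loop over l and gives the same values as the bracketed sum in the equation.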

Value

an object of class ‘ppclust’, which is a list consisting of the following items:

`x`	a numeric matrix containing the processed data set.
`v`	a numeric matrix containing the final cluster prototypes (centers of clusters).
`u`	a numeric matrix containing the fuzzy membership degrees of the data objects.
`d`	a numeric matrix containing the distances of objects to the final cluster prototypes.
`k`	an integer for the number of clusters.
`m`	a number for the fuzzifier.
`cluster`	a numeric vector containing the cluster labels found by defuzzifying the fuzzy membership degrees of the objects.
`csize`	a numeric vector containing the number of objects in the clusters.
`iter`	an integer vector for the number of iterations in each start of the algorithm.
`best.start`	an integer for the index of the start that produced the minimum objective function value.
`func.val`	a numeric vector for the objective function values in each start of the algorithm.
`comp.time`	a numeric vector for the execution time in each start of the algorithm.
`stand`	a logical value, `TRUE` shows that data set `x` contains the standardized values of raw data.
`wss`	a numeric vector containing the within-cluster sum of squares for each cluster.
`bwss`	a number for the between-cluster sum of squares.
`tss`	a number for the total sum of squares.
`twss`	a number for the total within-cluster sum of squares.
`algorithm`	a string for the name of partitioning algorithm. It is ‘FCM’ with this function.
`call`	a string for the matched function call generating this ‘ppclust’ object.
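
To illustrate how `cluster` and `csize` relate to `u`, the sketch below defuzzifies a toy membership matrix with the maximum-membership rule. That rule is an assumption made for illustration; the exact procedure used inside ppclust may differ, and the matrix values are invented.

```r
# Toy membership matrix: four objects, two clusters (hypothetical values)
u <- matrix(c(0.9, 0.1,
              0.6, 0.4,
              0.2, 0.8,
              0.3, 0.7), ncol = 2, byrow = TRUE)

# Assign each object to the cluster with its largest membership degree
cluster <- max.col(u, ties.method = "first")

# Count the objects assigned to each cluster
csize <- tabulate(cluster, nbins = ncol(u))

cluster  # 1 1 2 2
csize    # 2 2
```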

Author(s)

Zeynel Cebeci, Figen Yildiz & Alper Tuna Kavlak

References

Arthur, D. & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding, in Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027-1035. <http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf>

Dunn, J.C. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J. Cybernetics, 3(3):32-57. <doi:10.1080/01969727308546046>

Bezdek, J.C. (1974). Cluster validity with fuzzy sets. J. Cybernetics, 3: 58-73. <doi:10.1080/01969727308546047>

Bezdek J.C. (1981). Pattern recognition with fuzzy objective function algorithms. Plenum, NY. <ISBN:0306406713>

Examples

# Load the package and the iris data set
library(ppclust)
data(iris)
x <- iris[,-5]

# Initialize the prototype matrix using K-means++ algorithm
v <- inaparc::kmpp(x, k=3)$v

# Initialize the membership degrees matrix
u <- inaparc::imembrand(nrow(x), k=3)$u

# Run FCM with the initial prototypes and memberships
fcm.res <- fcm(x, centers=v, memberships=u, m=2)

# Show the fuzzy membership degrees for the top 5 objects
head(fcm.res$u, 5)

[Package ppclust version 1.1.0.1 Index]