Modified K-Means Algorithm by Using a New Dissimilarity Measure, MADD and DUNN Index


Performs the modified K-means algorithm using a new dissimilarity measure, called MADD, together with the DUNN index, and returns the estimated cluster (class) labels, or memberships, of the observations along with the corresponding DUNN indexes.


gMADD_DI(s_psi, s_h, kmax, lb, M)



s_psi: integer code selecting the function $\psi$ required for clustering: 1 for $t^2$, 2 for $1-\exp(-t)$, 3 for $1-\exp(-t^2)$, 4 for $\log(1+t)$, 5 for $t$


s_h: integer code selecting the function $h$ required for clustering: 1 for $\sqrt t$, 2 for $t$


kmax: maximum number of clusters considered when estimating the total number of clusters among all observations


lb: each observation is partitioned into a number of smaller vectors of the same length $lb$


M: an $n\times d$ matrix of observations from the pooled sample; the observations should be grouped by their respective classes
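The numeric codes accepted by s_psi and s_h map to the functions listed above. The sketch below writes them out as R closures and shows one way to cut an observation into consecutive sub-vectors of length lb. The names `psi_funs`, `h_funs` and `split_blocks` are hypothetical helpers for illustration, not part of the package; we also assume $d$ is a multiple of $lb$. How gMADD_DI combines these pieces inside MADD is not shown here.

```r
# Hypothetical helpers (not exported by the package): the candidate
# functions selected by the codes s_psi = 1..5 and s_h = 1..2.
psi_funs <- list(
  function(t) t^2,             # s_psi = 1
  function(t) 1 - exp(-t),     # s_psi = 2
  function(t) 1 - exp(-t^2),   # s_psi = 3
  function(t) log(1 + t),      # s_psi = 4
  function(t) t                # s_psi = 5
)
h_funs <- list(
  function(t) sqrt(t),         # s_h = 1
  function(t) t                # s_h = 2
)

# Split a length-d observation into d / lb consecutive sub-vectors of
# length lb (assumption: d is a multiple of lb).
split_blocks <- function(x, lb) {
  stopifnot(length(x) %% lb == 0)
  split(x, rep(seq_len(length(x) / lb), each = lb))
}

x <- c(3, 1, 4, 1, 5, 9)
psi_funs[[1]](2)         # 4
h_funs[[1]](16)          # 4
split_blocks(x, lb = 2)  # three blocks: c(3, 1), c(4, 1), c(5, 9)
```

With lb = 1, as in the example at the end of this page, every coordinate forms its own block.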


The DUNN index is normally used for cluster validation, but here we use it to estimate the total number of clusters $k$ by $\hat k = \arg\max_{2\le k' \le k^*} DI(k')$, where $DI(k')$ denotes the DUNN index of the partition into $k'$ clusters and we use $k^* = 2k$ (supplied through kmax).
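The selection rule above can be sketched in R. `dunn_index` uses the usual definition of the Dunn index (smallest between-cluster dissimilarity divided by the largest within-cluster diameter) on a precomputed dissimilarity matrix, and `estimate_k` picks $\hat k$ as the argmax over $2 \le k' \le k^*$. Both names are hypothetical, and Euclidean distance with complete-linkage `hclust` stands in for MADD and the modified K-means, so this illustrates only the argmax step, not what gMADD_DI computes internally.

```r
# Dunn index of a partition cl, given a full n x n dissimilarity matrix D:
# smallest between-cluster dissimilarity / largest cluster diameter.
dunn_index <- function(D, cl) {
  ks <- unique(cl)
  if (length(ks) < 2) return(NA_real_)  # undefined for a single cluster
  diam <- max(sapply(ks, function(a) {
    idx <- which(cl == a)
    max(D[idx, idx])
  }))
  sep <- Inf
  for (a in ks) for (b in ks) if (a < b) {
    sep <- min(sep, min(D[cl == a, cl == b]))
  }
  sep / diam
}

# k-hat = argmax of DI(k') over k' = 2, ..., k_star.
# Stand-in clustering: complete-linkage hclust, NOT the MADD-based K-means.
estimate_k <- function(D, k_star) {
  tree <- hclust(as.dist(D))
  di <- sapply(2:k_star, function(k) dunn_index(D, cutree(tree, k)))
  (2:k_star)[which.max(di)]
}

# Two tight, well-separated triangles of points in the plane.
X <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))
D <- as.matrix(dist(X))
dunn_index(D, c(1, 1, 1, 2, 2, 2))  # about 9.51
estimate_k(D, k_star = 4)           # 2
```

Splitting either triangle further lowers the index (to at most 1 here), which is why the argmax lands on the two-cluster partition.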


a $kmax \times (n+1)$ matrix of the estimated cluster (class) labels of the observations and the corresponding DUNN indexes


The result of gMADD_DI is a matrix whose $k$-th row holds the estimated cluster labels of the $n$ observations for $k$ clusters, with the corresponding DUNN index in the last column. The 1st row carries no information about the estimated class labels or the DUNN index, since the DUNN index is only defined for $k\ge 2$. The estimated cluster labels are read from the row with the maximum DUNN index, and the index of that row is the estimated number of clusters $\hat k$.
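Reading the result matrix described above takes only a few lines. The helper `read_gmadd_di` below is hypothetical (not part of the package) and assumes the layout stated in this note: one row per number of clusters, $n$ label columns, then the DUNN index. It skips row 1 explicitly rather than relying on whatever value that row holds. The toy matrix is made-up input, not real output.

```r
# Hypothetical helper: extract k-hat, the labels and the DUNN index from a
# gMADD_DI-style matrix (rows = number of clusters, last column = DUNN index).
read_gmadd_di <- function(res) {
  n <- ncol(res) - 1
  di <- res[-1, n + 1]          # drop row 1: DUNN index undefined for k = 1
  k_hat <- which.max(di) + 1    # +1 restores the original row number
  list(k_hat  = k_hat,
       labels = res[k_hat, seq_len(n)],
       dunn   = res[k_hat, n + 1])
}

# Toy 3 x (4 + 1) matrix standing in for real output (values made up).
toy <- rbind(c(1, 1, 1, 1, NA),
             c(1, 1, 2, 2, 0.9),
             c(1, 2, 3, 3, 0.4))
read_gmadd_di(toy)  # k_hat = 2, labels = 1 1 2 2, dunn = 0.9
```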


Biplab Paul, Shyamal K. De and Anil K. Ghosh

Maintainer: Biplab Paul


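  # Modified K-means algorithm:
  # multivariate normal distribution
  library(HDLSSkST)
  set.seed(123)
  # sample sizes chosen for illustration
  n1 <- n2 <- n3 <- n4 <- 10
  # generate data with dimension d = 500
  d = 500
  I1 <- matrix(rnorm(n1*d,mean=0,sd=1),n1,d)
  I2 <- matrix(rnorm(n2*d,mean=0.5,sd=1),n2,d)
  I3 <- matrix(rnorm(n3*d,mean=1,sd=1),n3,d)
  I4 <- matrix(rnorm(n4*d,mean=1.5,sd=1),n4,d)
  n_cl <- 4
  N <- n1+n2+n3+n4
  X <- as.matrix(rbind(I1,I2,I3,I4))
  # s_psi = 1, s_h = 1, kmax = 2*n_cl, lb = 1, M = X
  dvec_di_mat <- gMADD_DI(1,1,2*n_cl,1,X)
  # estimated number of clusters: row with the largest DUNN index
  est_no_cl <- which.max(dvec_di_mat[ ,(N+1)])
  # estimated cluster labels of the N observations
  est_labels <- dvec_di_mat[est_no_cl, 1:N]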
