gMADD_DI {HDLSSkST} | R Documentation
Modified K-Means Algorithm Using a New Dissimilarity Measure, MADD, and the Dunn Index
Description
Performs a modified K-means algorithm using a new dissimilarity measure, called MADD, together with the Dunn index, and provides the estimated cluster (class) labels of the observations along with the corresponding Dunn indices.
Usage
gMADD_DI(s_psi, s_h, kmax, lb, M)
Arguments
s_psi: function required for clustering, 1 for t^2, 2 for 1-exp(-t), 3 for 1-exp(-t^2), 4 for log(1+t), 5 for t

s_h: function required for clustering, 1 for sqrt(t), 2 for t

kmax: maximum number of clusters considered when estimating the total number of clusters in the data

lb: length of the smaller vectors of equal length into which each observation is partitioned

M: n \times d data matrix of the pooled observations
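For intuition about s_psi and s_h, below is a minimal sketch of the generalized MADD dissimilarity of Sarkar and Ghosh (2019), with defaults corresponding to s_psi = 1 (psi(t) = t^2) and s_h = 1 (h(t) = sqrt(t)), i.e. the scaled Euclidean baseline used in the example; madd_sketch is an illustrative helper, not a function exported by HDLSSkST, and this is not the package implementation.

# hedged sketch of the generalized MADD dissimilarity
madd_sketch <- function(M, psi = function(t) t^2, h = function(t) sqrt(t)) {
  n <- nrow(M)
  # phi[i,j] = h( mean over coordinates q of psi(|M[i,q] - M[j,q]|) )
  phi <- matrix(0, n, n)
  for (i in 1:(n-1)) for (j in (i+1):n)
    phi[i, j] <- phi[j, i] <- h(mean(psi(abs(M[i, ] - M[j, ]))))
  # MADD(i,j): mean over k != i,j of |phi(i,k) - phi(j,k)|
  rho <- matrix(0, n, n)
  for (i in 1:(n-1)) for (j in (i+1):n) {
    k <- setdiff(1:n, c(i, j))
    rho[i, j] <- rho[j, i] <- mean(abs(phi[i, k] - phi[j, k]))
  }
  rho
}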
Details
The Dunn index is used for cluster validation, but here we use it to estimate the total number of clusters k by

\hat{k} = argmax_{2 \le k' \le k^*} DI(k').

Here DI(k') represents the Dunn index computed for k' clusters, and we use k^* = 2k.
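As a toy illustration of this selection rule (made-up Dunn index values, not package output):

di <- c(NA, 0.41, 0.78, 0.52)  # DI(k') for k' = 1,...,4; DI(1) is undefined
k_hat <- which.max(di)         # argmax over 2 <= k' <= k^*; NAs are skipped
k_hat
#[1] 3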
Value
a kmax \times (n+1) matrix whose rows contain the estimated cluster (class) labels of the n observations, with the corresponding Dunn index in the last column
Note
The result of the gMADD_DI function is a matrix. The first row of this matrix provides nothing about estimated class labels or the Dunn index of the observations, since the Dunn index is defined only for k \ge 2. The last column of this matrix contains the Dunn indices. The estimated cluster labels of the observations are obtained from the row with the maximum Dunn index.
Author(s)
Biplab Paul, Shyamal K. De and Anil K. Ghosh
Maintainer: Biplab Paul <paul.biplab497@gmail.com>
References
Biplab Paul, Shyamal K. De and Anil K. Ghosh (2021). Some clustering based exact distribution-free k-sample tests applicable to high dimension, low sample size data, Journal of Multivariate Analysis, doi:10.1016/j.jmva.2021.104897.
Soham Sarkar and Anil K. Ghosh (2019). On perfect clustering of high dimension, low sample size data, IEEE Transactions on Pattern Analysis and Machine Intelligence, doi:10.1109/TPAMI.2019.2912599.
Joseph C. Dunn (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, Journal of Cybernetics, doi:10.1080/01969727308546046.
Examples
# Modified K-means algorithm:
# multivariate normal distribution
# generate data with dimension d = 500
set.seed(151)
n1 <- n2 <- n3 <- n4 <- 10   # 10 observations per group
d <- 500
# four Gaussian groups differing only in their means
I1 <- matrix(rnorm(n1*d, mean = 0,   sd = 1), n1, d)
I2 <- matrix(rnorm(n2*d, mean = 0.5, sd = 1), n2, d)
I3 <- matrix(rnorm(n3*d, mean = 1,   sd = 1), n3, d)
I4 <- matrix(rnorm(n4*d, mean = 1.5, sd = 1), n4, d)
n_cl <- 4                    # true number of clusters
N <- n1 + n2 + n3 + n4
X <- rbind(I1, I2, I3, I4)   # pooled N x d data matrix
dvec_di_mat <- gMADD_DI(1, 1, 2*n_cl, 1, X)
# row with the maximum Dunn index gives the estimated number of clusters
est_no_cl <- which.max(dvec_di_mat[ , (N+1)])
dvec_di_mat[est_no_cl, 1:N]  # estimated cluster labels
## outputs:
#[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
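# Since the data were simulated as four groups of size 10, the estimated
# labels can be cross-tabulated against the true grouping; this is a usage
# sketch building on the objects created above.
true_grp <- rep(1:4, times = c(n1, n2, n3, n4))
table(true_grp, dvec_di_mat[est_no_cl, 1:N])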