R: Clustering Evaluation Index Based on Known Ground Truth

cluster.evaluation {TSclust}

R Documentation

Clustering Evaluation Index Based on Known Ground Truth

Description

Computes the similarity between the true cluster solution and the one obtained with a method under evaluation.

Usage

cluster.evaluation(G, S)

Arguments

`G`	Integer vector with the labels of the true cluster solution. Each element of the vector specifies the cluster 'id' that the element belongs to.
`S`	Integer vector with the labels of the cluster solution to be evaluated. Each element of the vector specifies the cluster 'id' that the element belongs to.

Details

The measure of clustering evaluation is defined as

Sim(G, C) = \frac{1}{k} \sum_{i=1}^{k} \max_{1 \leq j \leq k} Sim(G_i, C_j),

where G = \{G_1, ..., G_k\} is the true cluster solution (argument `G`), C = \{C_1, ..., C_k\} is the solution under evaluation (argument `S`), and

Sim(G_i, C_j) = \frac{2 |G_i \cap C_j|}{|G_i| + |C_j|},

with |.| denoting the cardinality of a set. This measure has been used to compare clusterings, e.g. in Kalpakis et al. (2001) and Pértega and Vilar (2010).
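
The formula can be transcribed into a few lines of base R. The following is a minimal sketch, not the package's source code: the helper name `sim_index` is hypothetical, and it assumes the average is taken over the clusters present in the first argument.

```r
# Hypothetical helper (not part of TSclust): direct transcription of the formula.
sim_index <- function(G, S) {
  stopifnot(length(G) == length(S))
  tab <- unclass(table(G, S))        # tab[i, j] = size of the intersection of G_i and C_j
  size_G <- rowSums(tab)             # |G_i|
  size_S <- colSums(tab)             # |C_j|
  # pairwise similarity 2 |G_i n C_j| / (|G_i| + |C_j|)
  pair_sim <- 2 * tab / outer(size_G, size_S, "+")
  # best match for each true cluster, averaged over the true clusters
  mean(apply(pair_sim, 1, max))
}

# identical partitions with swapped labels give the maximum value
sim_index(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 1
```

Because the maximum is taken over the second argument and the average over the first, swapping the arguments generally changes the result.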

Value

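The computed index, a single numeric value in (0, 1]. It equals 1 when `S` reproduces `G` up to a relabeling of the clusters.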

Note

This index is not symmetric: in general, cluster.evaluation(G, S) differs from cluster.evaluation(S, G).

Author(s)

Pablo Montero Manso, José Antonio Vilar.

References

Larsen, B. and Aone, C. (1999) Fast and effective text mining using linear-time document clustering. Proc. KDD'99, 16–22.

Kalpakis, K., Gada, D. and Puttagunta, V. (2001) Distance measures for effective clustering of ARIMA time-series. Proceedings 2001 IEEE International Conference on Data Mining, 273–280.

Pértega, S. and Vilar, J.A. (2010) Comparing several parametric and nonparametric approaches to time series clustering: A simulation study. J. Classification, 27(3), 333–362.

Montero, P. and Vilar, J.A. (2014) TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1–43. http://www.jstatsoft.org/v62/i01/.

Examples


 library(TSclust)

 # create a true cluster solution
 # (first 4 elements belong to cluster '1', next 4 to cluster '2' and the last 4 to cluster '3')
 true_cluster <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
 # the cluster solution to be tested
 new_cluster <- c(2, 1, 2, 3, 3, 2, 2, 1, 3, 3, 3, 3)

 # get the index
 cluster.evaluation(true_cluster, new_cluster)

 # swapping the arguments shows that the index is not symmetric
 cluster.evaluation(new_cluster, true_cluster)
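
As a hand check of the example, the formula from Details can be applied directly to the two label vectors. This is a worked calculation under the assumption that the function implements the formula exactly as written; here the true clusters all have size 4, and the evaluated clusters '1', '2' and '3' have sizes 2, 4 and 6.

```latex
% First call: G = true_cluster, C = new_cluster
\max_j Sim(G_1, C_j) = Sim(G_1, C_2) = \frac{2 \cdot 2}{4 + 4} = 0.5
\max_j Sim(G_2, C_j) = Sim(G_2, C_2) = \frac{2 \cdot 2}{4 + 4} = 0.5
\max_j Sim(G_3, C_j) = Sim(G_3, C_3) = \frac{2 \cdot 4}{4 + 6} = 0.8
Sim(G, C) = \frac{0.5 + 0.5 + 0.8}{3} = 0.6

% Second call: roles swapped, so the average runs over the clusters of new_cluster
\max_i Sim(C_1, G_i) = \frac{2 \cdot 1}{2 + 4} = \frac{1}{3}
\max_i Sim(C_2, G_i) = \frac{2 \cdot 2}{4 + 4} = 0.5
\max_i Sim(C_3, G_i) = \frac{2 \cdot 4}{6 + 4} = 0.8
Sim(C, G) = \frac{1/3 + 0.5 + 0.8}{3} \approx 0.544
```

The two values differ because the small evaluated cluster '1' has no good match among the true clusters, which only lowers the average when that cluster is on the averaged side.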

[Package TSclust version 1.3.1 Index]