variation_info {clevr} | R Documentation |
Variation of Information Between Clusterings
Description
Computes the variation of information between two clusterings, such as a predicted and ground truth clustering.
Usage
variation_info(true, pred, base = exp(1))
Arguments
true |
ground truth clustering represented as a membership vector. Each entry corresponds to an element and the value identifies the assigned cluster. The specific values of the cluster identifiers are arbitrary. |
pred |
predicted clustering represented as a membership vector. |
base |
base of the logarithm. Defaults to exp(1), i.e. the natural logarithm. |
Details
Variation of information is an entropy-based distance metric on the space of clusterings. It is unnormalized and varies between 0 and \log(N), where N is the number of clustered elements. Larger values of the distance metric correspond to greater dissimilarity between the clusterings.
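To make the definition concrete, the following is a minimal sketch that computes the metric from first principles, using the standard identity VI = H(true | pred) + H(pred | true) = 2 H(true, pred) - H(true) - H(pred), where H denotes Shannon entropy. The helper name vi_sketch is hypothetical and is not part of clevr; the package's own implementation may differ in details such as input validation.

```r
# Hypothetical helper (not part of clevr): variation of information
# computed directly from the contingency table of two membership vectors.
vi_sketch <- function(true, pred, base = exp(1)) {
  stopifnot(length(true) == length(pred), length(true) > 0)

  # Joint distribution over (true cluster, predicted cluster) pairs
  joint <- table(true, pred) / length(true)

  # Shannon entropy; zero-probability cells are dropped (0 * log 0 := 0)
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p, base = base))
  }

  h_true  <- entropy(rowSums(joint))    # H(true)
  h_pred  <- entropy(colSums(joint))    # H(pred)
  h_joint <- entropy(as.vector(joint))  # H(true, pred)

  # H(true | pred) + H(pred | true)
  2 * h_joint - h_true - h_pred
}

# Identical clusterings (up to relabelling) are at distance 0
vi_sketch(c(1, 1, 1, 2, 2), c(2, 2, 2, 1, 1))

# All singletons versus one big cluster attains the upper bound log(N)
vi_sketch(1:4, rep(1, 4))
```

Because only the contingency table enters the computation, the result does not depend on the particular cluster labels, which matches the statement that cluster identifiers are arbitrary.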
References
Arabie, P. and Boorman, S. A. "Multidimensional scaling of measures of distance between partitions." Journal of Mathematical Psychology 10:2, 148-203, (1973). doi:10.1016/0022-2496(73)90012-6
Meilă, M. "Comparing Clusterings by the Variation of Information." In: Learning Theory and Kernel Machines, Lecture Notes in Computer Science 2777, Springer, Berlin, Heidelberg, (2003). doi:10.1007/978-3-540-45167-9_14
Examples
true <- c(1, 1, 1, 2, 2) # ground truth clustering
pred <- c(1, 1, 2, 2, 2) # predicted clustering
variation_info(true, pred)