partitionComparison-package {partitionComparison}

R Documentation

partitionComparison: Implements Measures for the Comparison of Two Partitions

Description

Provides several measures ((dis)similarity, distance/metric, correlation, entropy) for comparing two partitions of the same set of objects. The measures fall into three classes: pair-comparison based (including the well-known Jaccard and Rand indices), set based, and information-theory based. Many of the implemented measures are described in Albatineh AN, Niewiadomska-Bugaj M and Mihalko D (2006) doi:10.1007/s00357-006-0017-z and Meila M (2007) doi:10.1016/j.jmva.2006.11.013. Partitions are represented by vectors of class labels, which allows straightforward integration with existing clustering algorithms (e.g. kmeans()). The package is mostly based on the S4 object system.

Details

This package provides a large collection of measures to compare two partitions. Survey articles covering these measures are cited below; the seminal paper for each individual measure is cited with the corresponding function definition.

Most functionality is implemented as S4 classes and methods, so the package can easily be adapted to special needs and specifications. The main class is Partition, which merely wraps an atomic vector of length n that stores the class label of each object. All measures are computed from these vectors of class labels.
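
To illustrate what computing a measure from two vectors of class labels means, here is a minimal base R sketch of the Rand index as a pair-comparison measure. The function name rand_index_sketch is hypothetical and is not part of this package; the package's own implementation may be organised differently.

```r
# Sketch only: Rand index of two label vectors, i.e. the fraction of object
# pairs on which both partitions agree (together in both or apart in both).
# rand_index_sketch is a hypothetical helper, not a function of this package.
rand_index_sketch <- function(p, q) {
  stopifnot(length(p) == length(q), length(p) >= 2)
  pairs <- combn(length(p), 2)               # 2 x choose(n, 2) index pairs
  same_p <- p[pairs[1, ]] == p[pairs[2, ]]   # pair shares a class in p?
  same_q <- q[pairs[1, ]] == q[pairs[2, ]]   # pair shares a class in q?
  mean(same_p == same_q)                     # share of agreeing pairs
}

rand_index_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
# [1] 0.6
```

Only the equality of labels within each vector matters, so relabelling the classes of either partition leaves the result unchanged.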

All partition comparison methods are called in the same way: <measure method>(p, q), where p and q are the two partitions (as Partition instances). Often it is inconvenient to convert vectors of class labels (for example, the output of another package's function or algorithm) into Partition instances before using measures from this package. For convenience, the function registerPartitionVectorSignatures dynamically creates versions of all measures that work directly with plain R vectors.
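
The calling convention with explicit Partition instances might look like the following sketch. It assumes the package is installed and that a Partition can be created with new("Partition", <label vector>); the variable names and label vectors are made up for illustration.

```r
library(partitionComparison)

# Wrap two label vectors for the same five objects in Partition instances
# (assumes construction via new("Partition", ...)).
p <- new("Partition", c(0, 0, 0, 1, 1))
q <- new("Partition", c(0, 0, 1, 1, 1))

# Every measure follows the same pattern: <measure method>(p, q)
rand_pq <- randIndex(p, q)
rand_pq
```

For these two partitions, 6 of the 10 object pairs are treated the same way by both, so the Rand index should come out as 0.6.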

Author(s)

Maintainer: Fabian Ball mail@fabian-ball.de [copyright holder, contributor]

Other contributors:

Andreas Geyer-Schulz andreas.geyer-schulz@kit.edu [copyright holder]

References

Albatineh AN, Niewiadomska-Bugaj M, Mihalko D (2006). “On Similarity Indices and Correction for Chance Agreement.” Journal of Classification, 23(2), 301–313. ISSN 0176-4268, doi:10.1007/s00357-006-0017-z.

Meila M (2007). “Comparing Clusterings–an Information Based Distance.” Journal of Multivariate Analysis, 98(5), 873–895. doi:10.1016/j.jmva.2006.11.013.

Examples

# Generate some data
set.seed(42)
data <- cbind(x=c(rnorm(50), rnorm(30, mean=5)), y=c(rnorm(50), rnorm(30, mean=5)))
# Run k-means with two/three centers
data.km2 <- kmeans(data, 2)
data.km3 <- kmeans(data, 3)

# Load this library
library(partitionComparison)
# Register the measures to take ANY input
registerPartitionVectorSignatures(environment())
# Compare the clusters
randIndex(data.km2$cluster, data.km3$cluster)
# [1] 0.8101266