partitionComparison-package {partitionComparison}    R Documentation
partitionComparison: Implements Measures for the Comparison of Two Partitions
Description
Provides several measures ((dis)similarity, distance/metric, correlation, entropy) for comparing two partitions of the same set of objects. The measures fall into three classes: pair comparison (which includes the well-known Jaccard and Rand indices), set based, and information theory based. Many of the implemented measures can be found in Albatineh AN, Niewiadomska-Bugaj M and Mihalko D (2006) doi:10.1007/s00357-006-0017-z and Meila M (2007) doi:10.1016/j.jmva.2006.11.013. Partitions are represented by vectors of class labels, which allows straightforward integration with existing clustering algorithms (e.g. kmeans()). The package is mostly based on the S4 object system.
Details
This package provides a large collection of measures to compare two partitions. Some survey articles for these measures are cited below; the seminal paper for each individual measure is cited in the documentation of the corresponding function.
Most functionality is implemented as S4 classes and methods, so it can easily be adapted to special needs and specifications. The main class is Partition, which merely wraps an atomic vector of length n that stores the class label of each object. All measures are computed on these vectors of class labels. All partition comparison methods are called in the same way: <measure method>(p, q), where p and q are the two partitions (as Partition instances).
Often one does not want to explicitly convert a vector of class labels (e.g. the output of another package's function or algorithm) into a Partition instance before using measures from this package. For convenience, the function registerPartitionVectorSignatures dynamically creates versions of all measures that work directly on plain R vectors.
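The calling convention described above can be sketched with explicit Partition instances. This is a minimal sketch, assuming the package is installed and that a Partition is created with the standard S4 constructor new("Partition", <label vector>). The label vectors are made-up illustration data.

```r
library(partitionComparison)

# Made-up class labels for five objects (illustration only)
p <- new("Partition", c(0, 0, 0, 1, 1))  # assumption: standard S4 constructor
q <- new("Partition", c(0, 0, 1, 1, 1))

# Every measure follows the pattern <measure method>(p, q)
ri <- randIndex(p, q)
ri
```

Here 6 of the 10 object pairs are treated the same way by both partitions (2 pairs are grouped together in both, 4 are separated in both), so the Rand index should equal 0.6.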
Author(s)
Maintainer: Fabian Ball mail@fabian-ball.de [copyright holder, contributor]
Other contributors:
Andreas Geyer-Schulz andreas.geyer-schulz@kit.edu [copyright holder]
References
Albatineh AN, Niewiadomska-Bugaj M, Mihalko D (2006). “On Similarity Indices and Correction for Chance Agreement.” Journal of Classification, 23(2), 301–313. ISSN 0176-4268, doi:10.1007/s00357-006-0017-z.
Meila M (2007). “Comparing Clusterings–an Information Based Distance.” Journal of Multivariate Analysis, 98(5), 873–895. doi:10.1016/j.jmva.2006.11.013.
See Also
Useful links:
Report bugs at https://github.com/KIT-IISM-EM/partitionComparison/issues
Examples
# Generate some data
set.seed(42)
data <- cbind(x=c(rnorm(50), rnorm(30, mean=5)), y=c(rnorm(50), rnorm(30, mean=5)))
# Run k-means with two/three centers
data.km2 <- kmeans(data, 2)
data.km3 <- kmeans(data, 3)
# Load this library
library(partitionComparison)
# Register the measures to take ANY input
registerPartitionVectorSignatures(environment())
# Compare the clusters
randIndex(data.km2$cluster, data.km3$cluster)
# [1] 0.8101266
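To illustrate what a pair comparison measure such as the Rand index computes, the following base R sketch counts the object pairs on which two label vectors agree. The helper rand_index_sketch is hypothetical and not part of partitionComparison. It uses the textbook definition of the Rand index and is meant only as a cross-check on small inputs, since it builds n-by-n matrices.

```r
# Hypothetical helper (not part of partitionComparison):
# Rand index = share of object pairs that both partitions treat alike,
# i.e. pairs grouped together in both or separated in both.
rand_index_sketch <- function(p, q) {
  stopifnot(length(p) == length(q), length(p) >= 2)
  same_p <- outer(p, p, "==")    # TRUE where two objects share a label in p
  same_q <- outer(q, q, "==")    # TRUE where two objects share a label in q
  pairs <- upper.tri(same_p)     # each unordered pair exactly once
  mean(same_p[pairs] == same_q[pairs])
}

rand_index_sketch(c(0, 0, 0, 1, 1), c(0, 0, 1, 1, 1))
# [1] 0.6
```

Because only co-membership of pairs is compared, the result does not depend on how the classes are named, which is why cluster labels from different runs of kmeans() can be compared directly.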