mcclust-package {mcclust} | R Documentation |
Process MCMC Sample of Clusterings.
Description
Implements methods for processing a sample of (hard) clusterings, e.g. the MCMC output of a Bayesian clustering model. Among them are methods that find a single best clustering to represent the sample, which are based on the posterior similarity matrix or a relabelling algorithm.
Details
Package: | mcclust |
Type: | Package |
Version: | 1.0 |
Date: | 2009-03-12 |
License: | GPL (>= 2) |
LazyLoad: | yes |
Most important functions:
comp.psm
for computing posterior similarity matrix (PSM). Based on the PSM maxpear
and minbinder
provide
several optimization methods to find a clustering with maximal posterior expected adjusted Rand index with the true clustering or
one that minimizes the posterior expectation of a loss function by Binder (1978). minbinder
provides the optimization algorithm of
Lau and Green.
relabel
contains the relabelling algorithm of Stephens (2000).
arandi
and vi.dist
compute distance functions for clusterings, the (adjusted) Rand index and the entropy-based variation of
information distance.
Author(s)
Arno Fritsch
Maintainer: Arno Fritsch <arno.fritsch@tu-dortmund.de>
References
Binder, D.A. (1978) Bayesian cluster analysis, Biometrika 65, 31–38.
Fritsch, A. and Ickstadt, K. (2009) An improved criterion for clustering based on the posterior similarity matrix, Bayesian Analysis, accepted.
Lau, J.W. and Green, P.J. (2007) Bayesian model based clustering procedures, Journal of Computational and Graphical Statistics 16, 526–558.
Stephens, M. (2000) Dealing with label switching in mixture models. Journal of the Royal Statistical Society Series B, 62, 795–809.
Examples
data(cls.draw2)
# sample of 500 clusterings from a Bayesian cluster model
tru.class <- rep(1:8,each=50)
# the true grouping of the observations
psm2 <- comp.psm(cls.draw2)
# posterior similarity matrix
# optimize criteria based on PSM
mbind2 <- minbinder(psm2)
mpear2 <- maxpear(psm2)
# Relabelling
k <- apply(cls.draw2,1, function(cl) length(table(cl)))
max.k <- as.numeric(names(table(k))[which.max(table(k))])
relab2 <- relabel(cls.draw2[k==max.k,])
# compare clusterings found by different methods with true grouping
arandi(mpear2$cl, tru.class)
arandi(mbind2$cl, tru.class)
arandi(relab2$cl, tru.class)