relabel {mcclust}

R Documentation

Stephens' Relabelling Algorithm for Clusterings

Description

Given a sample of clusterings in which corresponding clusters may carry different labels, the algorithm attempts to bring all clusterings to a common labelling.

Usage

relabel(cls, print.loss = TRUE)

Arguments

`cls`	a matrix in which every row corresponds to a clustering of the `ncol(cls)` objects.
`print.loss`	logical; should the current value of the loss function be printed after each iteration? Defaults to `TRUE`.

Details

The algorithm minimizes the loss function

\sum_{m=1}^M\sum_{i=1}^n\sum_{j=1}^K-\log\hat{p}_{ij} \cdot I_{\{z_i^{(m)}=j\}}

over the M clusterings, n observations and K clusters, where \hat{p}_{ij} is the estimated probability that observation i belongs to cluster j and z_i^{(m)} indicates to which cluster observation i belongs in clustering m. I_{\{.\}} is an indicator function.
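As an illustration of the formula, the loss can be evaluated directly for a small label matrix. This is a minimal sketch rather than the package code: `relabel_loss` is a hypothetical helper, and estimating \hat{p}_{ij} as the relative frequency with which observation i carries label j is an assumption made for the example.

```r
# Hypothetical helper (not part of mcclust): evaluate the loss for a
# label matrix `cls` (M x n) and a probability matrix `P` (n x K).
relabel_loss <- function(cls, P) {
  n <- ncol(cls)
  loss <- 0
  for (m in seq_len(nrow(cls))) {
    # The indicator keeps only p_hat[i, z_i^(m)] for each observation i
    loss <- loss - sum(log(P[cbind(seq_len(n), cls[m, ])]))
  }
  loss
}

cls <- rbind(c(1, 1, 2, 2), c(1, 1, 2, 2), c(1, 2, 2, 2))
# Assumption: p_hat[i, j] = share of clusterings that give observation i label j
P <- sapply(1:2, function(j) colMeans(cls == j))
relabel_loss(cls, P)
```

Only observation 2 is labelled inconsistently here, so it is the only one that contributes to the loss.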

Minimization is achieved by alternating two steps: estimating \hat{p}_{ij} from all clusterings, and permuting the cluster labels within each clustering so as to minimize the loss function. The optimal permutation in the second step is found by linear programming.
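The two alternating steps can be sketched in base R. This is not the mcclust source: `all_perms` and `relabel_sketch` are hypothetical names, the permutation step uses a brute-force search over all K! permutations instead of linear programming (feasible only for small K), and the smoothing constant `eps` is an assumption added to avoid `log(0)`.

```r
# Hypothetical sketch (not the mcclust implementation).
# Enumerate all permutations of a vector, identity first.
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

relabel_sketch <- function(cls, K = max(cls), max_iter = 20, eps = 1e-6) {
  n <- ncol(cls)
  perms <- all_perms(seq_len(K))
  for (iter in seq_len(max_iter)) {
    # Step 1: estimate p_hat[i, j] as smoothed relative label frequencies
    P <- sapply(seq_len(K), function(j) colMeans(cls == j))
    P <- (P + eps) / rowSums(P + eps)
    changed <- FALSE
    # Step 2: for each clustering pick the label permutation with the lowest loss
    for (m in seq_len(nrow(cls))) {
      losses <- vapply(perms, function(p) {
        -sum(log(P[cbind(seq_len(n), p[cls[m, ]])]))
      }, numeric(1))
      best <- perms[[which.min(losses)]]  # ties resolve to the earliest permutation
      new_row <- best[cls[m, ]]
      if (any(new_row != cls[m, ])) changed <- TRUE
      cls[m, ] <- new_row
    }
    if (!changed) break  # stop once no clustering is relabelled
  }
  list(cls = cls, P = P)
}

cls <- rbind(c(1, 1, 2, 2), c(1, 1, 2, 2), c(1, 2, 2, 2), c(2, 2, 1, 1))
res <- relabel_sketch(cls)
res$cls
```

Brute force is used only to keep the sketch dependency-free; the linear programming formulation solves the same per-clustering assignment problem without enumerating permutations.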

Value

`cls`	the input `cls` with unified labelling.
`P`	an `n \times K` matrix, where entry `[i,j]` contains the estimated probability that observation `i` belongs to cluster `j`.
`loss.val`	value of the loss function.
`cl`	a vector assigning each observation to the cluster with the highest estimated probability `\hat{p}_{ij}`.
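The relationship between `P` and `cl` can be illustrated with a small matrix. The values of `P` below are made up for the example, and the tie-breaking rule (first maximum) is an assumption, not a documented property of `relabel`.

```r
# Illustrative probabilities for n = 4 observations and K = 2 clusters
P <- rbind(c(1.00, 0.00),
           c(0.75, 0.25),
           c(0.00, 1.00),
           c(0.00, 1.00))
# For each observation (row), the index of the most probable cluster
cl <- max.col(P, ties.method = "first")
cl
```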

Warning

The algorithm assumes that the number of clusters K is fixed. If this is not the case, K is taken to be the most common number of clusters. Clusterings with a different number of clusters are discarded and a warning is issued.
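To see in advance which clusterings would be affected, the number of clusters per row can be checked before calling `relabel`. This is a sketch under the assumption that the number of clusters in a clustering is the number of distinct labels in its row; the variable names are illustrative.

```r
cls <- rbind(c(1, 1, 2, 2), c(1, 2, 2, 2), c(1, 2, 3, 3), c(2, 2, 1, 1))
# Number of distinct labels used in each clustering (row)
k_per_row <- apply(cls, 1, function(z) length(unique(z)))
# Most common number of clusters across the sample
K <- as.integer(names(which.max(table(k_per_row))))
# Keep only the clusterings that use exactly K clusters
keep <- k_per_row == K
cls_fixed <- cls[keep, , drop = FALSE]
```

Here the third clustering uses three clusters while the others use two, so it is the one that would be dropped.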

Note

The implementation is a variant of the algorithm of Stephens, which was originally applied to draws of parameters for each observation, not to cluster labels.

Author(s)

Arno Fritsch, arno.fritsch@tu-dortmund.de

References

Stephens, M. (2000) Dealing with label switching in mixture models. Journal of the Royal Statistical Society Series B, 62, 795–809.

Examples

(cls <- rbind(c(1,1,2,2),c(1,1,2,2),c(1,2,2,2),c(2,2,1,1)))
# group 2 in clustering 4 corresponds to group 1 in clusterings 1-3.
cls.relab <- relabel(cls)
cls.relab$cls

[Package mcclust version 1.0.1 Index]