Zsimilarity {IMIFA} | R Documentation |
Summarise MCMC samples of clustering labels with a similarity matrix and find the 'average' clustering
Description
This function takes a Monte Carlo sample of cluster labels, computes an average similarity matrix, and returns the clustering with minimum mean squared error to this average. The mcclust package must be loaded.
Usage
Zsimilarity(zs)
Arguments
zs |
A matrix containing samples of clustering labels, where the columns correspond to the N observations and the rows correspond to the M iterations. |
Details
This function takes a Monte Carlo sample of cluster labels, converts them to adjacency matrices, and computes a similarity matrix as an average of the adjacency matrices. The dimension of the similarity matrix is invariant to label switching and the number of clusters in each sample, desirable features when summarising partitions of Bayesian nonparametric models such as IMIFA. As a summary of the posterior clustering, the clustering with minimum mean squared error to this 'average' clustering is reported.
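The steps described above can be illustrated with a minimal base-R sketch on a toy sample. This is not the package's implementation: the helper `adjacency()`, the object names `z_sim`, `mse_z`, and `z_avg`, and the choice to average the squared error over all N x N entries are assumptions made purely for illustration.

```r
# Toy sample of labels: M = 4 iterations (rows), N = 5 observations (columns).
# Rows 1 and 2 describe the same partition with switched labels.
zs <- rbind(c(1, 1, 2, 2, 2),
            c(2, 2, 1, 1, 1),
            c(1, 1, 1, 2, 2),
            c(1, 1, 2, 2, 3))

# Adjacency (co-clustering) matrix of a single label vector (hypothetical helper):
# entry (i, j) is 1 if observations i and j share a label, 0 otherwise.
adjacency <- function(z) outer(z, z, "==") * 1

# Similarity matrix: element-wise average of the M adjacency matrices.
adj_list <- lapply(seq_len(nrow(zs)), function(m) adjacency(zs[m, ]))
z_sim    <- Reduce(`+`, adj_list) / nrow(zs)

# Mean squared error between each sampled adjacency matrix and the average
# (assumed here to be the mean over all N x N entries).
mse_z <- vapply(adj_list, function(A) mean((A - z_sim)^2), numeric(1))

# The 'average' clustering is the sampled labelling with the smallest MSE.
z_avg <- zs[which.min(mse_z), ]
```

Because rows 1 and 2 differ only by a relabelling, they yield identical adjacency matrices and therefore identical MSEs, which is the invariance to label switching noted above.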
A heatmap of z.sim may provide a useful visualisation, if appropriately ordered. The user is also invited to perform hierarchical clustering using hclust after first converting this similarity matrix to a distance matrix; "complete" linkage is recommended. Alternatively, hc could be used.
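One possible way to obtain an appropriate ordering for the heatmap is to reuse the dendrogram order from hclust. The sketch below uses only base R (stats and graphics); the small matrix `z_sim` is hypothetical data standing in for `as.matrix(z.sim)`, and `k = 2` is an arbitrary choice for the toy example.

```r
# Hypothetical 5 x 5 similarity matrix standing in for as.matrix(z.sim)
z_sim <- matrix(c(1.00, 0.10, 0.90, 0.00, 0.20,
                  0.10, 1.00, 0.00, 0.80, 0.70,
                  0.90, 0.00, 1.00, 0.10, 0.10,
                  0.00, 0.80, 0.10, 1.00, 0.90,
                  0.20, 0.70, 0.10, 0.90, 1.00), nrow = 5, byrow = TRUE)

# Convert similarities to distances and cluster with complete linkage
hcl <- hclust(as.dist(1 - z_sim), method = "complete")

# Reorder rows and columns by the dendrogram order so that blocks of
# co-clustered observations sit together, then draw the heatmap
ord <- hcl$order
image(seq_along(ord), seq_along(ord), z_sim[ord, ord],
      col = heat.colors(30, rev = TRUE), axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_along(ord), labels = ord)
axis(2, at = seq_along(ord), labels = ord)

# Cut the tree to obtain a hard clustering (k chosen for this toy example)
cl <- cutree(hcl, k = 2)
```

Here observations 1 and 3 end up in one group and observations 2, 4, and 5 in the other, matching the block structure visible in the reordered heatmap.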
Value
A list containing three elements:
z.avg |
The 'average' clustering, with minimum squared distance to z.sim. |
z.sim |
The N x N similarity matrix, in a sparse format (see simple_triplet_matrix). |
MSE.z |
A vector of length M recording the MSEs between each clustering and the 'average' clustering. |
Note
The mcclust package must be loaded.
This is liable to take quite some time to run, especially if the number of observations and/or the number of iterations is large. Depending on how distinct the clusters are, z.sim may be stored better in a non-sparse format. This function can optionally be called inside get_IMIFA_results.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Carmona, C., Nieto-Barajas, L. and Canale, A. (2018) Model based approach for household clustering with mixed scale variables. Advances in Data Analysis and Classification, 13(2): 559-583.
See Also
get_IMIFA_results, simple_triplet_matrix, hclust, hc, comp.psm, cltoSim
Examples
# Run an IMIFA model and extract the sampled cluster labels
# data(olive)
# sim <- mcmc_IMIFA(olive, method="IMIFA", n.iters=5000)
# zs <- sim[[1]][[1]]$z.store
# Get the similarity matrix and visualise it
# zsimil <- Zsimilarity(zs)
# z.sim <- as.matrix(zsimil$z.sim)
# z.col <- mat2cols(z.sim, cols=heat.colors(30, rev=TRUE))
# z.col[z.sim == 0] <- NA
# plot_cols(z.col, na.col=par()$bg); box(lwd=2)
# Extract the clustering with minimum squared distance to this
# 'average' and evaluate its performance against the true labels
# table(zsimil$z.avg, olive$area)
# Perform hierarchical clustering on the distance matrix
# Hcl <- hclust(as.dist(1 - z.sim), method="complete")
# plot(Hcl)
# table(cutree(Hcl, k=3), olive$area)