dispersion_objective {anticlust}    R Documentation

Cluster dispersion


Compute the dispersion objective for a given clustering (i.e., the minimum distance between two elements within the same cluster).


dispersion_objective(x, clusters)



x: The data input. Can be one of two structures: (1) A feature matrix where rows correspond to elements and columns correspond to variables (a single numeric variable can be passed as a vector). (2) An N x N dissimilarity matrix; can be an object of class dist (e.g., returned by dist or as.dist) or a matrix where the entries of the upper and lower triangular matrix represent pairwise dissimilarities.


clusters: A vector representing (anti)clusters (e.g., returned by anticlustering).


The dispersion is the minimum distance between any two elements that belong to the same cluster. When the input x is a feature matrix, the Euclidean distance is used as the distance measure. Maximizing the dispersion maximizes the minimum heterogeneity within clusters and is an anticlustering task.
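The definition can be illustrated with a small base R sketch. The helper manual_dispersion below is hypothetical and not part of anticlust; it only assumes that the dispersion is the smallest Euclidean distance among all pairs of elements sharing a cluster label, and it may differ from the package's internal implementation.

```r
# Hypothetical helper (not part of anticlust): minimum within-cluster
# Euclidean distance, computed with base R only.
manual_dispersion <- function(x, clusters) {
  d <- as.matrix(dist(x))                          # pairwise Euclidean distances
  same_cluster <- outer(clusters, clusters, "==")  # TRUE where labels match
  # Keep each unordered pair once and drop self-distances on the diagonal
  keep <- same_cluster & upper.tri(d)
  min(d[keep])
}

# Four elements measured on a single numeric variable
x <- c(0, 1, 5, 7)

# Neighbours split across clusters: within-cluster distances are 5 and 6
manual_dispersion(x, c(1, 2, 1, 2))  # 5

# Neighbours grouped together: within-cluster distances are 1 and 2
manual_dispersion(x, c(1, 1, 2, 2))  # 1
```

The first assignment has the larger dispersion because the closest pair of elements (0 and 1) is placed in different clusters, which is what an anticlustering method maximizing this objective aims for.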


Brusco, M. J., Cradit, J. D., & Steinley, D. (2020). Combining diversity and dispersion criteria for anticlustering: A bicriterion approach. British Journal of Mathematical and Statistical Psychology, 73, 375-396. https://doi.org/10.1111/bmsp.12186


N <- 50 # number of elements
M <- 2  # number of variables per element
K <- 5  # number of clusters
random_data <- matrix(rnorm(N * M), ncol = M)
random_clusters <- sample(rep_len(1:K, N))
dispersion_objective(random_data, random_clusters)

# Maximize the dispersion
optimized_clusters <- anticlustering(
  random_data,
  K = random_clusters,
  objective = dispersion_objective
)
dispersion_objective(random_data, optimized_clusters)

[Package anticlust version 0.8.5 Index]