bicriterion_anticlustering {anticlust} R Documentation

## Bicriterion iterated local search heuristic

### Description

This function implements the bicriterion for anticlustering by Brusco, Cradit, and Steinley (2020; <doi:10.1111/bmsp.12186>). The description of the algorithm is given in Section 3 of their paper (in particular, see the pseudocode in their Figure 2).

### Usage

bicriterion_anticlustering(
x,
K,
R = NULL,
W = c(1e-06, 1e-05, 1e-04, 0.001, 0.01, 0.1, 0.5, 0.99, 0.999, 0.999999),
Xi = c(0.05, 0.1)
)


### Arguments

 x The data input. Can be one of two structures: (1) A feature matrix where rows correspond to elements and columns correspond to variables (a single numeric variable can be passed as a vector). (2) An N x N matrix dissimilarity matrix; can be an object of class dist (e.g., returned by dist or as.dist) or a matrix where the entries of the upper and lower triangular matrix represent pairwise dissimilarities. K How many anticlusters should be created. Alternatively: (a) A vector describing the size of each group, or (b) a vector of length nrow(x) describing how elements are assigned to anticlusters before the optimization starts. R The desired number of restarts for the algorithm. By default, both phases (MBPI + BILS) of the algorithm are performed once. W Optional argument, a vector of weights defining the relative importance of dispersion and diversity (0 <= W <= 1). See details. Xi Optional argument, specifies probability of swapping elements during the iterated local search. See examples.

### Details

The bicriterion algorithm by Brusco, Cradit, and Steinley (2020) aims to simultaneously optimize two anticlustering criteria: the diversity_objective and the dispersion_objective. It returns a list of partitions that approximate the pareto set of efficient solutions across both criteria. By considering both the diversity and dispersion, this algorithm is well-suited for maximizing overall within-group heterogeneity. To select a partition among the approximated pareto set, it is reasonable to plot the objectives for each partition (see Examples).

The arguments R, W and Xi are named for consistency with Brusco et al. (2020). The argument K is used for consistency with other functions in anticlust; Brusco et al. used 'G' to denote the number of groups. However, note that K can not only be used to denote the number of equal-sized groups, but also to specify group sizes, as in anticlustering.

This function implements the combined bicriterion algorithm MBPI + BILS. The argument R denotes the number of restarts of the search heuristic. Half of the repetitions perform MBPI and the other half perform BILS, as suggested by Brusco et al. The argument W denotes the possible weights given to the diversity criterion in a given run of the search heuristic. In each run, the a weight is randomly selected from the vector W. As default values, we use the weights that Brusco et al. used in their analyses. All values in w have to be in [0, 1]; larger values indicate that diversity is more important, whereas smaller values indicate that dispersion is more important; w = .5 implies the same weight for both criteria. The argument Xi is the probability that an element is swapped during the iterated local search (specifically, Xi has to be a vector of length 2, denoting the range of a uniform distribution from which the probability of swapping is selected). For Xi, the default is selected consistent with the analyses by Brusco et al.

If the data input x is a feature matrix (that is: each row is a "case" and each column is a "variable"), a matrix of the Euclidean distances is computed as input to the algorithm. If a different measure of dissimilarity is preferred, you may pass a self-generated dissimilarity matrix via the argument x.

### Value

A matrix of anticlustering partitions (i.e., the approximated pareto set). Each row corresponds to a partition, each column corresponds to an input element.

### Note

For technical reasons, the pareto set returned by this function has a limit of 500 partitions. Usually however, the algorithm usually finds much fewer partitions. There is one following exception: We do not recommend to use this method when the input data is one-dimensional where the algorithm may identify too many equivalent partitions causing it to run very slowly (see section 5.6 in Breuer, 2020).

### Author(s)

Martin Breuer M.Breuer@hhu.de, Martin Papenberg martin.papenberg@hhu.de

### References

Brusco, M. J., Cradit, J. D., & Steinley, D. (2020). Combining diversity and dispersion criteria for anticlustering: A bicriterion approach. British Journal of Mathematical and Statistical Psychology, 73, 275-396. https://doi.org/10.1111/bmsp.12186

Breuer (2020). Using anticlustering to maximize diversity and dispersion: Comparing exact and heuristic approaches. Bachelor thesis.

### Examples

# Generate some random data
M <- 3
N <- 80
K <- 4
data <- matrix(rnorm(N * M), ncol = M)

# Perform bicriterion algorithm, use 200 repetitions
pareto_set <- bicriterion_anticlustering(data, K = K, R = 200)

# Compute objectives for all solutions
diversities_pareto <- apply(pareto_set, 1, diversity_objective, x = data)
dispersions_pareto <- apply(pareto_set, 1, dispersion_objective, x = data)

# Plot the pareto set
plot(
diversities_pareto,
dispersions_pareto,
col = "blue",
cex = 2,
pch = as.character(1:NROW(pareto_set))
)

# Get some random solutions for comparison
rnd_solutions <- t(replicate(n = 200, sample(pareto_set[1, ])))

# Compute objectives for all random solutions
diversities_rnd <- apply(rnd_solutions, 1, diversity_objective, x = data)
dispersions_rnd <- apply(rnd_solutions, 1, dispersion_objective, x = data)

# Plot random solutions and pareto set. Random solutions are far away
# from the good solutions in pareto set
plot(
diversities_rnd, dispersions_rnd,
col = "red",
xlab = "Diversity",
ylab = "Dispersion",
ylim = c(
min(dispersions_rnd, dispersions_pareto),
max(dispersions_rnd, dispersions_pareto)
),
xlim = c(
min(diversities_rnd, diversities_pareto),
max(diversities_rnd, diversities_pareto)
)
)

# Add approximated pareto set from bicriterion algorithm:
points(diversities_pareto, dispersions_pareto, col = "blue", cex = 2, pch = 19)



[Package anticlust version 0.8.1 Index]