optimal_anticlustering {anticlust} | R Documentation |

## Optimal ("exact") algorithms for anticlustering

### Description

Wrapper function that gives access to all optimal algorithms for anticlustering that are available in anticlust.

### Usage

```
optimal_anticlustering(x, K, objective, solver = NULL)
```

### Arguments

`x` |
The data input. Can be one of two structures: (1) A
feature matrix where rows correspond to elements and columns
correspond to variables (a single numeric variable can be
passed as a vector). (2) An N x N dissimilarity matrix;
can be an object of class |

`K` |
How many anticlusters should be created or alternatively:
(a) A vector describing the size of each group (the latter currently
only works for |

`objective` |
The anticlustering objective, can be "diversity", "variance", "kplus" or "dispersion". |

`solver` |
Optional. The solver used to obtain the optimal solution. Currently supports "glpk" and "symphony". See Details. |

### Details

This is a wrapper for all optimal methods supported in anticlust (currently and in the future).
Compared to `anticlustering`, it allows the user to specify the solver used to obtain an optimal
solution, and it can be used to obtain optimal solutions for all supported
anticlustering objectives (variance, diversity, k-plus, dispersion). For
the objectives "variance", "diversity" and "kplus", the optimal ILP method
in Papenberg and Klau (2021) is used, which maximizes the sum of all pairwise
intra-cluster distances (for a user-specified number of equal-sized clusters).
For k-means anticlustering (i.e., `objective = "variance"`), the
squared Euclidean distance is used. For k-plus anticlustering, the squared Euclidean distance
based on the extended k-plus data matrix is used (see `kplus_moment_variables`).
For the diversity (and the dispersion), the Euclidean distance is used by default,
but any user-defined dissimilarity matrix is possible.
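The link between the variance objective and the diversity on squared Euclidean distances rests on a standard identity: within a group of size n, the sum of all pairwise squared Euclidean distances equals n times the sum of squared distances to the group centroid. The following base R sketch checks this numerically. The helper names `within_ss` and `pairwise_sq_sum` and the arbitrary grouping are illustrative assumptions, not part of anticlust.

```r
set.seed(123)
data <- matrix(rnorm(24), ncol = 2)   # 12 elements, 2 features
groups <- rep(1:2, each = 6)          # arbitrary equal-sized grouping (not an optimal one)

# Sum of squared Euclidean distances between each element and its group centroid
within_ss <- function(x, g) {
  sum(sapply(unique(g), function(k) {
    xk <- x[g == k, , drop = FALSE]
    sum(sweep(xk, 2, colMeans(xk))^2)
  }))
}

# Sum of all pairwise squared Euclidean distances within groups
pairwise_sq_sum <- function(x, g) {
  sum(sapply(unique(g), function(k) {
    sum(dist(x[g == k, , drop = FALSE])^2)
  }))
}

n_per_group <- 6
# Compare the pairwise sum with n times the within-group sum of squares
all.equal(pairwise_sq_sum(data, groups), n_per_group * within_ss(data, groups))
```

Because the factor n is constant when all groups have the same size, a partition that maximizes one quantity also maximizes the other.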

The dispersion is solved optimally using the approach described in `optimal_dispersion`.

The optimal methods require either the R package `Rglpk` and the GNU linear programming kit
(<http://www.gnu.org/software/glpk/>), or the R package `Rsymphony` and the COIN-OR SYMPHONY
solver libraries (<https://github.com/coin-or/SYMPHONY>). If the argument `solver` is not
specified by the user, the function will try to find the GLPK or SYMPHONY
solver and throw an error if neither is available. It will select the
GLPK solver if both are available because some rare instances have been observed where
the SYMPHONY solver crashes on Mac computers. I would still try out the
SYMPHONY solver to see if the unlikely crash occurs. However, it then has to be
selected explicitly by the user via `solver` (at least if both solver packages, Rsymphony and Rglpk, are available on the system).
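The default selection rule described above can be sketched as follows. `choose_solver` is a hypothetical helper written for illustration; it is not an anticlust function, and the package's internal logic may differ. It relies only on `requireNamespace()` from base R to test whether a solver package is installed, and accepts a replacement check so that the sketch runs without either package.

```r
# Hypothetical helper mirroring the documented default:
# prefer GLPK, fall back to SYMPHONY, otherwise raise an error.
choose_solver <- function(has_pkg = function(p) requireNamespace(p, quietly = TRUE)) {
  if (has_pkg("Rglpk")) {
    return("glpk")
  }
  if (has_pkg("Rsymphony")) {
    return("symphony")
  }
  stop("No solver found: install the R package Rglpk or Rsymphony.")
}

# Simulated availability, so the sketch does not depend on the local installation
both_installed <- choose_solver(function(p) p %in% c("Rglpk", "Rsymphony"))
only_symphony  <- choose_solver(function(p) p == "Rsymphony")
```

To use SYMPHONY when both packages are installed, pass `solver = "symphony"` to `optimal_anticlustering` explicitly.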

### Value

A vector of length N that assigns a group (i.e., a number
between 1 and `K`) to each input element.
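A vector of this form can be inspected with base R. The sketch below uses a hand-written assignment (`groups` is a stand-in, not the output of a solver) purely to show the shape of the result and two common checks: group sizes and per-group feature means.

```r
set.seed(1)
data <- matrix(rnorm(24), ncol = 2)   # N = 12 elements, 2 features
groups <- rep(1:2, times = 6)         # hand-written stand-in for a result with K = 2

# One group label per input element
length(groups) == nrow(data)

# Group sizes
table(groups)

# Feature means per group (rows: groups, columns: variables)
group_means <- apply(data, 2, function(col) tapply(col, groups, mean))
group_means
```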

### Author(s)

Martin Papenberg martin.papenberg@hhu.de

### Examples

```
# Requires the anticlust package and a solver package (Rglpk or Rsymphony)
library(anticlust)

data <- matrix(rnorm(24), ncol = 2)

# These calls are equivalent for k-means anticlustering:
optimal_anticlustering(data, K = 2, objective = "variance")
optimal_anticlustering(dist(data)^2, K = 2, objective = "diversity")

# These calls are equivalent for k-plus anticlustering:
optimal_anticlustering(data, K = 2, objective = "kplus")
optimal_anticlustering(dist(kplus_moment_variables(data, 2))^2, K = 2, objective = "diversity")
```

*anticlust* version 0.8.5