categorical_sampling {anticlust} | R Documentation |

## Random sampling employing a categorical constraint

### Description

This function can be used to obtain a stratified split of a data set.

### Usage

```
categorical_sampling(categories, K)
```

### Arguments

`categories` |
A matrix or vector of one or more categorical variables. |

`K` |
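The number of groups that are returned. As the examples show, a vector of group sizes can be passed instead to request groups of unequal size. |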

### Details

This function can be used to obtain a stratified split of a data set.
Using this function is like calling `anticlustering` with the argument
`categories`, but without optimizing a clustering objective. The
categories are just evenly split between samples. Apart from the restriction
that categories are balanced between samples, the split is random.
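
The idea of a split that is random except for the balancing of categories can be illustrated with a few lines of base R. This is only a minimal sketch under stated assumptions, not the implementation used by anticlust: the helper `stratified_split_sketch` is a hypothetical name, it handles a single categorical vector, and it only supports `K` given as a number of groups.

```
# Hypothetical helper, NOT part of anticlust: a base-R sketch of a
# stratified random split for one categorical vector.
stratified_split_sketch <- function(categories, K) {
  groups <- integer(length(categories))
  offset <- 0
  for (level in unique(categories)) {
    idx <- which(categories == level)
    # Shuffle the members of this category
    idx <- idx[sample.int(length(idx))]
    # Deal them out to groups 1..K in turn, continuing where the
    # previous category stopped so that total group sizes stay even
    groups[idx] <- (offset + seq_along(idx) - 1) %% K + 1
    offset <- offset + length(idx)
  }
  groups
}

set.seed(1)
# Toy data (an assumption for illustration): three categories of sizes 12, 6, 6
categories <- rep(c("a", "b", "c"), times = c(12, 6, 6))
groups <- stratified_split_sketch(categories, K = 3)
table(groups, categories)
```

Within each category, the counts per group differ by at most one; which elements end up in which group is left to chance.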

### Value

A vector representing the sample each element was assigned to.

### Examples

```
data(schaper2019)
categories <- schaper2019$room
groups <- categorical_sampling(categories, K = 6)
table(groups, categories)
# Unequal sized groups
groups <- categorical_sampling(categories, K = c(24, 24, 48))
table(groups, categories)
# Heavily unequal sized groups: the categories are harder to balance
groups <- categorical_sampling(categories, K = c(51, 19, 26))
table(groups, categories)
```

[Package *anticlust* version 0.8.5 Index]