categorical_sampling {anticlust} | R Documentation |
This function can be used to obtain a stratified split of a data set.
categorical_sampling(categories, K)
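categories | A matrix or vector of one or more categorical variables. |
K | The number of groups that are returned. As in the examples below, a vector of group sizes can be passed instead to request groups of unequal size. |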
Using this function is like calling anticlustering with the argument
'categories', but without optimizing a clustering objective. The categories
are simply split evenly between samples. Apart from the restriction that
categories are balanced between samples, the split is random.
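The idea of a split that is balanced by category but otherwise random can be illustrated in base R. The following is only a minimal sketch under stated assumptions, not the implementation used by anticlust: the helper name stratified_split_sketch is hypothetical, it accepts a single categorical vector (not a matrix), and it only handles a scalar K.

```r
# Hypothetical helper, not part of anticlust: a base-R sketch of a
# stratified random split of one categorical vector into K groups.
stratified_split_sketch <- function(categories, K) {
  n <- length(categories)
  # Shuffle the element indices so the order within each category is random
  shuffled <- sample(n)
  # Sort the shuffled indices by category; order() leaves ties in their
  # current (random) order
  by_category <- shuffled[order(categories[shuffled])]
  groups <- integer(n)
  # Deal the labels 1..K cyclically along the category-sorted order, so
  # each category's count differs by at most one between groups
  groups[by_category] <- rep_len(seq_len(K), n)
  groups
}

set.seed(1)
categories <- rep(c("a", "b", "c"), times = c(12, 6, 6))
groups <- stratified_split_sketch(categories, K = 3)
# Each group receives 4 "a", 2 "b" and 2 "c"; which elements end up in
# which group depends on the random shuffle
table(groups, categories)
```

Dealing labels cyclically over the category-sorted order is one simple way to get balance. It does not cover unequal group sizes or several categorical variables at once, both of which categorical_sampling accepts.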
A vector representing the sample each element was assigned to.
data(schaper2019)
categories <- schaper2019$room
groups <- categorical_sampling(categories, K = 6)
table(groups, categories)
# Unequal-sized groups
groups <- categorical_sampling(categories, K = c(24, 24, 48))
table(groups, categories)
# Heavily unequal-sized groups are harder to balance
groups <- categorical_sampling(categories, K = c(51, 19, 26))
table(groups, categories)