categorical_sampling {anticlust} | R Documentation |
This function can be used to obtain a stratified split of a data set.
categorical_sampling(categories, K)
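categories | A vector, or a matrix with one column per variable, containing one or more categorical variables. Each entry (or row, for a matrix) is one element of the data set. |
K | Either the number of groups to return, or a vector of group sizes with one entry per group. When group sizes are given, they must sum to the number of elements. |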
Using this function is like calling anticlustering() with the argument 'categories', but without optimizing a clustering objective: the members of each category are simply split as evenly as possible between the samples. Apart from the restriction that categories are balanced between samples, the split is random.
categorical_sampling() returns a vector with one entry per element (per row, if categories is a matrix), indicating the sample that element was assigned to.
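The idea of a stratified random split can be illustrated with a minimal base-R sketch. It is not the anticlust implementation: the function name stratified_split(), the proportional interleaving of group labels, and the treatment of a scalar K as K groups of (nearly) equal size are assumptions made for illustration only.

```r
# Hypothetical sketch of a stratified random split; NOT the anticlust source code.
stratified_split <- function(categories, K) {
  # Merge several categorical variables (matrix columns) into one stratum factor
  if (is.null(dim(categories))) {
    strata <- as.factor(categories)
  } else {
    strata <- interaction(as.data.frame(categories), drop = TRUE)
  }
  N <- length(strata)
  # Assumption: scalar K means K groups of (nearly) equal size;
  # a vector K gives the size of each group explicitly
  sizes <- if (length(K) == 1) tabulate(rep_len(seq_len(K), N), nbins = K) else K
  stopifnot(sum(sizes) == N)
  # Interleave group labels so each group's labels are spread uniformly:
  # the i-th label of a group with n members sits at position (i - 0.5) / n
  labels <- rep(seq_along(sizes), times = sizes)
  positions <- unlist(lapply(sizes, function(n) (seq_len(n) - 0.5) / n))
  labels <- labels[order(positions)]
  # Sort elements by stratum, in random order within each stratum,
  # then deal out the interleaved labels in that order
  ord <- order(strata, runif(N))
  groups <- integer(N)
  groups[ord] <- labels
  groups
}
```

Because the elements are sorted by stratum before the labels are dealt out, each stratum occupies a contiguous stretch of the label sequence and therefore receives roughly its proportional share of every group; the random order within each stratum is what makes the assignment random.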
library(anticlust)
data(schaper2019)
categories <- schaper2019$room

# Equal-sized groups
groups <- categorical_sampling(categories, K = 6)
table(groups, categories)

# Unequal-sized groups
groups <- categorical_sampling(categories, K = c(24, 24, 48))
table(groups, categories)

# Heavily unequal group sizes make it harder to balance the categories
groups <- categorical_sampling(categories, K = c(51, 19, 26))
table(groups, categories)
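Beyond reading table(groups, categories) by eye, the balance of a split can be summarized per category. The helper below, balance_deviation(), is hypothetical and not part of anticlust; it compares each category's observed counts with the counts expected if that category were split in proportion to the group sizes.

```r
# Hypothetical helper, not part of anticlust: for each category, the largest
# absolute difference between observed and proportionally expected counts
balance_deviation <- function(groups, categories) {
  observed <- table(categories, groups)
  # Expected counts if every category were split in proportion to group sizes
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  apply(abs(observed - expected), 1, max)
}
```

Called as balance_deviation(groups, categories) after any of the splits above, it returns one value per category; 0 means the category is spread exactly in proportion to the group sizes, and larger values indicate a less balanced category.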