categorical_sampling {anticlust}    R Documentation

Random sampling employing a categorical constraint

Description

This function can be used to obtain a stratified split of a data set.

Usage

categorical_sampling(categories, K)

Arguments

categories

A matrix or vector of one or more categorical variables.

K

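The number of groups that are returned. As the examples below show, K can also be a vector giving the size of each group; in those examples the sizes sum to the number of elements.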

Details

Using this function is like calling anticlustering with the argument 'categories', but without optimizing a clustering objective. Each category is split as evenly as possible between the samples. Apart from the restriction that categories are balanced between samples, the assignment is random.
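To illustrate the idea of a random split that is balanced on a category, here is a minimal base-R sketch. It is not the implementation used by anticlust; the helper name stratified_split_sketch is hypothetical, and the sketch only handles a single categorical vector and a single number of groups K (no matrix input, no vector of group sizes).

```r
# Hypothetical helper, not part of anticlust: a base-R sketch of a
# random split into K groups that is balanced on one categorical vector.
stratified_split_sketch <- function(categories, K) {
  categories <- as.factor(categories)
  groups <- integer(length(categories))
  offset <- 0L  # continue dealing where the previous category stopped
  for (level in levels(categories)) {
    idx <- which(categories == level)
    # Shuffle the positions that belong to this category
    idx <- idx[sample.int(length(idx))]
    # Deal the shuffled elements out to groups 1..K in turn
    groups[idx] <- as.integer((seq_along(idx) - 1L + offset) %% K) + 1L
    offset <- offset + length(idx)
  }
  groups
}

set.seed(123)
categories <- rep(c("a", "b", "c"), times = c(12, 6, 6))
groups <- stratified_split_sketch(categories, K = 3)
table(groups, categories)
```

Shuffling within each category makes the assignment random, while dealing elements out in turn keeps the category counts per group within one of each other.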

Value

A vector representing the sample each element was assigned to.
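A vector of this kind can be used to partition the rows of a data set. The following sketch uses toy data and a stand-in assignment vector (a shuffled 1..3 labelling, not output of categorical_sampling) so that it runs without the package; the names toy and parts are assumptions made for illustration.

```r
# Toy data; `groups` is a stand-in for the vector returned by
# categorical_sampling() (here just a shuffled 1..3 assignment).
set.seed(1)
toy <- data.frame(id = 1:12, room = rep(c("kitchen", "bathroom"), each = 6))
groups <- sample(rep_len(1:3, nrow(toy)))

# Element i of `groups` labels row i of `toy`, so split() yields
# one data frame per group.
parts <- split(toy, groups)
sapply(parts, nrow)
```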

Examples


data(schaper2019)
categories <- schaper2019$room
groups <- categorical_sampling(categories, K = 6)
table(groups, categories)

# Unequal-sized groups
groups <- categorical_sampling(categories, K = c(24, 24, 48))
table(groups, categories)

# Heavily unequal-sized groups; the categories are harder to balance
groups <- categorical_sampling(categories, K = c(51, 19, 26))
table(groups, categories)


[Package anticlust version 0.8.1 Index]