part_kmeans {partition}

R Documentation

Partitioner: K-means, ICC, scaled means

Description

Partitioners are functions that tell the partition algorithm 1) what to try to reduce, 2) how to measure the information lost in the reduction, and 3) how to reduce the data. In partition, functions that handle 1) are called directors, functions that handle 2) are called metrics, and functions that handle 3) are called reducers. partition provides a number of pre-specified partitioners for agglomerative data reduction. Custom partitioners can be created with as_partitioner().

Pass partitioner objects to the partitioner argument of partition().

part_kmeans() uses the following direct-measure-reduce approach:

direct: direct_k_cluster(), K-Means Clusters
measure: measure_min_icc(), Minimum Intraclass Correlation
reduce: reduce_kmeans(), Scaled Row Means
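
The direct-measure-reduce combination listed above can be sketched with as_partitioner(). This is a minimal illustration, assuming the three component functions are used with their default arguments; part_kmeans() itself also forwards algorithm, search, init_k, and n_hits to them, so the object below is not guaranteed to behave identically. The name kmeans_like is hypothetical.

```r
library(partition)

# Assumption: each component is used with its default arguments.
# part_kmeans() additionally passes its own arguments through to them.
kmeans_like <- as_partitioner(
  direct = direct_k_cluster,   # director: K-Means clusters
  measure = measure_min_icc,   # metric: minimum intraclass correlation
  reduce = reduce_kmeans       # reducer: scaled row means
)

# Confirm that the result is a partitioner object
is_partitioner(kmeans_like)
```

An object built this way is passed to the partitioner argument of partition(), in the same way as part_kmeans().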

Usage

part_kmeans(
  algorithm = c("armadillo", "Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"),
  search = c("binary", "linear"),
  init_k = NULL,
  n_hits = 4
)

Arguments

`algorithm`	The K-Means algorithm to use. The default is a fast version of the Lloyd algorithm written in armadillo. The rest are the algorithms available in `kmeans()`. In general, armadillo is fastest, but the other algorithms can be faster in high dimensions.
`search`	The search method. Binary search is generally more efficient but linear search can be faster in very low dimensions.
`init_k`	The initial k to test. If `NULL`, then the initial k is the threshold times the number of variables.
`n_hits`	In the linear search method, the number of iterations that should be under the threshold before reducing; useful for preventing false positives.
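
As an illustration of how these arguments fit together, the sketch below overrides every default. The specific values (k = 6, two hits) are arbitrary choices for demonstration, not recommendations, and the object name prt_linear is hypothetical.

```r
library(partition)

set.seed(123)
df <- simulate_block_data(c(3, 4, 5), lower_corr = .4, upper_corr = .6, n = 100)

# Use one of the kmeans() algorithms with a linear search,
# starting the search at k = 6 and requiring 2 hits under the threshold
prt_linear <- partition(
  df,
  threshold = .6,
  partitioner = part_kmeans(
    algorithm = "Hartigan-Wong",
    search = "linear",
    init_k = 6,
    n_hits = 2
  )
)

prt_linear
```

Because n_hits applies only to the linear search method, it is paired with search = "linear" here.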

Value

a partitioner

Examples

set.seed(123)
df <- simulate_block_data(c(3, 4, 5), lower_corr = .4, upper_corr = .6, n = 100)

# fit partition using part_kmeans()
partition(df, threshold = .6, partitioner = part_kmeans())
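
A possible follow-up, shown as a sketch rather than as part of the original example: storing the fitted partition and inspecting it with partition_scores() and mapping_key(). The object name prt is hypothetical.

```r
library(partition)

set.seed(123)
df <- simulate_block_data(c(3, 4, 5), lower_corr = .4, upper_corr = .6, n = 100)

# Store the fitted partition so it can be inspected afterwards
prt <- partition(df, threshold = .6, partitioner = part_kmeans())

# The reduced data set: one row per observation
reduced <- partition_scores(prt)
dim(reduced)

# Which original variables were mapped to each reduced variable
mapping_key(prt)
```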