
k.select {bootcluster}

R Documentation

Estimate number of clusters

Description

Estimate the number of clusters from bootstrap stability.

Usage

k.select(x, range = 2:7, B = 20, r = 5, threshold = 0.8, scheme_2 = TRUE)

Arguments

`x`	a `data.frame` of the data set
`range`	an `integer` `vector` of candidate numbers of clusters k
`B`	number of bootstrap resamples
`r`	number of k-means runs
`threshold`	threshold on Smin used to select k
`scheme_2`	`logical`; `TRUE` to use scheme 2, `FALSE` to use scheme 1

Details

This function estimates the number of clusters through a bootstrapping approach and a measure, Smin, that is based on observation-wise similarity among clusterings. The number of clusters k is selected as the largest candidate value in `range` for which Smin is greater than `threshold`. The threshold is typically chosen between 0.8 and 0.9.

Two schemes are provided. Scheme 1 uses the clustering of the original data as the reference for stability calculations. Scheme 2 instead searches across the clusterings of the bootstrap samples for the one that gives the most stable clustering.
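To make the scheme 1 idea concrete, here is a minimal base-R sketch that computes an Smin-like value for a single k. It is not the bootcluster implementation: the function name `smin_scheme1`, the use of `kmeans()` with `nstart = r`, assigning original observations to the nearest bootstrap centroid, the Jaccard-style observation-wise similarity, and taking Smin as the smallest per-cluster mean stability are all assumptions made for illustration.

```r
# Hypothetical sketch of a scheme-1-style Smin for one k (not bootcluster code).
# Assumptions: kmeans() with nstart = r, nearest-centroid labelling,
# Jaccard observation-wise similarity, Smin = min per-cluster mean stability.
smin_scheme1 <- function(x, k, B = 20, r = 5) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Scheme 1 reference: clustering of the original data
  ref <- kmeans(x, centers = k, nstart = r)$cluster
  obs_stab <- numeric(n)
  for (b in seq_len(B)) {
    # Cluster a bootstrap resample of the rows
    idx <- sample.int(n, n, replace = TRUE)
    fit <- kmeans(x[idx, , drop = FALSE], centers = k, nstart = r)
    # Label every original observation by its nearest bootstrap centroid
    d <- as.matrix(dist(rbind(fit$centers, x)))[-seq_len(k), seq_len(k), drop = FALSE]
    lab <- max.col(-d, ties.method = "first")
    # Jaccard similarity of the sets co-clustered with observation i
    for (i in seq_len(n)) {
      a <- ref == ref[i]
      s <- lab == lab[i]
      obs_stab[i] <- obs_stab[i] + sum(a & s) / sum(a | s)
    }
  }
  obs_stab <- obs_stab / B
  # Cluster stability = mean observation stability within each reference cluster
  min(tapply(obs_stab, ref, mean))
}

# Two well-separated synthetic groups
set.seed(1)
x <- rbind(matrix(rnorm(40, 0, 0.1), ncol = 2),
           matrix(rnorm(40, 5, 0.1), ncol = 2))
smin_scheme1(x, k = 2)
```

Because the similarity compares sets of co-clustered observations rather than raw labels, it does not matter that k-means may number the clusters differently in each bootstrap run. For two clearly separated groups, k = 2 should give a value at or very near 1.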

Value

profile: a vector of Smin values used to determine k
k: the estimated number of clusters (an integer)
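The relationship between `profile` and `k` follows the selection rule in Details. The sketch below applies that rule to a profile vector. The helper `select_k` is hypothetical and not part of bootcluster, the profile values are made up for illustration, and returning `NA` when no candidate passes the threshold is an assumption.

```r
# Hypothetical helper (not part of bootcluster): apply the Details rule,
# "largest k whose Smin exceeds the threshold", to a profile vector.
select_k <- function(profile, range, threshold = 0.8) {
  stopifnot(length(profile) == length(range))
  passing <- range[profile > threshold]
  # Assumption: return NA when no candidate k passes the threshold
  if (length(passing) == 0) return(NA_integer_)
  as.integer(max(passing))
}

# Made-up profile for range = 2:6, for illustration only
select_k(c(0.95, 0.91, 0.72, 0.85, 0.60), range = 2:6)  # returns 5
```

Read literally, the rule picks k = 5 here even though k = 4 falls below the threshold: it takes the largest passing value, not the end of an unbroken run of passing values.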

Author(s)

Han Yu

References

Bootstrapping estimates of stability for clusters, observations and model selection. Han Yu, Brian Chapman, Arianna DiFlorio, Ellen Eischen, David Gotz, Matthews Jacob and Rachael Hageman Blair.

Examples


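library(bootcluster)
set.seed(1)
data(wine)
# drop column 1 (assumed to be the class label) and keep the 13 measurements
x0 <- wine[, 2:14]
# standardize each variable so no single scale dominates k-means
x <- scale(x0)
k.select(x, range = 2:10, B = 20, r = 5, scheme_2 = TRUE)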

[Package bootcluster version 0.3.2 Index]