
k.select_ref {bootcluster}

R Documentation

Estimate number of clusters

Description

Estimate the number of clusters in a dataset by bootstrapping clustering stability.

Usage

k.select_ref(df, k_range = 2:7, n_ref = 5, B = 100, B_ref = 50, r = 5)

Arguments

`df`	`data.frame` of the input dataset
`k_range`	`integer` vector of the candidate numbers of clusters k to test
`n_ref`	number of reference distributions (reference datasets) to generate
`B`	number of bootstrap resamples of the input dataset
`B_ref`	number of bootstrap resamples for the reference distributions
`r`	number of k-means runs

Details

This function uses an out-of-bag bootstrap scheme to estimate the number of clusters in a dataset. For each k in k_range, it calculates the stability statistic Smin of the dataset. It also generates n_ref reference datasets, each with the same range as the original dataset in every dimension, and calculates the corresponding Smin_ref. The difference between Smin and Smin_ref at each k, Smin_diff(k), is used together with the standard error of that difference, se(Smin_diff(k)). The estimated number of clusters is the k that maximizes Smin_diff(k) - ( Smin_diff(k+1) + se(Smin_diff(k+1)) ). If Smin_diff(k) is less than 0.1 for every k in k_range, the estimated number of clusters is k = 1.
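The selection rule above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names make_reference() and select_k() are hypothetical and are not part of bootcluster; the out-of-bag computation of Smin itself is not reproduced, so select_k() takes precomputed Smin_diff(k) and se(Smin_diff(k)) values as input; make_reference() assumes uniform sampling within each column's range (the Details only say the reference data share that range); and the profile is computed only for values of k whose successor k+1 is also in k_range.

```r
# Hypothetical helpers sketching the rule in the Details; NOT bootcluster's code.

# Draw one reference dataset with the same per-column range as `x`.
# ASSUMPTION: uniform sampling within [min, max] of each column; the Details
# only state that the reference data share the original range.
make_reference <- function(x) {
  x <- as.matrix(x)
  cols <- lapply(seq_len(ncol(x)), function(j) {
    runif(nrow(x), min = min(x[, j]), max = max(x[, j]))
  })
  ref <- do.call(cbind, cols)  # nrow(x) x ncol(x) matrix
  colnames(ref) <- colnames(x)
  ref
}

# Apply the selection rule to precomputed Smin_diff(k) and se(Smin_diff(k)),
# both given in the same order as k_range.
# ASSUMPTION: the profile is only computed for k whose successor k+1 is in k_range.
select_k <- function(smin_diff, se, k_range, threshold = 0.1) {
  n <- length(k_range)
  stopifnot(n >= 2, length(smin_diff) == n, length(se) == n)
  # profile(k) = Smin_diff(k) - (Smin_diff(k+1) + se(Smin_diff(k+1)))
  profile <- smin_diff[-n] - (smin_diff[-1] + se[-1])
  names(profile) <- k_range[-n]
  # Fallback from the Details: k = 1 when every Smin_diff(k) is below 0.1
  k <- if (all(smin_diff < threshold)) 1L else k_range[which.max(profile)]
  list(profile = profile, k = k)
}

# Illustrative, made-up inputs for k_range = 2:7 (not output of k.select_ref)
res <- select_k(smin_diff = c(0.45, 0.60, 0.20, 0.15, 0.10, 0.05),
                se = rep(0.02, 6), k_range = 2:7)
res$k  # 3
```

With these made-up inputs the largest profile value, 0.60 - (0.20 + 0.02) = 0.38, occurs at k = 3, so select_k() returns 3. The threshold argument defaults to the 0.1 cutoff stated in the Details.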

Value

profile: vector of ( Smin_diff(k) - ( Smin_diff(k+1) + se(Smin_diff(k+1)) ) ) values over k_range, for inspection by the user
k: estimated number of clusters

Author(s)

Tianmou Liu

References

Han Yu, Brian Chapman, Arianna DiFlorio, Ellen Eischen, David Gotz, Matthews Jacob and Rachael Hageman Blair. Bootstrapping estimates of stability for clusters, observations and model selection.

Examples


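set.seed(1)
data(iris)
# Keep the four numeric measurements and standardize each column
df <- data.frame(iris[, 1:4])
df <- scale(df)
# Test k = 2..7 against 5 reference datasets, with 500 bootstrap resamples
# for the data (B) and for the references (B_ref), and 5 k-means runs (r)
k.select_ref(df, k_range = 2:7, n_ref = 5, B = 500, B_ref = 500, r = 5)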

[Package bootcluster version 0.3.2 Index]