k.select_ref {bootcluster}    R Documentation

Estimate number of clusters

Description

Estimate number of clusters by bootstrapping stability

Usage

k.select_ref(df, k_range = 2:7, n_ref = 5, B = 100, B_ref = 50, r = 5)

Arguments

df

data.frame of the input dataset

k_range

integer-valued vector of the candidate numbers of clusters k to be tested

n_ref

number of reference distributions to be generated

B

number of bootstrap resamples

B_ref

number of bootstrap resamples for the reference distributions

r

number of runs of k-means

Details

This function uses the out-of-bag scheme to estimate the number of clusters in a dataset. It calculates Smin for the dataset and, at the same time, generates reference datasets with the same range as the original dataset in each dimension and calculates Smin_ref for them. For each k, the difference between Smin and Smin_ref, Smin_diff(k), is considered together with the standard deviation of these differences, denoted se(Smin_diff(k)). The chosen k is the argmax of ( Smin_diff(k) - ( Smin_diff(k+1) + se(Smin_diff(k+1)) ) ). If Smin_diff(k) is less than 0.1 for all k in k_range, the estimate is k = 1.
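
The reference-data step and the selection rule described here can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: make_reference and select_k are hypothetical helper names, the Smin_diff and se values in the usage lines are made up, and how the package treats the largest k in k_range (which has no k+1 term) is not stated in this page, so the sketch simply leaves it out of the profile.

```r
# Hypothetical helpers for illustration; they are not part of bootcluster.

# Draw one reference dataset: every column is sampled uniformly over the
# observed range of the matching column in x.
make_reference <- function(x) {
  x <- as.matrix(x)
  ref <- apply(x, 2, function(col) runif(length(col), min(col), max(col)))
  colnames(ref) <- colnames(x)
  ref
}

# Apply the selection rule to precomputed Smin_diff(k) values and their
# spread se(Smin_diff(k)), both given in the same order as k_range.
select_k <- function(k_range, smin_diff, se_diff, threshold = 0.1) {
  stopifnot(length(k_range) == length(smin_diff),
            length(k_range) == length(se_diff))
  # No k separates the data from the reference: report a single cluster.
  if (all(smin_diff < threshold)) {
    return(list(profile = NULL, k = 1L))
  }
  n <- length(k_range)
  # profile(k) = Smin_diff(k) - (Smin_diff(k+1) + se(Smin_diff(k+1)));
  # assumption: the last k is dropped because it has no k+1 neighbour.
  profile <- smin_diff[-n] - (smin_diff[-1] + se_diff[-1])
  names(profile) <- k_range[-n]
  list(profile = profile, k = k_range[which.max(profile)])
}

# Usage with made-up numbers (not results from any real dataset).
set.seed(1)
ref <- make_reference(scale(iris[, 1:4]))
res <- select_k(k_range   = 2:5,
                smin_diff = c(0.30, 0.45, 0.12, 0.10),
                se_diff   = c(0.02, 0.03, 0.02, 0.02))
res$k  # 3 for these made-up values
```

The profile rewards a k whose stability gain over the reference is large and is followed by a clear drop at k+1; adding the se term makes the drop count only when it exceeds the variability of the next value.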

Value

profile

vector of ( Smin_diff(k) - ( Smin_diff(k+1) + se(Smin_diff(k+1)) ) ) measures for the researcher's inspection

k

estimated number of clusters

Author(s)

Tianmou Liu

References

Bootstrapping estimates of stability for clusters, observations and model selection. Han Yu, Brian Chapman, Arianna DiFlorio, Ellen Eischen, David Gotz, Matthews Jacob and Rachael Hageman Blair.

Examples


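set.seed(1)
data(iris)
# Keep the four numeric measurement columns and standardize them
df <- data.frame(iris[, 1:4])
df <- scale(df)
# Test k = 2..7 with 500 bootstrap resamples for the data and for the references
k.select_ref(df, k_range = 2:7, n_ref = 5, B = 500, B_ref = 500, r = 5)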


[Package bootcluster version 0.3.2 Index]