
apseudoF {drclust}

R Documentation

pseudoF (pF or Calinski-Harabasz) index for choosing k in partitioning models

Description

Calculates and plots the pseudoF (CH) index for k = 2, ..., maxK. Around each pF value the function places an interval of width 2·tol·pF, so that the choice of k is less conservative. Instead of simply choosing the k with the maximum pF, it picks, if one exists, a value of k whose upper bound pF·(1 + tol) is larger than the maximum pF.
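For reference, the standard Calinski-Harabasz definition and the tolerance interval described above can be written as follows. Here n is the number of units, and B_k and W_k are the between-cluster and within-cluster scatter matrices of the k-cluster partition (notation assumed, not taken from the package). A value of k is a candidate when its upper bound reaches the maximum pF; which candidate the function returns is not stated on this page.

```latex
\mathrm{pF}(k) = \frac{\operatorname{tr}(\mathbf{B}_k)/(k-1)}{\operatorname{tr}(\mathbf{W}_k)/(n-k)},
\qquad
I_k = \bigl[\,\mathrm{pF}(k)(1-\mathrm{tol}),\; \mathrm{pF}(k)(1+\mathrm{tol})\,\bigr],
\qquad
\mathrm{pF}(k)(1+\mathrm{tol}) \ \ge\ \max_{j \in \{2,\dots,\mathrm{maxK}\}} \mathrm{pF}(j).
```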

Usage

apseudoF(data, maxK, tol, model, Q)

Arguments

`data`	Units x variables numeric data matrix.
`maxK`	Maximum number of clusters of units to be tested (k = 2, ..., maxK).
`tol`	Tolerance: half the width of the interval placed around each pF, as a fraction of pF (0 <= tol < 1). Its default value is 0.05.
`model`	Partitioning model to run for each value of k (1 = doublekm; 2 = redkm; 3 = factkm; 4 = dpcakm).
`Q`	Number of principal components of the variables used in each of the maxK - 1 partitions tested.

Value

bestK

Best value of k (a scalar).

Author(s)

Ionel Prunila, Maurizio Vichi

References

Calinski T., Harabasz J. (1974) "A dendrite method for cluster analysis", Communications in Statistics, 3(1), 1-27. <doi:10.1080/03610927408827101>

Examples

# Iris data
# Load the package and keep only the four numeric variables of the iris data
library(drclust)
X <- as.matrix(iris[, -5])

apF <- apseudoF(X, maxK = 10, tol = 0.05, model = 3, Q = 2)
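To see the selection idea outside drclust, here is a minimal base-R sketch. It is not the package's implementation: it swaps in `stats::kmeans` for the drclust partitioning models (so `model` and `Q` play no role), skips the plot, and the helpers `ch_index()` and `pick_k()` are hypothetical. `pick_k()` assumes the rule returns the smallest k whose upper bound pF·(1 + tol) reaches the maximum pF.

```r
# Hypothetical helper (not part of drclust): Calinski-Harabasz index of a hard partition.
ch_index <- function(X, cl) {
  X <- as.matrix(X)
  n <- nrow(X)
  groups <- unique(cl)
  k <- length(groups)
  grand <- colMeans(X)
  W <- 0  # within-cluster sum of squares
  B <- 0  # between-cluster sum of squares
  for (g in groups) {
    Xg <- X[cl == g, , drop = FALSE]
    cg <- colMeans(Xg)
    W <- W + sum(sweep(Xg, 2, cg)^2)
    B <- B + nrow(Xg) * sum((cg - grand)^2)
  }
  (B / (k - 1)) / (W / (n - k))
}

# Hypothetical rule (assumption): smallest k whose upper bound pF * (1 + tol)
# reaches max(pF). The maximizer always qualifies, so a value is always returned.
pick_k <- function(pf, ks, tol = 0.05) {
  ks[which(pf * (1 + tol) >= max(pf))[1]]
}

X <- as.matrix(iris[, -5])
ks <- 2:10
set.seed(1)
# k-means stands in for the drclust models; nstart reduces sensitivity to starts
pf <- sapply(ks, function(k) ch_index(X, kmeans(X, centers = k, nstart = 10)$cluster))
bestK <- pick_k(pf, ks, tol = 0.05)
print(bestK)
```

With tol = 0 the rule reduces to the plain arg-max of pF; a larger tol lets a smaller k be chosen when its pF is within tol·100% of the maximum.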

[Package drclust version 0.1 Index]