apseudoF {drclust}    R Documentation

pseudoF (pF, or Calinski-Harabasz) index for choosing k in partitioning models

Description

Calculates and plots the CH index for k = 2, ..., maxK. Around each pF value the function places an interval of width 2*tol*pF, so that the choice of K is less conservative: instead of simply choosing the K with the maximum pF, it picks, if one exists, the value of K whose upper bound is larger than the maximum pF.
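
The idea described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: stats::kmeans stands in for the drclust partitioning models, the helper names pseudo_f and choose_k are hypothetical, and the selection rule (take the smallest k whose upper bound pF*(1 + tol) reaches the maximum pF) is one possible reading of the description.

```r
# Sketch only: kmeans replaces the drclust models, and the rule in
# choose_k() is an assumed reading of the description, not the package code.

# Calinski-Harabasz (pseudoF) index for a k-means partition of X into k groups
pseudo_f <- function(X, k, nstart = 20) {
  n <- nrow(X)
  fit <- kmeans(X, centers = k, nstart = nstart)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
}

# Assumed rule: smallest k whose upper bound pF * (1 + tol) reaches max(pF)
choose_k <- function(pF, ks, tol = 0.05) {
  upper <- pF * (1 + tol)
  ks[which(upper >= max(pF))[1]]
}

set.seed(1)
X  <- as.matrix(datasets::iris[, -5])
ks <- 2:10
pF <- vapply(ks, function(k) pseudo_f(X, k), numeric(1))
best_k <- choose_k(pF, ks, tol = 0.05)
```

With tol = 0 the assumed rule reduces to picking the k with the maximum pF; larger values of tol let a smaller k win when its pF is close to the maximum.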

Usage

apseudoF(data, maxK, tol, model, Q)

Arguments

data

Units x variables numeric data matrix.

maxK

Maximum number of clusters for the units to be tested.

tol

Approximation value. It is half the length, relative to pF, of the interval placed around each pF (the interval has width 2*tol*pF). 0 <= tol < 1. Its default value is 0.05.

model

Partitioning Models to run for each value of k. (1 = doublekm; 2 = redkm; 3 = factkm; 4 = dpcakm)

Q

Number of principal components (with respect to the variables) selected for the maxK - 1 partitions to be tested.
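
As a small worked example of the tol argument above, the snippet below computes the interval for a single pF value. The value 560 is made up for illustration, and the interval is assumed to be symmetric around pF, which is consistent with the stated width 2*tol*pF but is not confirmed by the documentation.

```r
# Illustrative numbers only; a symmetric interval around pF is an assumption.
pF  <- 560    # hypothetical pseudoF value for some k
tol <- 0.05   # the documented default

half     <- tol * pF                              # half-length: 28
interval <- c(lower = pF - half, upper = pF + half)  # 532 and 588
width    <- unname(interval["upper"] - interval["lower"])  # 56 = 2 * tol * pF
```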

Value

bestK

best value of K (scalar).

Author(s)

Ionel Prunila, Maurizio Vichi

References

Calinski T., Harabasz J. (1974) "A dendrite method for cluster analysis", Communications in Statistics, 3(1), 1-27. <doi:10.1080/03610927408827101>

Examples

# Iris data
# Loading the numeric variables of iris data
iris <- as.matrix(iris[, -5])

apF <- apseudoF(iris, maxK = 10, tol = 0.05, model = 3, Q = 2)


[Package drclust version 0.1 Index]