apseudoF {drclust} | R Documentation |
pseudoF (pF or Calinski-Harabasz) index for choosing k in partitioning models
Description
Calculates and plots the pseudoF (Calinski-Harabasz, CH) index for k = 2, ..., maxK. To make the choice of K less conservative, the function places an interval of width 2 * tol * pF around each pF value. Rather than simply choosing the K with the maximum pF, it picks, if one exists, the value of K whose upper bound is larger than the maximum pF.
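The interval rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the pF values are made-up numbers, and choosing the smallest qualifying k is one plausible reading of the description, not necessarily the rule implemented inside apseudoF.

```r
# Hypothetical pF values for k = 2, ..., 6 (illustrative numbers, not package output)
k   <- 2:6
pF  <- c(310, 398, 405, 380, 350)
tol <- 0.05

# Interval of width 2 * tol * pF centred on each pF value
lower <- pF * (1 - tol)
upper <- pF * (1 + tol)

# Assumption: among the k whose upper bound reaches the maximum pF,
# take the smallest one (the most parsimonious partition)
candidates <- k[upper >= max(pF)]
bestK <- min(candidates)

# For comparison: the k that simply maximises pF
argmaxK <- k[which.max(pF)]
```

With these numbers the plain maximum is at k = 4, while the interval rule under the assumption above settles on k = 3, because the upper bound for k = 3 (417.9) already exceeds the maximum pF (405).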
Usage
apseudoF(data, maxK, tol, model, Q)
Arguments
data |
Units x variables numeric data matrix. |
maxK |
Maximum number of clusters for the units to be tested. |
tol |
Approximation value. It is half the length of the interval placed around each pF, expressed as a proportion of pF. 0 <= tol < 1. Its default value is 0.05. |
model |
Partitioning model to run for each value of k (1 = doublekm; 2 = redkm; 3 = factkm; 4 = dpcakm). |
Q |
Number of principal components (for the variables) used in each of the maxK - 1 partitions to be tested. |
Value
bestK |
Best value of K (scalar). |
Author(s)
Ionel Prunila, Maurizio Vichi
References
Calinski T., Harabasz J. (1974) "A dendrite method for cluster analysis", Communications in Statistics, 3(1), 1-27. <doi:10.1080/03610927408827101>
Examples
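# Iris data
# Load the four numeric variables of the iris data (drop the Species column)
iris <- as.matrix(iris[,-5])
# pseudoF for k = 2, ..., 10 using factkm (model = 3) with Q = 2 components
apF <- apseudoF(iris, maxK = 10, tol = 0.05, model = 3, Q = 2)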
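
For readers who want to see what the index measures, the following base R sketch computes the Calinski-Harabasz pseudoF by hand as (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)). It is an assumption-laden illustration: it uses plain stats::kmeans rather than any of the drclust models, and the range of k and the nstart value are arbitrary choices, so its values need not match those returned by apseudoF.

```r
# Numeric variables of the built-in iris data (referenced explicitly so the
# sketch does not depend on objects created earlier in the session)
X  <- as.matrix(datasets::iris[, 1:4])
n  <- nrow(X)
ks <- 2:6

set.seed(1)  # k-means uses random starts

# pseudoF for each k, based on an ordinary k-means partition
pF <- sapply(ks, function(k) {
  fit <- kmeans(X, centers = k, nstart = 20)
  (fit$betweenss / (k - 1)) / (fit$tot.withinss / (n - k))
})
names(pF) <- ks
pF
```

Larger values indicate partitions whose clusters are well separated relative to their internal spread, which is why the maximum of pF is the usual starting point for choosing k.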