| clustatsum {fpc} | R Documentation |
Compute and format cluster validation statistics
Description
clustatsum computes cluster validation statistics by running
cqcluster.stats,
and potentially distrsimilarity, and collects some key
statistics under a somewhat different nomenclature.
It was implemented as a helper function for use inside
clusterbenchstats and cgrestandard.
Usage
clustatsum(datadist=NULL,clustering,noisecluster=FALSE,
datanp=NULL,npstats=FALSE,useboot=FALSE,
bootclassif=NULL,
bootmethod="nselectboot",
bootruns=25, cbmethod=NULL,methodpars=NULL,
distmethod=NULL,dnnk=2,
pamcrit=TRUE,...)
Arguments
datadist |
distances on which validation-measures are based, |
clustering |
an integer vector, with one entry per case, that indicates a clustering. The clusters have to be numbered from 1 to the number of clusters. |
noisecluster |
logical. If |
datanp |
optional observations times variables data matrix, see
|
npstats |
logical. If |
useboot |
logical. If |
bootclassif |
If |
bootmethod |
either |
bootruns |
integer. Number of resampling runs. If
|
cbmethod |
CBI-function (see |
methodpars |
parameters to be passed on to |
distmethod |
logical. In case of |
dnnk |
|
pamcrit |
|
... |
further arguments to be passed on to
|
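The numbering requirement on clustering can be met with a simple recoding. The following is a minimal sketch in base R, assuming labels that arrive as arbitrary codes; raw_labels is a hypothetical input and not part of the package.

```r
# Hypothetical raw cluster labels, e.g. arbitrary codes from another tool
raw_labels <- c("b", "a", "b", "c", "a")

# Recode to consecutive integers 1..k (levels are sorted alphabetically,
# so "a" becomes 1, "b" becomes 2, "c" becomes 3)
clustering <- as.integer(factor(raw_labels))

# Check that the labels are exactly 1, 2, ..., k
k <- max(clustering)
stopifnot(identical(sort(unique(clustering)), seq_len(k)))
```

The resulting integer vector has one entry per case and can be passed as the clustering argument.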
Value
clustatsum returns a list. The components, as listed below, are
outputs of summary.cquality with default parameters.
They are therefore partly transformed versions of the values
returned by cqcluster.stats: their range is between 0
and 1, and large values are good. Those from
distrsimilarity are correspondingly computed with
largeisgood=TRUE.
avewithin |
average distance within clusters (reweighted so that every observation, rather than every distance, has the same weight). |
mnnd |
average distance to |
cvnnd |
coefficient of variation of dissimilarities to
|
maxdiameter |
maximum cluster diameter. |
widestgap |
widest within-cluster gap or average of cluster-wise
widest within-cluster gap, depending on parameter |
sindex |
separation index, see argument |
minsep |
minimum cluster separation. |
asw |
average silhouette
width. See |
dindex |
this index measures to what extent the density decreases from the cluster mode to the outskirts; I-densdec in Sec. 3.6 of Hennig (2019). |
denscut |
this index measures whether cluster boundaries run through density valleys; I-densbound in Sec. 3.6 of Hennig (2019). |
highdgap |
this measures whether there is a large within-cluster gap with high density on both sides; I-highdgap in Sec. 3.6 of Hennig (2019). |
pearsongamma |
correlation between distances and a 0-1-vector where 0 means same cluster, 1 means different clusters. "Normalized gamma" in Halkidi et al. (2001). |
withinss |
a generalisation of the within clusters sum
of squares (k-means objective function), which is obtained if
|
entropy |
entropy of the distribution of cluster memberships, see Meila (2007). |
pamc |
average distance to cluster centroid. |
kdnorm |
Kolmogorov distance between distribution of within-cluster Mahalanobis distances and appropriate chi-squared distribution, aggregated over clusters (I am grateful to Agustin Mayo-Iscar for the idea). |
kdunif |
Kolmogorov distance between distribution of distances to
|
boot |
if |
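Two of the simpler quantities described above, pearsongamma and entropy, can be sketched in base R. This is an illustration of the verbal definitions only, on hypothetical toy data, and it assumes the natural logarithm for the entropy. Because clustatsum returns partly transformed values, its output need not equal these raw quantities.

```r
# Toy data (hypothetical): six points on a line forming two obvious groups
x <- c(0, 1, 2, 10, 11, 12)
clustering <- c(1, 1, 1, 2, 2, 2)

# Pairwise distances and a 0-1 indicator (0 = same cluster, 1 = different),
# both read from the lower triangle so that the pairs line up
dmat <- as.matrix(dist(x))
differ <- outer(clustering, clustering, "!=")
lt <- lower.tri(dmat)
raw_gamma <- cor(dmat[lt], as.numeric(differ[lt]))

# Entropy of the distribution of cluster memberships (natural log assumed)
p <- as.numeric(table(clustering)) / length(clustering)
raw_entropy <- -sum(p * log(p))
```

For well-separated groups such as these, raw_gamma is positive, since between-cluster pairs have the larger distances.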
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/
References
Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822
Halkidi, M., Batistakis, Y., Vazirgiannis, M. (2001) On Clustering Validation Techniques, Journal of Intelligent Information Systems, 17, 107-145.
Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York 1-24, https://arxiv.org/abs/1703.09282
Kaufman, L. and Rousseeuw, P.J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
Meila, M. (2007) Comparing clusterings - an information based distance, Journal of Multivariate Analysis, 98, 873-895.
See Also
cqcluster.stats, distrsimilarity
Examples
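library(fpc)  # provides rFace and clustatsum
set.seed(20000)
options(digits=3)
# Simulate a small "face" data set and compute its distances
face <- rFace(20,dMoNo=2,dNoEy=0,p=2)
dface <- dist(face)
# Cut a complete-linkage hierarchy into three clusters
complete3 <- cutree(hclust(dface),3)
# Validation statistics for this clustering
clustatsum(dface,complete3)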
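Since clustatsum returns a list, individual statistics can be picked out by name. The sketch below reuses the setup of the example above and assumes that the fpc package is installed; the guard with requireNamespace and the object names res and picked are illustrative choices, not part of the package.

```r
# Run only if fpc is available, so the sketch does not fail without it
if (requireNamespace("fpc", quietly = TRUE)) {
  set.seed(20000)
  face <- fpc::rFace(20, dMoNo = 2, dNoEy = 0, p = 2)
  dface <- dist(face)
  complete3 <- cutree(hclust(dface), 3)

  # Store the result instead of printing the whole list
  res <- fpc::clustatsum(dface, complete3)

  # Extract a few components by the names given in the Value section
  picked <- res[c("asw", "pearsongamma", "entropy")]
  print(picked)
}
```

Storing the result this way makes it easy to compare selected statistics across several clusterings of the same data.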