fpc-package {fpc} | R Documentation |
fpc package overview
Description
Here is a list of the main functions in package fpc. Most other functions are auxiliary functions for these.
Clustering methods
- dbscan
Computes DBSCAN density-based clustering as introduced in Ester et al. (1996).
- fixmahal
Mahalanobis Fixed Point Clustering, Hennig and Christlieb (2002), Hennig (2005).
- fixreg
Regression Fixed Point Clustering, Hennig (2003).
- flexmixedruns
Fits a latent class model to data with mixed continuous and nominal variables. It calls a method for flexmix.
- mergenormals
Clustering by merging components of a Gaussian mixture, see Hennig (2010).
- regmix
ML-fit of a mixture of linear regression models, see DeSarbo and Cron (1988).
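As a minimal sketch of how one of the clustering functions above might be called, the following runs dbscan on synthetic data. The data, the seed, and the values of eps and MinPts are assumptions chosen for illustration only. The call is written as fpc::dbscan because other packages also export a function of that name.

```r
library(fpc)

set.seed(1)
# Synthetic illustration data: two well-separated Gaussian blobs in the plane
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))

# eps (neighbourhood radius) and MinPts (minimum neighbourhood size)
# are picked by eye for this toy example
db <- fpc::dbscan(x, eps = 1, MinPts = 5)

# One integer label per observation; label 0 marks points treated as noise
table(db$cluster)
```

On real data, eps and MinPts usually need tuning, and the help page of dbscan describes the remaining arguments.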
Cluster validity indexes and estimation of the number of clusters
- cluster.stats
Computes several cluster validity statistics from a clustering and a dissimilarity matrix, including the Calinski-Harabasz index, the adjusted Rand index and other statistics explained in Gordon (1999), as well as several characterising measures such as average between-cluster and within-cluster dissimilarity and separation. See also calinhara and dudahart2 for specific indexes, and the newer version cqcluster.stats, which computes some more indexes and the statistics used for computing them. There is also distrsimilarity, which computes within-cluster dissimilarity to the Gaussian and uniform distribution.
- prediction.strength
Estimates the number of clusters by computing the prediction strength of a clustering of a dataset into different numbers of components for various clustering methods, see Tibshirani and Walther (2005). In fact, this is more flexible than what is in the original paper, because it can use point classification schemes that work better with clustering methods other than k-means.
- nselectboot
Estimates the number of clusters by bootstrap stability selection, see Fang and Wang (2012). This is quite flexible regarding clustering methods and point classification schemes and also allows for dissimilarity data.
- clusterbenchstats
Runs many clustering methods (to be specified by the user) with many numbers of clusters on a dataset and produces standardised and comparable versions of many cluster validity indexes (see Hennig 2019, Akhanli and Hennig 2020). This is done by producing random clusterings on the given data, see stupidkcentroids and stupidknn. It allows many clusterings to be compared on many different potentially desirable features of a clustering. print.valstat can compute an aggregated index with user-specified weights.
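The validity functions listed above can be tried on a simple k-means partition. This is a sketch under stated assumptions: the data are synthetic, the clustering comes from base R's kmeans, and only a few of the components returned by cluster.stats are shown.

```r
library(fpc)

set.seed(2)
# Synthetic illustration data: two Gaussian blobs
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2))

# A reference partition from base R's kmeans
km <- kmeans(x, centers = 2, nstart = 10)

# cluster.stats takes a dissimilarity matrix and a vector of cluster labels
cs <- cluster.stats(dist(x), km$cluster)

cs$ch               # Calinski-Harabasz index
cs$avg.silwidth     # average silhouette width
cs$average.between  # average dissimilarity between clusters
cs$average.within   # average dissimilarity within clusters

# calinhara computes the Calinski-Harabasz index from the data matrix directly
calinhara(x, km$cluster)
```

Passing a second partition through the alt.clustering argument of cluster.stats additionally yields the adjusted Rand index mentioned above.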
Cluster visualisation and validation
- clucols
Sets of colours and symbols useful for cluster plotting.
- clusterboot
Cluster-wise stability assessment of a clustering. Clusterings are performed on resampled data to see for every cluster of the original dataset how well this is reproduced. See Hennig (2007) for details.
- cluster.varstats
Extracts variable-wise information for every cluster in order to help with cluster interpretation.
- plotcluster
Visualisation of a clustering or grouping in data by various linear projection methods that optimise the separation between clusters, or between a single cluster and the rest of the data, according to Hennig (2004), including classical methods such as discriminant coordinates. This calls the function discrproj, which is a bit more flexible but doesn't produce a plot itself.
- ridgeline.diagnosis
Plots and diagnostics for assessing modality of Gaussian mixtures, see Ray and Lindsay (2005).
- weightplots
Plots to diagnose component separation in Gaussian mixtures, see Hennig (2010).
- localshape
Local shape matrix, can be used for finding clusters in connection with function ics in package ICS, see Hennig's discussion and rejoinder of Tyler et al. (2009).
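To illustrate the stability assessment described for clusterboot, here is a small sketch that bootstraps a two-cluster k-means solution through the kmeansCBI wrapper. The synthetic data, the small number of resamples (B = 20) and the seed are assumptions made to keep the example quick. They are not recommended settings.

```r
library(fpc)

set.seed(3)
# Synthetic illustration data: two Gaussian blobs
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2))

# Recluster B bootstrap resamples with k-means (2 clusters) and compare
# each original cluster with its most similar cluster in every resample
cb <- clusterboot(x, B = 20, bootmethod = "boot",
                  clustermethod = kmeansCBI, krange = 2,
                  seed = 3, count = FALSE)

# Mean Jaccard similarity per original cluster; values near 1 suggest stability
cb$bootmean
```

Printing the returned object gives a fuller summary. Hennig (2007) discusses how to interpret the similarity values.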
Useful wrapper functions for clustering methods
- kmeansCBI
This and other "CBI" functions (see the kmeansCBI help page) are unified wrappers for various clustering methods in R. They may be useful because they do in one step what would otherwise take several steps in R (for example, fitting a Gaussian mixture with a noise component in package mclust).
- kmeansruns
This calls kmeans for the k-means clustering method and includes estimation of the number of clusters and finding an optimal solution from several starting points.
- pamk
This calls pam and clara for the partitioning around medoids clustering method (Kaufman and Rousseeuw, 1990) and includes two different ways of estimating the number of clusters.
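The two wrappers that estimate the number of clusters can be sketched as follows. The three-blob data, the candidate range 2:5 and the criteria ("ch" for kmeansruns, "asw" for pamk) are assumptions chosen for illustration.

```r
library(fpc)

set.seed(4)
# Synthetic illustration data: three Gaussian blobs
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
           matrix(rnorm(60, mean = 4, sd = 0.4), ncol = 2),
           matrix(rnorm(60, mean = 8, sd = 0.4), ncol = 2))

# k-means for k = 2,...,5, several random starts each,
# with k selected by the Calinski-Harabasz criterion
kr <- kmeansruns(x, krange = 2:5, criterion = "ch", runs = 10)
kr$bestk          # selected number of clusters
kr$crit           # criterion value for each candidate k

# Partitioning around medoids, with k selected by average silhouette width
pk <- pamk(x, krange = 2:5, criterion = "asw")
pk$nc                            # selected number of clusters
table(pk$pamobject$clustering)   # cluster sizes of the selected solution
```

Both functions return the fitted clustering of the selected solution, so the result can be passed on to functions such as cluster.stats or plotcluster.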
Author(s)
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/
References
Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822
DeSarbo, W. S. and Cron, W. L. (1988) A maximum likelihood methodology for clusterwise linear regression, Journal of Classification 5, 249-282.
Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).
Fang, Y. and Wang, J. (2012) Selection of the number of clusters via the bootstrap method. Computational Statistics and Data Analysis, 56, 468-477.
Gordon, A. D. (1999) Classification, 2nd ed. Chapman and Hall.
Hennig, C. (2003) Clusters, outliers and regression: fixed point clusters, Journal of Multivariate Analysis 86, 183-212.
Hennig, C. (2004) Asymmetric linear dimension reduction for classification. Journal of Computational and Graphical Statistics, 13, 930-945.
Hennig, C. (2005) Fuzzy and Crisp Mahalanobis Fixed Point Clusters, in Baier, D., Decker, R., and Schmidt-Thieme, L. (eds.): Data Analysis and Decision Support. Springer, Heidelberg, 47-56.
Hennig, C. (2007) Cluster-wise assessment of cluster stability. Computational Statistics and Data Analysis, 52, 258-271.
Hennig, C. (2010) Methods for merging Gaussian mixture components, Advances in Data Analysis and Classification, 4, 3-34.
Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York 1-24, https://arxiv.org/abs/1703.09282
Hennig, C. and Christlieb, N. (2002) Validating visual clusters in large datasets: Fixed point clusters of spectral features, Computational Statistics and Data Analysis 40, 723-739.
Kaufman, L. and Rousseeuw, P. J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
Ray, S. and Lindsay, B. G. (2005) The Topography of Multivariate Normal Mixtures, Annals of Statistics, 33, 2042-2065.
Tibshirani, R. and Walther, G. (2005) Cluster Validation by Prediction Strength, Journal of Computational and Graphical Statistics, 14, 511-528.