kpeaks-package {kpeaks} | R Documentation
Determination of K Using Peak Counts of Features for Clustering
Description
The input argument k, which represents the number of clusters, is needed to start any partitioning clustering algorithm. In unsupervised learning applications, an optimal value of this argument is commonly determined by using internal validity indexes. Since these indexes suggest a k value computed from the clustering results of several runs of a clustering algorithm, they are computationally expensive. In contrast, the package 'kpeaks' makes it possible to estimate k before running any clustering algorithm. It is based on a simple novel technique that uses descriptive statistics of the peak counts of the features in a dataset.
Details
The package 'kpeaks' contains five functions and one synthetically created dataset for testing purposes. To suggest an estimate of k, the function findk internally calls the functions genpolygon and findpolypeaks, in that order. The frequency polygons can be visually inspected by using the function plotpolygon. Using the function rmshoulders is recommended to flatten or remove the shoulder peaks around the main peaks of a frequency polygon, if any.
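The general idea of estimating k from peak counts can be illustrated with a minimal base R sketch. It does not call or reproduce the 'kpeaks' functions: the helper names count_peaks and estimate_k, the arguments nbins and min_freq, the toy data, and the choice of mean, median and maximum as summary statistics are all assumptions made for illustration only.

```r
# Hypothetical helper (not part of 'kpeaks'): count the peaks of the
# frequency polygon of a single numeric feature.
count_peaks <- function(x, nbins = 20, min_freq = 1) {
  # A constant feature has a single mode; avoid building degenerate bins
  if (min(x) == max(x)) return(1L)
  # Equal-width bins spanning the range of the feature
  breaks <- seq(min(x), max(x), length.out = nbins + 1)
  freqs <- hist(x, breaks = breaks, plot = FALSE)$counts
  # Pad with zeros so that peaks in the first or last bin are detected
  padded <- c(0, freqs, 0)
  # A peak is a bin whose frequency is strictly higher than both neighbours
  is_peak <- diff(sign(diff(padded))) == -2
  # Ignore peaks whose frequency is below the (assumed) threshold
  sum(is_peak & freqs >= min_freq)
}

# Hypothetical helper: summarise the per-feature peak counts into
# candidate values of k.
estimate_k <- function(data, nbins = 20, min_freq = 1) {
  pcounts <- vapply(data, count_peaks, integer(1),
                    nbins = nbins, min_freq = min_freq)
  list(pcounts  = pcounts,
       mean_k   = round(mean(pcounts)),
       median_k = median(pcounts),
       max_k    = max(pcounts))
}

# Toy data: two features with three well-separated groups, one with two
toy <- data.frame(
  f1 = c(rep(1, 10), rep(2, 2), rep(5, 10), rep(9, 8)),
  f2 = c(rep(0, 12), rep(4, 6), rep(8, 12)),
  f3 = c(rep(0, 15), rep(8, 15))
)
res <- estimate_k(toy, nbins = 8)
print(res$pcounts)
print(res$median_k)
```

The sketch treats every strict local maximum as a peak, so noisy features may yield spurious counts; this is the kind of situation in which smoothing or removing shoulder peaks before counting becomes useful.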
Author(s)
Zeynel Cebeci, Cagatay Cebeci
References
Cebeci, Z. & Cebeci, C. (2018). "A novel technique for fast determination of K in partitioning cluster analysis", Journal of Agricultural Informatics, 9(2), 1-11. doi: 10.17700/jai.2018.9.2.442.
Cebeci, Z. & Cebeci, C. (2018). "kpeaks: An R Package for Quick Selection of K for Cluster Analysis", In 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), IEEE. doi: 10.1109/IDAP.2018.8620896.
See Also
findk, findpolypeaks, genpolygon, plotpolygon, rmshoulders