DBSclustering {DatabionicSwarm} | R Documentation |
Databionic swarm clustering (DBS)
Description
DBS is a flexible and robust clustering framework that consists
of three independent modules. The first module is the parameter-free
projection method Pswarm, which exploits the concepts of
self-organization and emergence, game theory, swarm intelligence and symmetry
considerations [Thrun/Ultsch, 2021]. The second module is a parameter-free
high-dimensional data visualization technique (GeneratePswarmVisualization),
which generates projected points on a topographic map with hypsometric colors,
called the generalized U-matrix. The third module is a clustering method with
no sensitive parameters, DBSclustering (see [Thrun, 2018, p. 104 ff]). The
clustering can be verified by the visualization and vice versa. The term DBS
refers to the method as a whole.
The DBSclustering function applies the automated clustering
approach of the Databionic swarm using abstract U distances, which are the
geodesic distances based on high-dimensional distances combined with
low-dimensional graph paths, computed by using ShortestGraphPathsC.
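The idea of a geodesic distance that combines high-dimensional distances with low-dimensional graph paths can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy data, the 2-nearest-neighbour graph (a stand-in for the low-dimensional graph) and the Floyd-Warshall loop are all hypothetical choices, whereas DBSclustering itself relies on ShortestGraphPathsC.

```r
# Hypothetical toy data: 6 points in 3 dimensions and a 2D projection of them
set.seed(42)
HighDim <- matrix(rnorm(18), ncol = 3)
LowDim  <- matrix(runif(12), ncol = 2)

# High-dimensional distances between all pairs of points
D_high <- as.matrix(dist(HighDim))

# Assumption: each projected point is linked to its 2 nearest neighbours in 2D
# (a stand-in for the low-dimensional graph, not the package's construction)
D_low <- as.matrix(dist(LowDim))
n <- nrow(D_low)
W <- matrix(Inf, n, n)
for (i in seq_len(n)) {
  nb <- order(D_low[i, ])[2:3]   # skip the point itself
  W[i, nb] <- D_high[i, nb]      # edge weight = high-dimensional distance
  W[nb, i] <- D_high[nb, i]      # keep the graph symmetric
}
diag(W) <- 0

# Floyd-Warshall: all-pairs shortest paths along the weighted graph
G <- W
for (m in seq_len(n)) {
  G <- pmin(G, outer(G[, m], G[m, ], "+"))
}
```

Here G[i, j] is the length of the shortest path between points i and j along the low-dimensional graph, measured in high-dimensional distances; it stays Inf if the two points are not connected in this toy graph.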
Usage
DBSclustering(k, DataOrDistance, BestMatches, LC, StructureType = TRUE,
  PlotIt = FALSE, ylab, main, method = "euclidean", ...)
Arguments
k | Number of clusters: how many do you see in the topographic map (3D landscape)?
DataOrDistance | Either a [1:n,1:d] matrix of data (n cases, d dimensions), one data point per row, or a symmetric [1:n,1:n] distance matrix
BestMatches | [1:n,1:2] matrix with positions of Bestmatches or ProjectedPoints, one row per data point
LC | Grid size c(Lines,Columns); please see Details
StructureType | Optional, bool; =TRUE: compact structure of clusters assumed, =FALSE: connected structure of clusters assumed. For the two options for clusters, see [Thrun, 2018] or Handl et al. 2006
PlotIt | Optional, bool; if TRUE, plots the dendrogram
ylab | Optional, character vector, y-axis label of the dendrogram
main | Optional, character vector, title of the dendrogram
method | Optional, one of 39 distance methods of
... | Further arguments passed on to the
Details
The required value of the LC argument depends on what is passed as the
BestMatches argument. Usually, as the name of the argument states, the
Bestmatches returned by the GeneratePswarmVisualization function are used,
which are defined in the notation of self-organizing maps. In this case please
see Example One. However, as written above, clustering and visualization can be
applied independently of each other. In that case the places of Lines L and
Columns C are switched, because Lines is a value slightly above the maximum of
the x-coordinates and Columns is a value slightly above the maximum of the
y-coordinates of ProjectedPoints. Hence, one should give DBSclustering the
argument LC as shown in Example Two.
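The relation between projected points and the grid size can be illustrated with a small base R sketch. The coordinates are hypothetical, and reading "slightly above the maximum" as ceiling(max) + 1 is an assumption made only for this illustration; in practice the LC element returned by Pswarm would be passed on, as in Example Two.

```r
# Hypothetical projected points: column 1 = x-coordinates, column 2 = y-coordinates
ProjectedPoints <- cbind(c(3.2, 47.9, 20.5), c(12.1, 79.4, 55.0))

# Assumption: "slightly above the maximum" is taken as ceiling(max) + 1
Lines   <- ceiling(max(ProjectedPoints[, 1])) + 1  # from the x-coordinates
Columns <- ceiling(max(ProjectedPoints[, 2])) + 1  # from the y-coordinates

# Grid size in the order c(Lines, Columns)
LC <- c(Lines, Columns)
```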
Often it is better to mark the outliers manually after the process of
clustering, and sometimes a clustering can be improved through human interaction
[Thrun/Ultsch,2017] <DOI:10.13140/RG.2.2.13124.53124>; in this case use the
visualization plotTopographicMap of the package GeneralizedUmatrix. If you
would like to mark the outliers interactively in the visualization, use the
ProjectionBasedClustering package with the function interactiveClustering(),
or IPBC() for fully interactive clustering. The package is available on CRAN.
The use of the interactiveClustering() function is shown in the third example.
Value
[1:n] numerical vector defining the classification, which is the main output
of this cluster analysis, for the n cases of data corresponding to the n
bestmatches. It has k unique numbers representing the arbitrary labels of the
clustering. You can use plotTopographicMap(Umatrix,Bestmatches,Cls) for
verification.
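A quick sanity check of the returned vector can be done in base R before any visual verification. The vector below is a hypothetical stand-in for the output of DBSclustering with k = 3, not a real result.

```r
# Hypothetical clustering result for n = 8 cases and k = 3 clusters
k <- 3
Cls <- c(1, 1, 2, 3, 2, 1, 3, 3)

# The number of distinct labels should equal k
n_labels <- length(unique(Cls))

# Number of cases per cluster label
sizes <- table(Cls)
```

Very small clusters in sizes may point to outliers that are worth inspecting in the topographic map.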
Note
If you want to verify your clustering result externally, you can use
Heatmap or SilhouettePlot of the package DataVisualizations, available on CRAN.
Author(s)
Michael Thrun
References
[Thrun/Ultsch, 2021] Thrun, M. C., and Ultsch, A.: Swarm Intelligence for Self-Organized Clustering, Artificial Intelligence, Vol. 290, pp. 103237, doi:10.1016/j.artint.2020.103237, 2021.
Examples
data("Lsun3D")
Data=Lsun3D$Data
InputDistances=as.matrix(dist(Data))
# Project the data with Pswarm (first module)
projection=Pswarm(InputDistances)
## Example One
# Bestmatches and LC are taken from the generalized U-matrix visualization
genUmatrixList=GeneratePswarmVisualization(Data,
  projection$ProjectedPoints,projection$LC)
Cls=DBSclustering(k=3, Data, genUmatrixList$Bestmatches,
  genUmatrixList$LC,PlotIt=TRUE)
## Example Two
# Automatic clustering without the GeneralizedUmatrix visualization
Cls=DBSclustering(k=3, Data, projection$ProjectedPoints,projection$LC,
  PlotIt=TRUE)
## Not run:
## Example Three
## Sometimes an automatic clustering can be improved
## through an interactive approach,
## e.g. if outliers exist (see [Thrun/Ultsch, 2017])
library(ProjectionBasedClustering)
Cls2=ProjectionBasedClustering::interactiveClustering(genUmatrixList$Umatrix,
  genUmatrixList$Bestmatches, Cls)
## End(Not run)