DBSclusteringAndVisualization {FCPS}R Documentation

Databionic Swarm (DBS) Clustering and Visualization

Description

Swarm-based clustering by exploting self-organization, emergence, swarm intelligence and game theory published in [Thrun/Ultsch, 2021].

Usage

DatabionicSwarmClustering(DataOrDistances, ClusterNo = 0,

StructureType = TRUE, DistancesMethod = NULL,

PlotTree = FALSE, PlotMap = FALSE,PlotIt=FALSE, 

Parallel = FALSE)

Arguments

DataOrDistances

Either nonsymmetric [1:n,1:d] numerical matrix of a dataset to be clustered. It consists of n cases of d-dimensional data points. Every case has d attributes, variables or features.

or

symmetric [1:n,1:n] distance matrix, e.g. as.matrix(dist(Data,method))

ClusterNo

Number of Clusters, if zero a the topographic map is ploted. Number of valleys equals number of clusters.

StructureType

Either TRUE or FALSE, has to be tested against the visualization. If colored points of clusters a divided by mountain ranges, parameter is incorrect.

DistancesMethod

Optional, if data matrix given, annon Euclidean distance can be selected

PlotTree

Optional, if TRUE: dendrogram is plotted.

PlotMap

Optional, if TRUE: topographic map is plotted if GeneralizedUmatrix is installed. See details.

PlotIt

Default: FALSE, If TRUE and dataset of [1:n,1:d] dimensions then a plot of the first three dimensions of the dataset with colored three-dimensional data points defined by the clustering stored in Cls will be generated.

Parallel

FALSE: default implementatiomn, TRUE faster Cpp parallel implementation, for this the subsequent packages have to be installed from github, as they are not available on CRAN yet.

Details

This function does not enable the user first to project the data and then to test the Boolean parameter defining the type of structure contrary to the DatabionicSwarm which is an inappropriate approach in case of exploratory data analysis.

Instead, this function is implemented for the purpose of automatic benchmarking because in such a case nobody will investigate many trials with one visualization per trial.

If one would like to perform a clustering exploratively (in the sense that a prior clustering is not given for evaluation purposes), then please use the DatabionicSwarm package directly and read the vignette there. Databionic swarm is like k-means a stochastic algorithm meaning that the clustering and visualization may change between trials.

If PlotMap==TRUE and ClusterNo=0 a topview of the topographic map is shown, in which the points are not labeled, i.e. colored by the same color. If PlotMap==TRUE and ClusterNo>0, then the points are colored by their cluster labels. If you would like to look an 3D topogrpahic map that can be interactively rotated or use 3D printing of the high-dimensional structures [Thrun et al., 2016], please see plotTopographicMap for further details.

Value

List of

Cls

1:n numerical vector of numbers defining the classification as the main output of the clustering algorithm for the n cases of data. It has k unique numbers representing the arbitrary labels of the clustering.

Object

List of further output of DBS

Note

Current implementation is not efficient enough to cluster more than N=4000 cases as in that case it takes longer than a day for a result.

Author(s)

Michael Thrun

References

[Thrun/Ultsch, 2021] Thrun, M. C., and Ultsch, A.: Swarm Intelligence for Self-Organized Clustering, Artificial Intelligence, Vol. 290, pp. 103237, doi:10.1016/j.artint.2020.103237, 2021.

[Thrun/Ultsch, 2021] Thrun, M. C., & Ultsch, A.: Swarm Intelligence for Self-Organized Clustering (Extended Abstract), in Bessiere, C. (Ed.), 29th International Joint Conference on Artificial Intelligence (IJCAI), Vol. IJCAI-20, pp. 5125–5129, doi:10.24963/ijcai.2020/720, Yokohama, Japan, Jan., 2021.

[Thrun et al., 2016] Thrun, M. C., Lerch, F., Lötsch, J., & Ultsch, A. : Visualization and 3D Printing of Multivariate Data of Biomarkers, in Skala, V. (Ed.), International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG), Vol. 24, Plzen, 2016.

See Also

Pswarm, DBSclustering,GeneratePswarmVisualization

Examples


# Generate random but small non-structured data set
data = cbind(
  sample(1:100, 300, replace = TRUE),
  sample(1:100, 300, replace = TRUE),
  sample(1:100, 300, replace = TRUE)
)
# Make sure there are no structures
# (sample size is small and still could generate structures randomly)
if(requireNamespace('DataVisualizations',quietly = TRUE)){
Data = DataVisualizations::RobustNormalization(data, Centered = TRUE)
#DataVisualizations::Plot3D(Data)

# No structres are visible
# Topographic map looks like "egg carton"
# with every point in its own valley
ClsV = DatabionicSwarmClustering(Data, 0, PlotMap = TRUE)
}else{
# only for testing purposes of CRAN!
# in case CRAN tests with no suggest packages available
# please use alpways some kind of standardization!
ClsV = DatabionicSwarmClustering(data, 0, PlotMap = TRUE)
}


# Distance based cluster structures
# 7 valleys are visible, thus ClusterNo=7

data(Hepta)
#DataVisualizations::Plot3D(Hepta$Data)

ClsV = DatabionicSwarmClustering(Hepta$Data, 0, PlotMap = TRUE)


#entagled, complex, and non-linear seperable structures 
## Not run: 
#takes too long for CRAN tests
data(Chainlink)
#DataVisualizations::Plot3D(Chainlink$Data)

# 2 valleys are visible, thus ClusterNo=2
ClsV = DatabionicSwarmClustering(Chainlink$Data, 0, PlotMap = TRUE)

# Experiment with parameter StructureType only
# reveals that clustering is appropriate
# if StructureType=FALSE
ClsV2 = DatabionicSwarmClustering(Chainlink$Data,
                                2,
                                StructureType = FALSE,
                                PlotMap = TRUE)

# Here clusters (colored points)
# are not seperated by valleys
ClsV = DatabionicSwarmClustering(Chainlink$Data,
                                2,
                                StructureType = TRUE,
                                PlotMap = TRUE)

## End(Not run)


[Package FCPS version 1.3.4 Index]