DataVisualizations-package {DataVisualizations}          R Documentation

Visualizations of High-Dimensional Data

Description

Gives access to data visualization methods that are relevant from the data scientist's point of view. The flagship idea of 'DataVisualizations' is the mirrored density plot (MD-plot) for either classified or non-classified multivariate data, published in Thrun, M.C. et al.: "Analyzing the Fine Structure of Distributions" (2020), PLoS ONE, <DOI:10.1371/journal.pone.0238835>. According to that publication, the MD-plot outperforms the box-and-whisker diagram (box plot), the violin plot, the bean plot, and the geom_violin plot of ggplot2.

Furthermore, a collection of visualization methods for univariate data is provided. For exploratory data analysis, 'DataVisualizations' makes it possible to visually inspect the distribution of each feature of a dataset through a combination of four methods, one of which is the Pareto density estimation (PDE) of the probability density function (pdf). Additionally, the package provides visualizations of the distribution of distances using PDE, the scatter-density plot using PDE for two variables, the Shepard density plot, and the Bland-Altman plot.

For classified high-dimensional data, several visualizations are available, for example the heat map and the silhouette plot. A political map of the world or of Germany can be visualized together with additional information defined by a classification of countries or regions. Extending the political map further, a simple function for a choropleth map is provided, which is useful for measurements across a geographic area. For categorical features, pie charts, slope charts, and fan plots, improved by the ABC analysis, are available. More detailed explanations are found in the book by Thrun, M.C.: "Projection-Based Clustering through Self-Organization and Swarm Intelligence" (2018) <DOI:10.1007/978-3-658-20540-9>.

Details

For a brief introduction to 'DataVisualizations', please see the vignette "A Quick Tour in Data Visualizations".

For further information, please see https://www.deepbionics.org/. Depending on the context, please cite either [Thrun, 2018] for visualizations in the context of clustering or [Thrun/Ultsch, 2018] for other visualizations.

For the mirrored density plot (MD-plot), please cite [Thrun et al., 2020] and see the extensive vignette at https://md-plot.readthedocs.io/en/latest/index.html. The MD-plot is also available for Python at https://pypi.org/project/md-plot/.

Index of help topics:

ABCbarplot              Barplot with Sorted Data Colored by ABCanalysis
AccountingInformation_PrimeStandard_Q3_2019
                        Accounting Information in the Prime Standard in
                        Q3 in 2019 (AI_PS_Q3_2019)
BimodalityAmplitude     Bimodality Amplitude
CCDFplot                Plots the Complementary Cumulative Distribution
                        Function (CCDF) on log-log axes using ecdf;
                        CCDF(x) = 1 - cdf(x)
ChoroplethPostalCodesAndAGS_Germany
                        Postal Codes and AGS of Germany for a
                        Choropleth Map
Choroplethmap           Plots the Choropleth Map
ClassBoxplot            Creates a boxplot for all classes
ClassErrorbar           ClassErrorbar
ClassMDplot             Class MDplot for Data w.r.t. all classes
ClassPDEplot            PDE Plot for all classes
ClassPDEplotMaxLikeli   Create PDE plot for all classes with maximum
                        likelihood
Classplot               Classplot
CombineCols             Combine vectors of various lengths
Crosstable              Crosstable plot
DataVisualizations-package
                        Visualizations of High-Dimensional Data
DefaultColorSequence    Default color sequence for plots
DensityContour          Contour plot of densities
DensityScatter          Scatter plot with densities
DualaxisClassplot       Dualaxis Classplot
DualaxisLinechart       DualaxisLinechart
Fanplot                 The fan plot
FundamentalData_Q1_2018
                        Fundamental Data of the 1st Quarter in 2018
GoogleMapsCoordinates   Google Maps with marked coordinates
Heatmap                 Heatmap for Clustering
HeatmapColors           Default color sequence for plots
ITS                     Income Tax Share
InspectBoxplots         Inspect Boxplots
InspectCorrelation      Inspect the Correlation
InspectDistances        Inspection of Distance-Distribution
InspectScatterplots     Pairwise scatterplots and optimal histograms
InspectStandardization
                        QQplot of Data versus Normalized Data
InspectVariable         Visualization of Distribution of one variable
JitterUniqueValues      Jitters Unique Values
Lsun3D                  Lsun3D inspired by FCPS [Thrun/Ultsch, 2020]
                        introduced in [Thrun, 2018]
MAplot                  Minus versus Add plot
MDplot                  Mirrored Density plot (MD-plot)
MDplot4multiplevectors
                        Mirrored Density plot (MD-plot) for Multiple
                        Vectors
MTY                     Municipal Income Tax Yield
Multiplot               Plot multiple ggplots objects in one panel
OptimalNoBins           Optimal Number Of Bins
PDEplot                 PDE plot
ParetoDensityEstimation
                        Pareto Density Estimation V3
ParetoRadius            ParetoRadius for distributions
Piechart                The pie chart
Pixelmatrix             Plot of a Pixel Matrix
Plot3D                  3D plot of points
PlotGraph2D             PlotGraph2D
PlotMissingvalues       Plot of the Amount Of Missing Values
PlotProductratio        Product-Ratio Plot
PmatrixColormap         P-Matrix colors
QQplot                  QQplot with a Linear Fit
ROC                     ROC plot
RobustNorm_BackTrafo    Transforms the Robust Normalization back
RobustNormalization     RobustNormalization
ShepardDensityScatter   Shepard PDE scatter
Sheparddiagram          Draws a Shepard Diagram
SignedLog               Signed Log
Silhouetteplot          Silhouette plot of classified data.
Slopechart              Slope Chart
StatPDEdensity          Pareto Density Estimation
Worldmap                Plots a world map by country codes
categoricalVariable     A categorical Feature.
estimateDensity2D       estimateDensity2D
stat_pde_density        Calculate Pareto density estimation for ggplot2
                        plots
world_country_polygons
                        world_country_polygons
zplot                   Plotting for 3 dimensional data

Author(s)

Michael Thrun, Felix Pape, Onno Hansen-Goos, Alfred Ultsch

Maintainer: Michael Thrun <m.thrun@gmx.net>

References

[Thrun, 2018] Thrun, M. C.: Projection-Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, doi:10.1007/978-3-658-20540-9, 2018.

[Thrun/Ultsch, 2018] Thrun, M. C. & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.

[Thrun et al., 2020] Thrun, M. C., Gehlert, T. & Ultsch, A.: Analyzing the Fine Structure of Distributions, PLoS ONE, Vol. 15(10), pp. 1-66, doi:10.1371/journal.pone.0238835, 2020.

Examples

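# Load the Lsun3D example data set shipped with the package
data("Lsun3D")
Data <- Lsun3D$Data

# Show the data matrix as a pixel matrix
Pixelmatrix(Data)

# Inspect the distribution of the pairwise Euclidean distances
InspectDistances(as.matrix(dist(Data)))

# Minus-versus-add plot of the ITS and MTY data sets
MAlist <- MAplot(ITS, MTY)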
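
# The index entry for CCDFplot defines CCDF(x) = 1 - cdf(x), estimated
# with ecdf. A minimal base-R sketch of that definition follows; it is
# NOT the package's implementation, and the helper name empirical_ccdf
# is hypothetical.
empirical_ccdf <- function(x) {
  x <- x[is.finite(x)]        # drop NA, NaN and Inf before estimating
  Fn <- stats::ecdf(x)        # empirical cdf as a step function
  xs <- sort(unique(x))
  data.frame(x = xs, ccdf = 1 - Fn(xs))
}
set.seed(1)
ccdf <- empirical_ccdf(rexp(500))
# The largest observation has CCDF = 0 and cannot be drawn on a log axis
keep <- ccdf$ccdf > 0
plot(ccdf$x[keep], ccdf$ccdf[keep], log = "xy", type = "s",
     xlab = "x", ylab = "CCDF(x) = 1 - cdf(x)")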

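# Load the data together with its classification
data("Lsun3D")
Cls <- Lsun3D$Cls
Data <- Lsun3D$Data

# The scatter plot of the first two features shows a clear cluster structure
plot(Data[, 1:2], col = Cls)

# However, the silhouette plot does not indicate a very good clustering
# for clusters 1 and 2
Silhouetteplot(Data, Cls = Cls)

# Heat map of the distance matrix with the classification Cls
Heatmap(as.matrix(dist(Data)), Cls = Cls)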
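
# The Description names the MD-plot as the flagship visualization, but
# none of the examples on this page calls it. A minimal sketch, assuming
# MDplot() accepts the numeric data matrix as its first argument; see
# ?MDplot for the full argument list and the returned value.
data("Lsun3D")
MDplot(Lsun3D$Data)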


[Package DataVisualizations version 1.3.2 Index]