DataVisualizations-package {DataVisualizations}    R Documentation
Visualizations of High-Dimensional Data
Description
Gives access to data visualization methods that are relevant from the data scientist's point of view. The flagship idea of 'DataVisualizations' is the mirrored density plot (MD-plot) for either classified or non-classified multivariate data, published in Thrun, M.C. et al.: "Analyzing the Fine Structure of Distributions" (2020), PLoS ONE, <DOI:10.1371/journal.pone.0238835>. In that publication, the MD-plot is reported to outperform the box-and-whisker diagram (box plot), the violin plot, the bean plot and the geom_violin plot of ggplot2.

Furthermore, a collection of visualization methods for univariate data is provided. For exploratory data analysis, 'DataVisualizations' makes it possible to inspect the distribution of each feature of a dataset visually through a combination of four methods, one of which is the Pareto density estimation (PDE) of the probability density function (pdf). Additionally, the package provides visualizations of the distribution of distances using PDE, the scatter-density plot using PDE for two variables, the Shepard density plot and the Bland-Altman plot.

For classified high-dimensional data, a number of visualizations are provided, e.g., the heat map and the silhouette plot. A political map of the world or of Germany can be drawn with additional information defined by a classification of countries or regions. Extending the political map, a simple function draws a choropleth map, which is useful for displaying measurements across a geographic area. For categorical features, pie charts, slope charts and fan plots, improved by ABC analysis, are available. More detailed explanations are found in the book by Thrun, M.C.: "Projection-Based Clustering through Self-Organization and Swarm Intelligence" (2018) <DOI:10.1007/978-3-658-20540-9>.
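In outline, PDE estimates the density at a set of kernel points by counting the data points that lie within a hypersphere of the so-called Pareto radius around each kernel and dividing that count by the number of points times the sphere's volume. The base R sketch below applies this counting rule to a single variable, where the hypersphere reduces to the interval [k - r, k + r]. It is a minimal sketch under stated assumptions, not the package's ParetoDensityEstimation(): the radius is passed in by hand rather than chosen by the package's ParetoRadius() rule, and the helper pde_sketch and its arguments are hypothetical.

```r
# Minimal sketch of a PDE-style density estimate for one variable.
# Assumption: 'radius' is supplied by the caller; the package's
# ParetoRadius() selection rule is NOT reproduced here.
# pde_sketch() is a hypothetical helper, not part of DataVisualizations.
pde_sketch <- function(x, radius, n_kernels = 100) {
  x <- x[is.finite(x)]                       # drop NA, NaN and Inf values
  n <- length(x)
  # equally spaced kernel points covering the range of the data
  kernels <- seq(min(x), max(x), length.out = n_kernels)
  # number of data points inside [k - radius, k + radius] for each kernel k
  counts <- vapply(kernels, function(k) sum(abs(x - k) <= radius), integer(1))
  # divide by n times the interval length to obtain a density
  density <- counts / (n * 2 * radius)
  list(kernels = kernels, density = density, radius = radius)
}

# Usage on a bimodal sample
set.seed(1)
x <- c(rnorm(500, mean = -2), rnorm(500, mean = 2))
res <- pde_sketch(x, radius = 0.3)
# plot(res$kernels, res$density, type = "l") draws the estimated pdf
```

Because every point contributes a box of area 1/n, the estimate integrates to roughly 1 apart from small losses at the edges of the data range.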
Details
For a brief introduction to DataVisualizations, please see the vignette "A Quick Tour in Data Visualizations".
For further information, please see https://www.deepbionics.org/. Depending on the context, please cite either [Thrun, 2018] for visualizations in the context of clustering or [Thrun/Ultsch, 2018] for other visualizations.
For the mirrored density plot (MD-plot), please cite [Thrun et al., 2020] and see the extensive vignette at https://md-plot.readthedocs.io/en/latest/index.html. The MD-plot is also available for Python: https://pypi.org/project/md-plot/
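To illustrate the mirrored-density idea behind the MD-plot, the base R sketch below draws, for each column of a matrix, an estimated density together with its mirror image around the column's position on the x-axis. It is only an approximation under stated assumptions: it uses stats::density() (a Gaussian kernel estimate) instead of PDE and omits the package's ordering and scaling options, and the helper mirrored_density_sketch is hypothetical. With the package installed, the actual plot is drawn by MDplot(Data).

```r
# Sketch of mirrored density shapes, one per column of a numeric matrix.
# Assumption: uses base R's Gaussian kernel density() instead of PDE.
# mirrored_density_sketch() is a hypothetical helper, not the package's MDplot().
mirrored_density_sketch <- function(Data, half_width = 0.4) {
  Data <- as.matrix(Data)
  lapply(seq_len(ncol(Data)), function(j) {
    x <- Data[is.finite(Data[, j]), j]
    d <- stats::density(x)
    # scale the density so that its widest point is 'half_width' on each side
    w <- half_width * d$y / max(d$y)
    # left outline from bottom to top, then right outline from top to bottom
    list(x = c(j - w, rev(j + w)), y = c(d$x, rev(d$x)))
  })
}

# Usage: a unimodal and a bimodal feature side by side
set.seed(2)
Data <- cbind(Unimodal = rnorm(300),
              Bimodal = c(rnorm(150, mean = -2), rnorm(150, mean = 2)))
shapes <- mirrored_density_sketch(Data)
plot(NA, xlim = c(0.5, ncol(Data) + 0.5),
     ylim = range(unlist(lapply(shapes, function(s) s$y))),
     xaxt = "n", xlab = "", ylab = "Range of values")
axis(1, at = seq_len(ncol(Data)), labels = colnames(Data))
for (s in shapes) polygon(s$x, s$y, col = "darkblue", border = NA)
```

The bimodal column shows two bulges, which a box plot of the same data would hide.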
Index of help topics:
ABCbarplot              Barplot with Sorted Data Colored by ABCanalysis
AccountingInformation_PrimeStandard_Q3_2019
                        Accounting Information in the Prime Standard in Q3 in 2019 (AI_PS_Q3_2019)
BimodalityAmplitude     Bimodality Amplitude
CCDFplot                Plot of the Complementary Cumulative Distribution Function (CCDF) in log/log axes, using ecdf: CCDF(x) = 1 - cdf(x)
ChoroplethPostalCodesAndAGS_Germany
                        Postal Codes and AGS of Germany for a Choropleth Map
Choroplethmap           Plots the Choropleth Map
ClassBoxplot            Creates Boxplots for all classes
ClassErrorbar           ClassErrorbar
ClassMDplot             Class MDplot for Data w.r.t. all classes
ClassPDEplot            PDE Plot for all classes
ClassPDEplotMaxLikeli   Create PDE plot for all classes with maximum likelihood
Classplot               Classplot
CombineCols             Combine vectors of various lengths
Crosstable              Crosstable plot
DataVisualizations-package
                        Visualizations of High-Dimensional Data
DefaultColorSequence    Default color sequence for plots
DensityContour          Contour plot of densities
DensityScatter          Scatter plot with densities
DualaxisClassplot       Dualaxis Classplot
DualaxisLinechart       DualaxisLinechart
Fanplot                 The fan plot
FundamentalData_Q1_2018
                        Fundamental Data of the 1st Quarter in 2018
GoogleMapsCoordinates   Google Maps with marked coordinates
Heatmap                 Heatmap for Clustering
HeatmapColors           Default color sequence for plots
ITS                     Income Tax Share
InspectBoxplots         Inspect Boxplots
InspectCorrelation      Inspect the Correlation
InspectDistances        Inspection of Distance-Distribution
InspectScatterplots     Pairwise scatterplots and optimal histograms
InspectStandardization  QQplot of Data versus Normalized Data
InspectVariable         Visualization of Distribution of one variable
JitterUniqueValues      Jitters Unique Values
Lsun3D                  Lsun3D inspired by FCPS [Thrun/Ultsch, 2020] introduced in [Thrun, 2018]
MAplot                  Minus versus Add plot
MDplot                  Mirrored Density plot (MD-plot)
MDplot4multiplevectors  Mirrored Density plot (MD-plot) for Multiple Vectors
MTY                     Municipal Income Tax Yield
Multiplot               Plot multiple ggplot objects in one panel
OptimalNoBins           Optimal Number Of Bins
PDEplot                 PDE plot
ParetoDensityEstimation
                        Pareto Density Estimation V3
ParetoRadius            ParetoRadius for distributions
Piechart                The pie chart
Pixelmatrix             Plot of a Pixel Matrix
Plot3D                  3D plot of points
PlotGraph2D             PlotGraph2D
PlotMissingvalues       Plot of the Amount Of Missing Values
PlotProductratio        Product-Ratio Plot
PmatrixColormap         P-Matrix colors
QQplot                  QQplot with a Linear Fit
ROC                     ROC plot
RobustNorm_BackTrafo    Transforms the Robust Normalization back
RobustNormalization     RobustNormalization
ShepardDensityScatter   Shepard PDE scatter
Sheparddiagram          Draws a Shepard Diagram
SignedLog               Signed Log
Silhouetteplot          Silhouette plot of classified data
Slopechart              Slope Chart
StatPDEdensity          Pareto Density Estimation
Worldmap                Plots a world map by country codes
categoricalVariable     A categorical feature
estimateDensity2D       estimateDensity2D
stat_pde_density        Calculate Pareto density estimation for ggplot2 plots
world_country_polygons  world_country_polygons
zplot                   Plotting for 3-dimensional data
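The CCDFplot entry defines CCDF(x) = 1 - cdf(x), computed with ecdf. A minimal base R sketch of that computation with a log/log plot could look as follows; ccdf_sketch is a hypothetical helper, and the package's CCDFplot may differ in details such as the handling of ties and zero values. Note that 1 - ecdf(x) evaluated at x gives the share of values strictly greater than x, so the largest value has CCDF 0 and has to be dropped before taking logarithms.

```r
# Sketch: complementary cumulative distribution function, CCDF(x) = 1 - cdf(x).
# ccdf_sketch() is a hypothetical helper, not the package's CCDFplot().
ccdf_sketch <- function(x) {
  x <- x[is.finite(x)]
  Fn <- stats::ecdf(x)             # empirical cumulative distribution function
  grid <- sort(unique(x))
  data.frame(x = grid, ccdf = 1 - Fn(grid))
}

# Usage: Pareto-distributed values give a roughly straight line on log/log axes
set.seed(3)
x <- exp(rexp(1000))               # exp of an exponential variable is Pareto with minimum 1
cc <- ccdf_sketch(x)
keep <- cc$x > 0 & cc$ccdf > 0     # log axes require positive values
plot(cc$x[keep], cc$ccdf[keep], log = "xy", type = "s",
     xlab = "x", ylab = "CCDF(x)")
```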
Author(s)
Michael Thrun, Felix Pape, Onno Hansen-Goos, Alfred Ultsch
Maintainer: Michael Thrun <m.thrun@gmx.net>
References
[Thrun, 2018] Thrun, M. C.: Projection-Based Clustering through Self-Organization and Swarm Intelligence, doctoral dissertation 2017, Springer, Heidelberg, ISBN: 978-3-658-20539-3, doi:10.1007/978-3-658-20540-9, 2018.
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
[Thrun et al., 2020] Thrun, M. C., Gehlert, T. & Ultsch, A.: Analyzing the Fine Structure of Distributions, PLoS ONE, Vol. 15(10), pp. 1-66, doi:10.1371/journal.pone.0238835, 2020.
Examples
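library(DataVisualizations)

# Inspection of the Lsun3D data and of its distance distribution
data("Lsun3D")
Data=Lsun3D$Data
Pixelmatrix(Data)
InspectDistances(as.matrix(dist(Data)))

# Minus-versus-Add plot of the two income tax datasets
data("ITS")
data("MTY")
MAlist=MAplot(ITS,MTY)

# Visualizations of classified data
Cls=Lsun3D$Cls
# The scatter plot of the first two dimensions shows a clear cluster structure
plot(Data[,1:2],col=Cls)
# However, the silhouette plot does not indicate a very good clustering for clusters 1 and 2
Silhouetteplot(Data,Cls = Cls)
Heatmap(as.matrix(dist(Data)),Cls = Cls)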
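The comment in the example refers to silhouette values. For object i, let a(i) be the mean distance to the other members of its own cluster and b(i) the smallest mean distance to the members of any other cluster; the silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)). Values near 1 indicate a well-placed object, values near 0 or below a poorly separated one. The base R sketch below computes these values from a distance matrix. It assumes the standard definition; the helper silhouette_sketch is hypothetical, the package's Silhouetteplot may differ in details, and the recommended package cluster provides silhouette() for the same quantity.

```r
# Sketch: silhouette values s(i) = (b(i) - a(i)) / max(a(i), b(i)).
# silhouette_sketch() is a hypothetical helper, not the package's Silhouetteplot().
silhouette_sketch <- function(D, Cls) {
  D <- as.matrix(D)                        # accepts a 'dist' object or a matrix
  stopifnot(nrow(D) == length(Cls), length(unique(Cls)) >= 2)
  vapply(seq_len(nrow(D)), function(i) {
    own <- Cls == Cls[i]
    own[i] <- FALSE                        # exclude the object itself
    if (!any(own)) return(0)               # common convention for singleton clusters
    a <- mean(D[i, own])                   # mean distance within own cluster
    others <- setdiff(unique(Cls), Cls[i])
    # smallest mean distance to any other cluster
    b <- min(vapply(others, function(k) mean(D[i, Cls == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
}

# Usage on four 1-D points forming two well-separated clusters
pts <- c(0, 1, 10, 11)
s <- silhouette_sketch(dist(pts), Cls = c(1, 1, 2, 2))
# With the package loaded, per-cluster means for Lsun3D could be compared via:
# tapply(silhouette_sketch(dist(Lsun3D$Data), Lsun3D$Cls), Lsun3D$Cls, mean)
```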