DensityScatter {DataVisualizations}R Documentation

Scatter plot with densities

Description

Density estimation is performed by (PDE) [Ultsch, 2005] or "SDH" [Eilers/Goeman, 2004] and visualized in a density scatter plot [Brinkmann et al., 2023] in which the points are colored by their density.

Usage

DensityScatter(X,Y,DensityEstimation="SDH",

Type="DDCAL", Plotter = "native",Marginals = FALSE,

SampleSize,na.rm=FALSE, xlab, ylab, 

main="DensityScatter", AddString2lab="",
                        
xlim, ylim,NoBinsOrPareto=NULL,...)

Arguments

X

Numeric vector [1:n], first feature (for x axis values)

Y

Numeric vector [1:n], second feature (for y axis values)

DensityEstimation

(Optional), "SDH" is very fast but maybe not correct, "PDE" is slow but proably more correct, third alternativ is the typical R density estimation with "kde2d" which is sensitive to parameters

Type

(Optional), "DDCAL" uses a new density to point color matching by DDCAL algorithm [Lux/Rinderle-Ma, 2023] , "native" uses a simple density to point color matching

Plotter

in case of Type="DDCAL", (Optional) String, name of the plotting backend to use. Possible values are: "native","plotly", or "ggplot2"

Marginals

(Optional) Boolean, if TRUE the marginal distributions of X and Y will be plotted together with the 2D density of X and Y. Default is FALSE

SampleSize

(Optional), Numeric, positiv scalar, maximum size of the sample used for calculation. High values increase runtime significantly. The default is that no sample is drawn

na.rm

(Optional), Function may not work with non finite values. If these cases should be automatically removed, set parameter TRUE

xlab

(Optional), String, title of the x axis. Default: "X", see plot() function

ylab

(Optional), String, title of the y axis. Default: "Y", see plot() function

main

(Optional), string, the same as "main" in plot() function

AddString2lab

(Optional), adds the same string of information to x and y axis label, e.g. usefull for adding SI units

xlim

(Optional), in case of Type="natuive", see plot() function

ylim

in case of Type="natuive", see plot() function

NoBinsOrPareto

(Optional), in case of Type="natuive", Density specifc parameters, for PDEscatter(ParetoRadius) or SDH (nbins)) or kde2d(bins)

...

(Optional), further arguments either to ScatterDenstiy::DensityScatter.DDCAL or to plot()

Details

The DensityScatter function generates the density of the xy data as a z coordinate. Afterwards xy points will be plotted as a scatter plot, where the z values defines the coloring of the xy points. It assumens that the cases of x and y are mapped to each other meaning that a cbind(x,y) operation is allowed. This function plots the Density on top of a scatterplot. Variances of x and y should not differ by extreme numbers, otherwise calculate the percentiles on both first.

Value

List of:

X

Numeric vector [1:m],m<=n, first feature used in the plot or the kernels used

Y

Numeric vector [1:m],m<=n, second feature used in the plot or the kernels used

Densities

Number of points within the ParetoRadius of each point, i.e. density information

Note

MT contributed with several adjustments

Author(s)

Felix Pape

References

[Thrun, 2018] Thrun, M. C.: Projection Based Clustering through Self-Organization and Swarm Intelligence, (Ultsch, A. & Huellermeier, E. Eds., 10.1007/978-3-658-20540-9), Doctoral dissertation, Heidelberg, Springer, ISBN: 978-3658205393, 2018.

[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A. : Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech,, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.

[Ultsch, 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, In Baier, D. & Werrnecke, K. D. (Eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.

[Eilers/Goeman, 2004] Eilers, P. H., & Goeman, J. J.: Enhancing scatterplots with smoothed densities, Bioinformatics, Vol. 20(5), pp. 623-628. 2004

[Lux/Rinderle-Ma, 2023] Lux, M. & Rinderle-Ma, S.: DDCAL: Evenly Distributing Data into Low Variance Clusters Based on Iterative Feature Scaling, Journal of Classification vol. 40, pp. 106-144, 2023.

[Brinkmann et al., 2023] Brinkmann, L., Stier, Q., & Thrun, M. C.: Computing Sensitive Color Transitions for the Identification of Two-Dimensional Structures, Proc. Data Science, Statistics & Visualisation (DSSV) and the European Conference on Data Analysis (ECDA), p.109, Antwerp, Belgium, July 5-7, 2023.

Examples

#taken from [Thrun/Ultsch, 2018]
data("ITS")
data("MTY")
Inds=which(ITS<900&MTY<8000)
plot(ITS[Inds],MTY[Inds],main='Bimodality is not visible in normal scatter plot')

DensityScatter(ITS[Inds],MTY[Inds],DensityEstimation="SDH",xlab = 'ITS in EUR',

ylab ='MTY in EUR' ,main='Smoothed Densities histogram indicates Bimodality' )

DensityScatter(ITS[Inds],MTY[Inds],DensityEstimation="PDE",xlab = 'ITS in EUR',

ylab ='MTY in EUR' ,main='PDE indicates Bimodality' )



[Package DataVisualizations version 1.3.2 Index]