PDEscatter {ScatterDensity} | R Documentation |
Scatter Density Plot
Description
The concept of Pareto density estimation (PDE), proposed for univariate data by [Ultsch, 2005] and compared to various density estimation techniques by [Thrun et al., 2020] for univariate data, is applied here to a scatter density plot. It was also applied to bivariate data in [Thrun and Ultsch, 2018], but has not yet been compared to other techniques.
Usage
PDEscatter(x,y,SampleSize,
na.rm=FALSE,PlotIt=TRUE,ParetoRadius,sampleParetoRadius,
NrOfContourLines=20,Plotter='native', DrawTopView = TRUE,
xlab="X", ylab="Y", main="PDEscatter",
xlim, ylim, Legendlab_ggplot="value")
Arguments
x |
Numeric vector [1:n], first feature (for x axis values) |
y |
Numeric vector [1:n], second feature (for y axis values) |
SampleSize |
Numeric m, positive scalar, maximum size of the sample used for calculation. High values increase runtime significantly. By default, no sample is drawn. |
na.rm |
The function may not work with non-finite values. Set this parameter to TRUE to remove such cases automatically. |
ParetoRadius |
Numeric, positive scalar, the Pareto radius. If omitted (or 0), it is calculated by paretoRad. |
sampleParetoRadius |
Numeric, positive scalar, maximum size of the sample used for estimating the "kernel". It should be significantly lower than SampleSize because the estimation requires distance computations, which are memory-expensive. |
PlotIt |
|
NrOfContourLines |
Numeric, number of contour lines to be drawn. 20 by default. |
Plotter |
String, name of the plotting backend to use. Possible values are: " |
DrawTopView |
Boolean, TRUE means a contour plot is drawn, otherwise a 3D plot is drawn. Default: TRUE |
xlab |
String, title of the x axis. Default: "X", see |
ylab |
String, title of the y axis. Default: "Y", see |
main |
string, the same as "main" in |
xlim |
see |
ylim |
see |
Legendlab_ggplot |
String, in case of |
Details
The PDEscatter function generates the density of the xy data as a z coordinate. Afterwards, xyz is plotted either as a contour plot or as a 3D plot. It assumes that the cases of x and y are mapped to each other, meaning that a cbind(x,y) operation is allowed.
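The idea of attaching a density value to each (x, y) pair can be illustrated with a minimal base R sketch. It counts, for every point, how many points lie within a fixed radius, which mirrors the description of the Densities return value. This is not the package's implementation: the radius is a hand-picked, hypothetical value rather than the Pareto radius, and the full distance matrix is only practical for small samples.

```r
# Illustrative sketch only; NOT the internal code of PDEscatter.
set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))
y <- c(rnorm(100, mean = 0), rnorm(100, mean = 5))

# x and y must be paired case by case, so cbind(x, y) is valid
xy <- cbind(x, y)

# Assumption: a fixed radius chosen by hand (hypothetical value),
# standing in for the Pareto radius
radius <- 1

# Pairwise Euclidean distances; memory grows quadratically with n
d <- as.matrix(dist(xy))

# Number of points within the radius of each point (the point itself included)
counts <- rowSums(d <= radius)

# xyz: the two features plus the density information as third column
xyz <- cbind(x, y, density = counts)
head(xyz)
```

The quadratic memory cost of the distance matrix is the reason a sample-size limit such as sampleParetoRadius matters for larger data sets.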
This function plots the PDE on top of a scatter plot. The variances of x and y should not differ by extreme amounts; otherwise, calculate the percentiles of both first. If DrawTopView=FALSE, only the plotly option is currently available; if another option is chosen, the method automatically switches to plotly.
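One possible way to bring two features with very different variances onto a common scale is a rank-based percentile transform. The sketch below uses base R only; the helper name to_percentile is hypothetical and not part of the package, and whether this particular transform suits a given data set is a judgment call.

```r
# Illustrative sketch only; to_percentile is a hypothetical helper.
set.seed(42)
x <- rnorm(500, mean = 0, sd = 1)      # small variance
y <- rnorm(500, mean = 0, sd = 1000)   # variance larger by orders of magnitude

# Map each value to its percentile (0 < p <= 100) based on its rank
to_percentile <- function(v) 100 * rank(v, ties.method = "average") / length(v)

px <- to_percentile(x)
py <- to_percentile(y)

# Both features now share the same scale
c(var_x = var(px), var_y = var(py))
```

The transformed vectors px and py could then be passed to PDEscatter in place of x and y. The axes then show percentiles rather than original units, so axis labels should be adjusted accordingly.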
The method was successfully used in [Thrun, 2018; Thrun/Ultsch 2018].
PlotIt=FALSE is useful if one wants to make adjustments such as axis scaling prior to plotting with ggplot2 or plotly. In the case of "native", the handle returns NULL because the basic R function plot() is used.
Value
List of:
X |
Numeric vector [1:m],m<=n, first feature used in the plot or the kernels used |
Y |
Numeric vector [1:m],m<=n, second feature used in the plot or the kernels used |
Densities |
Numeric vector [1:m],m<=n, Number of points within the ParetoRadius of each point, i.e. density information |
Matrix3D |
[1:n,1:3] matrix of x, y and density information |
ParetoRadius |
ParetoRadius used for PDEscatter |
Handle |
Handle of the plot object. Information-string if native R plot is used. |
Note
MT contributed several adjustments.
Author(s)
Felix Pape
References
[Thrun/Ultsch, 2018] Thrun, M. C., & Ultsch, A.: Effects of the payout system of income taxes to municipalities in Germany, in Papiez, M. & Smiech, S. (eds.), Proc. 12th Professor Aleksander Zelias International Conference on Modelling and Forecasting of Socio-Economic Phenomena, pp. 533-542, Cracow: Foundation of the Cracow University of Economics, Cracow, Poland, 2018.
[Ultsch, 2005] Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D. & Wernecke, K. D. (eds.), Innovations in classification, data science, and information systems, (Vol. 27, pp. 91-100), Berlin, Germany, Springer, 2005.
[Thrun et al., 2020] Thrun, M. C., Gehlert, T. & Ultsch, A.: Analyzing the Fine Structure of Distributions, PLoS ONE, Vol. 15(10), pp. 1-66, doi:10.1371/journal.pone.0238835, 2020.
Examples
# taken from [Thrun/Ultsch, 2018]
if (requireNamespace("DataVisualizations")) {
  data("ITS", package = "DataVisualizations")
  data("MTY", package = "DataVisualizations")
  # restrict both features to the range of interest
  Inds = which(ITS < 900 & MTY < 8000)
  plot(ITS[Inds], MTY[Inds],
       main = 'Bimodality is not visible in normal scatter plot')
  PDEscatter(ITS[Inds], MTY[Inds], xlab = 'ITS in EUR',
             ylab = 'MTY in EUR',
             main = 'Pareto Density Estimation indicates Bimodality')
}