geom_pcp {ggpcp} | R Documentation |
Generalized Parallel Coordinate plots
Description
The ggpcp package for generalized parallel coordinate plots is implemented as a ggplot2 extension.
In particular, this implementation makes use of ggplot2's layer framework,
which allows considerable flexibility in the choice and order of the graphical elements shown.
command | graphical element |
geom_pcp | line segments |
geom_pcp_axes | vertical lines to represent all axes |
geom_pcp_boxes | boxes for levels on categorical axes |
geom_pcp_labels | labels for levels on categorical axes |
These ggpcp-specific layers can be mixed with ggplot2's regular geoms,
such as ggplot2::geom_point(), ggplot2::geom_boxplot(), or ggdensity::geom_hdr().
Usage
geom_pcp(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  na.rm = FALSE,
  axiswidth = c(0, 0.1),
  overplot = "small-on-top",
  show.legend = NA,
  inherit.aes = TRUE,
  ...
)
Arguments
mapping |
Set of aesthetic mappings created by |
data |
The data to be displayed in this layer. There are three options: If A A |
stat |
The statistical transformation to use on the data for this
layer, either as a |
position |
Position adjustment, either as a string naming the adjustment
(e.g. |
na.rm |
If |
axiswidth |
Vector of two values indicating the space that numeric and categorical axes, respectively, are supposed to take. Minimum of 0, maximum of 1. Defaults to 0 for a numeric axis and 0.1 for a categorical axis. |
overplot |
Character value indicating which method should be used to mitigate overplotting of lines. Defaults to 'small-on-top'. The 'small-on-top' strategy counts the observations for each combination of levels between two categorical variables and plots the lines from highest frequency to lowest (effectively plotting small groups on top). The strategy 'none' gives the most flexibility to the user: the plotting order follows the order in which observations appear in the original data. |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
... |
other arguments passed on to |
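The ordering idea behind overplot = 'small-on-top' can be illustrated in base R. The following is only a sketch of the principle described above, not ggpcp's internal code; the toy data frame obs and the helper column n are hypothetical names introduced for illustration.

```r
# Toy data: two categorical variables observed on eight entities.
obs <- data.frame(
  a = c("z", "x", "y", "x", "x", "y", "x", "x"),
  b = c("v", "u", "u", "v", "u", "u", "v", "u")
)

# Count how many observations share each combination of levels (a, b).
obs$n <- ave(seq_len(nrow(obs)), obs$a, obs$b, FUN = length)

# Drawing order: largest groups first, so that the smallest groups are
# drawn last and therefore end up on top of the others.
draw_order <- order(-obs$n)
obs_sorted <- obs[draw_order, ]
obs_sorted
```

With overplot = 'none', no such reordering takes place and lines are drawn in the row order of the data.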
Value
A list consisting of a ggplot2::layer() object and its associated scales.
About Parallel Coordinate Plots
Parallel coordinate plots are a multivariate visualization that allows several aspects of an observed entity to be shown in a single plot. Each aspect is represented by a vertical axis (the parallel axes give the plot its name), and values are marked on each of these axes. Values corresponding to the same entity are connected by line segments between adjacent axes. This type of visualization was first used by d’Ocagne (1885). Modern re-inventions go back to Inselberg (1985) and Wegman (1990). This implementation takes a more general approach in that it can also handle categorical variables in the same principled way, so that individual observations can be tracked across multiple dimensions.
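The construction described above can be sketched for purely numeric variables in base R. This is a minimal illustration that assumes min-max scaling of each variable to [0, 1]; that is one common choice and not necessarily what ggpcp does. It does not use ggpcp at all, and the object names vals, rng and scaled are hypothetical.

```r
# Four numeric variables of the built-in iris data, one row per entity.
vals <- as.matrix(iris[, 1:4])

# Rescale every variable (column) to [0, 1] so all axes share one vertical range.
rng <- apply(vals, 2, range)
scaled <- sweep(sweep(vals, 2, rng[1, ]), 2, rng[2, ] - rng[1, ], "/")

# One polyline per entity: x is the axis position, y the scaled value on that axis.
matplot(seq_len(ncol(scaled)), t(scaled), type = "l", lty = 1,
        col = as.integer(iris$Species), xaxt = "n",
        xlab = "", ylab = "scaled value")
axis(1, at = seq_len(ncol(scaled)), labels = colnames(scaled))
abline(v = seq_len(ncol(scaled)), col = "grey40")  # the parallel axes
```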
Data wrangling
The data pipeline feeding geom_pcp is implemented in a three-step modularized
form rather than in a stat_pcp function, as would be more typical for ggplot2 extensions.
The three steps of data pre-processing are:
command | data processing step |
pcp_select | variable selection (and horizontal ordering) |
pcp_scale | (vertical) scaling of values |
pcp_arrange | dealing with tie-breaks on categorical axes |
Note that these data processing steps are executed before the call to ggplot2,
and the identity function is used by default in all of the ggpcp-specific layers.
Besides the speed-up gained by executing the processing steps only once for all layers,
the separation has the additional benefit that it lets users make specific choices
at each step in the process. Additionally, separation allows for a cleaner user
interface: parameters affecting the data preparation process can be moved to the
relevant (set of) function(s) only, thereby reducing the number of arguments
without any loss of functionality.
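As a brief sketch of this separation, the pipeline below runs the three steps once and then reuses the resulting data frame in several layers. It uses only functions named on this page with their default arguments; the choice of the iris data and the column order passed to pcp_select are assumptions made for illustration.

```r
library(ggplot2)
library(ggpcp)

# Steps 1-3 run once, outside of ggplot2; the result is an ordinary data frame.
iris_pcp <- iris |>
  pcp_select(5, 1:4) |>  # step 1: Species first, then the four measurements
  pcp_scale() |>         # step 2: vertical scaling of values
  pcp_arrange()          # step 3: tie-breaks on the categorical axis

# Every layer reuses the same pre-processed data.
p <- ggplot(iris_pcp, aes_pcp()) +
  geom_pcp_axes() +
  geom_pcp(aes(colour = Species)) +
  geom_pcp_boxes() +
  geom_pcp_labels() +
  theme_pcp()
p
```

Changing the variable order or the scaling only requires editing the corresponding pipeline step; the plotting layers stay the same.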
References
M. d’Ocagne. (1885) Coordonnées parallèles et axiales: Méthode de transformation géométrique et procédé nouveau de calcul graphique déduits de la considération des coordonnées parallèles. Gauthier-Villars, page 112, https://archive.org/details/coordonnesparal00ocaggoog/page/n10.
Al Inselberg. (1985) The plane with parallel coordinates. The Visual Computer, 1(2):69–91, doi:10.1007/BF01898350.
Ed J. Wegman. (1990) Hyperdimensional data analysis using parallel coordinates. Journal of the American Statistical Association, 85:664–675, doi:10.2307/2290001.
Examples
library(ggplot2)
library(ggpcp)
data(mtcars)

# Convert the discrete variables to factors, then run the three pcp steps
mtcars_pcp <- mtcars |>
  dplyr::mutate(
    cyl = factor(cyl),
    vs = factor(vs),
    am = factor(am),
    gear = factor(gear),
    carb = factor(carb)
  ) |>
  pcp_select(1:11) |>  # select everything
  pcp_scale() |>
  pcp_arrange()

base <- mtcars_pcp |> ggplot(aes_pcp())

# Just the base plot:
base + geom_pcp()

# with the pcp theme
base + geom_pcp() + theme_pcp()

# with boxplots on the non-factor axes:
base +
  geom_pcp(aes(colour = cyl)) +
  geom_boxplot(aes(x = pcp_x, y = pcp_y),
               inherit.aes = FALSE,
               data = dplyr::filter(mtcars_pcp, pcp_class != "factor")) +
  theme_pcp()

# base plot with boxes and labels
base +
  geom_pcp(aes(colour = cyl)) +
  geom_pcp_boxes() +
  geom_pcp_labels() +
  theme_pcp()