stat_dots {ggdist} | R Documentation |
Dot plot (shortcut stat)
Description
A combination of stat_slabinterval() and geom_dotsinterval() with sensible defaults for making dot plots. While geom_dotsinterval() is intended for use on data frames that have already been summarized using a point_interval() function, stat_dots() is intended for use directly on data frames of draws or of analytical distributions, and will perform the summarization using a point_interval() function. Geoms based on geom_dotsinterval() create dotplots that automatically determine a bin width that ensures the plot fits within the available space. They can also ensure dots do not overlap.
Roughly equivalent to:
stat_dotsinterval(
  aes(size = NULL),
  geom = "dots",
  show_point = FALSE,
  show_interval = FALSE,
  show.legend = NA
)
Usage
stat_dots(
  mapping = NULL,
  data = NULL,
  geom = "dots",
  position = "identity",
  ...,
  quantiles = NA,
  orientation = NA,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
Arguments
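mapping: Set of aesthetic mappings created by aes().
data: The data to be displayed in this layer. There are three options:
  - If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().
  - A data.frame, or other object, will override the plot data.
  - A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data.
geom: Use to override the default connection between stat_dots() and geom_dots().
position: Position adjustment, either as a string, or the result of a call to a position adjustment function. Setting this equal to "dodge" (position_dodge()) or "dodgejust" (position_dodgejust()) can be useful if you have overlapping geometries.
...: Other arguments passed to layer(). These are often aesthetics, used to set an aesthetic to a fixed value (see Aesthetics, below), or parameters of the paired geom.
quantiles: Setting this to a value other than NA will produce a quantile dotplot: a dotplot of quantiles from the sample or distribution. The value of quantiles determines the number of quantiles to plot. See Kay et al. (2016) and Fernandes et al. (2018).
orientation: Whether this geom is drawn horizontally or vertically. One of:
  - NA (default): automatically detect the orientation based on how the aesthetics are assigned.
  - "horizontal" (or "y"): draw horizontally, using the y aesthetic to identify different groups.
  - "vertical" (or "x"): draw vertically, using the x aesthetic to identify different groups.
  For compatibility with the base ggplot naming scheme for orientation, "x" can be used as an alias for "vertical" and "y" as an alias for "horizontal".
na.rm: If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.
show.legend: logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes.
inherit.aes: If FALSE, overrides the default aesthetics, rather than combining with them.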
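As a rough illustration of the orientation argument, the sketch below draws the same samples horizontally (relying on automatic detection) and vertically (stated explicitly). The data frame df and its columns group and value are made up for the example.

```r
library(ggplot2)
library(ggdist)

# Hypothetical sample data: three groups of 100 draws each
set.seed(1)
df <- data.frame(
  group = rep(c("a", "b", "c"), each = 100),
  value = rnorm(300, mean = rep(c(0, 1, 2), each = 100))
)

# Samples on x, groups on y: orientation is left at NA and auto-detected
p_horizontal <- ggplot(df, aes(x = value, y = group)) +
  stat_dots()

# Same data with the axes swapped and the orientation given explicitly
p_vertical <- ggplot(df, aes(x = group, y = value)) +
  stat_dots(orientation = "vertical")
```

Printing either object (for example, p_horizontal) renders the plot.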
Details
The dots family of stats and geoms is similar to geom_dotplot() but with a number of differences:
- Dots geoms act like slabs in geom_slabinterval() and can be given x positions (or y positions when in a horizontal orientation).
- Given the available space to lay out dots, the dots geoms will automatically determine how many bins to use to fit the available space.
- Dots geoms use a dynamic layout algorithm that lays out dots from the center out if the input data are symmetrical, guaranteeing that symmetrical data results in a symmetrical plot. The layout algorithm also prevents dots from overlapping each other.
- The shape of the dots in these geoms can be changed using the slab_shape aesthetic (when using the dotsinterval family) or the shape or slab_shape aesthetic (when using the dots family).
Stats and geoms in this family include:
- geom_dots(): dotplots on raw data. Ensures the dotplot fits within available space by reducing the size of the dots automatically (may result in very small dots).
- geom_swarm() and geom_weave(): dotplots on raw data with defaults intended to create "beeswarm" plots. Uses side = "both" by default, and sets the default dot size to the same size as geom_point() (binwidth = unit(1.5, "mm")), allowing dots to overlap instead of getting very small.
- stat_dots(): dotplots on raw data, distributional objects, and posterior::rvar()s.
- geom_dotsinterval(): dotplot + interval plots on raw data with already-calculated intervals (rarely useful directly).
- stat_dotsinterval(): dotplot + interval plots on raw data, distributional objects, and posterior::rvar()s (will calculate intervals for you).
- geom_blur_dots(): blurry dotplots that allow the standard deviation of a blur applied to each dot to be specified using the sd aesthetic.
- stat_mcse_dots(): blurry dotplots of quantiles using the Monte Carlo Standard Error of each quantile.
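To make the difference between the raw-data geoms in this list concrete, here is a minimal sketch that draws the same made-up data with geom_dots(), geom_swarm(), and geom_weave(). The data frame df is hypothetical, and the sketch assumes a ggdist version that provides geom_swarm() and geom_weave(); rendering the swarm layout is assumed to need the beeswarm package installed.

```r
library(ggplot2)
library(ggdist)

# Hypothetical raw data: two groups of 150 observations
set.seed(2)
df <- data.frame(
  group = rep(c("control", "treatment"), each = 150),
  value = c(rnorm(150, 0), rnorm(150, 1))
)

base <- ggplot(df, aes(x = group, y = value))

# Dots shrink as needed so each dotplot fits its allotted space
p_dots <- base + geom_dots()

# Fixed-size dots placed on both sides, beeswarm style
# (assumption: the swarm layout needs the beeswarm package at render time)
p_swarm <- base + geom_swarm()

# Fixed-size dots with the weave layout
p_weave <- base + geom_weave()
```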
stat_dots() and stat_dotsinterval(), when used with the quantiles argument, are particularly useful for constructing quantile dotplots, which can be an effective way to communicate uncertainty using a frequency framing that may be easier for laypeople to understand (Kay et al. 2016, Fernandes et al. 2018).
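A minimal sketch of a quantile dotplot, assuming a set of simulated draws standing in for a posterior or predictive distribution (the draws data frame is invented for the example). With quantiles = 20, each dot stands for roughly a 1-in-20 chance, which is the frequency framing described above.

```r
library(ggplot2)
library(ggdist)

# Hypothetical draws, e.g. predicted minutes until a bus arrives
set.seed(1234)
draws <- data.frame(minutes = rnorm(5000, mean = 10, sd = 2))

# Plot 20 quantiles of the draws instead of all 5000 values
p_quantile <- ggplot(draws, aes(x = minutes)) +
  stat_dots(quantiles = 20)
```

Counting the dots to the left of a threshold then gives an approximate probability in twentieths.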
To visualize sample data, such as a data distribution, samples from a bootstrap distribution, or a Bayesian posterior, you can supply samples to the x or y aesthetic.
To visualize analytical distributions, you can use the xdist or ydist aesthetic. For historical reasons, you can also use dist to specify the distribution, though this is not recommended as it does not work as well with orientation detection.
These aesthetics can be used as follows:
- xdist, ydist, and dist can be any distribution object from the distributional package (dist_normal(), dist_beta(), etc) or can be a posterior::rvar() object. Since these functions are vectorized, other columns can be passed directly to them in an aes() specification; e.g. aes(dist = dist_normal(mu, sigma)) will work if mu and sigma are columns in the input data frame.
- dist can be a character vector giving the distribution name. Then the arg1, ... arg9 aesthetics (or args as a list column) specify distribution arguments. Distribution names should correspond to R functions that have "p", "q", and "d" functions; e.g. "norm" is a valid distribution name because R defines the pnorm(), qnorm(), and dnorm() functions for Normal distributions.
  See the parse_dist() function for a useful way to generate dist and args values from human-readable distribution specs (like "normal(0,1)"). Such specs are also produced by other packages (like the brms::get_prior function in brms); thus, parse_dist() combined with the stats described here can help you visualize the output of those functions.
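The two ways of specifying analytical distributions can be sketched as follows. The data frames params and priors, and their column names, are invented for the example; the second plot relies on parse_dist() adding .dist and .args columns to its input.

```r
library(ggplot2)
library(ggdist)
library(distributional)

# 1. Distribution objects built from columns of the data (vectorized)
params <- data.frame(
  name = c("narrow", "wide"),
  mu = c(0, 1),
  sigma = c(0.5, 2)
)
p_xdist <- ggplot(params, aes(y = name, xdist = dist_normal(mu, sigma))) +
  stat_dots(quantiles = 50)

# 2. Human-readable specs parsed into a distribution name plus arguments
priors <- data.frame(prior = c("normal(0,1)", "normal(1,0.5)"))
priors <- parse_dist(priors, prior)  # adds the .dist and .args columns
p_dist <- ggplot(priors, aes(y = prior, dist = .dist, args = .args)) +
  stat_dots(quantiles = 50)
```

The first form is the one recommended above, since xdist and ydist make the orientation explicit.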
Value
A ggplot2::Stat representing a dot geometry which can be added to a ggplot() object.
Computed Variables
The following variables are computed by this stat and made available for use in aesthetic specifications (aes()) using the after_stat() function or the after_stat argument of stage():
- x or y: For slabs, the input values to the slab function. For intervals, the point summary from the interval function. Whether it is x or y depends on orientation.
- xmin or ymin: For intervals, the lower end of the interval from the interval function.
- xmax or ymax: For intervals, the upper end of the interval from the interval function.
- .width: For intervals, the interval width as a numeric value in [0, 1]. For slabs, the width of the smallest interval containing that value of the slab.
- level: For intervals, the interval width as an ordered factor. For slabs, the level of the smallest interval containing that value of the slab.
- pdf: For slabs, the probability density function (PDF). If options("ggdist.experimental.slab_data_in_intervals") is TRUE: For intervals, the PDF at the point summary; intervals also have pdf_min and pdf_max for the PDF at the lower and upper ends of the interval.
- cdf: For slabs, the cumulative distribution function. If options("ggdist.experimental.slab_data_in_intervals") is TRUE: For intervals, the CDF at the point summary; intervals also have cdf_min and cdf_max for the CDF at the lower and upper ends of the interval.
- n: For slabs, the number of data points summarized into that slab. If the slab was created from an analytical distribution via the xdist, ydist, or dist aesthetic, n will be Inf.
- f: (deprecated) For slabs, the output values from the slab function (such as the PDF, CDF, or CCDF), determined by slab_type. Instead of using slab_type to change f and then mapping f onto an aesthetic, it is now recommended to simply map the corresponding computed variable (e.g. pdf, cdf, or 1 - cdf) directly onto the desired aesthetic.
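As a small sketch of using a computed variable, the plot below colors each dot by whether its value is above zero, using the computed x variable through after_stat(). The data frame df is made up, and the horizontal layout is chosen so that the sample values land in x.

```r
library(ggplot2)
library(ggdist)

# Hypothetical samples for two groups
set.seed(3)
df <- data.frame(
  group = rep(c("a", "b"), each = 200),
  value = c(rnorm(200, -0.5), rnorm(200, 0.5))
)

# Samples are on the x axis, so the computed variable holding them is x
p_after_stat <- ggplot(df, aes(x = value, y = group)) +
  stat_dots(aes(fill = after_stat(x > 0)))
```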
Aesthetics
The dots+interval stats and geoms have a wide variety of aesthetics that control the appearance of their three sub-geometries: the dots (aka the slab), the point, and the interval.
These stats support the following aesthetics:
- x: x position of the geometry (when orientation = "vertical"); or sample data to be summarized (when orientation = "horizontal" with sample data).
- y: y position of the geometry (when orientation = "horizontal"); or sample data to be summarized (when orientation = "vertical" with sample data).
- weight: When using samples (i.e. the x and y aesthetics, not xdist or ydist), optional weights to be applied to each draw.
- xdist: When using analytical distributions, distribution to map on the x axis: a distributional object (e.g. dist_normal()) or a posterior::rvar() object.
- ydist: When using analytical distributions, distribution to map on the y axis: a distributional object (e.g. dist_normal()) or a posterior::rvar() object.
- dist: When using analytical distributions, a name of a distribution (e.g. "norm"), a distributional object (e.g. dist_normal()), or a posterior::rvar() object. See Details.
- args: Distribution arguments (args or arg1, ... arg9). See Details.
In addition, in their default configuration (paired with geom_dots()) the following aesthetics are supported by the underlying geom:
Dots-specific (aka Slab-specific) aesthetics
- family: The font family used to draw the dots.
- order: The order in which data points are stacked within bins. Can be used to create the effect of "stacked" dots by ordering dots according to a discrete variable. If omitted (NULL), the values of the data points themselves are used to determine stacking order. Only applies when layout is "bin" or "hex", as the other layout methods fully determine both x and y positions.
- side: Which side to place the slab on. "topright", "top", and "right" are synonyms which cause the slab to be drawn on the top or the right depending on whether orientation is "horizontal" or "vertical". "bottomleft", "bottom", and "left" are synonyms which cause the slab to be drawn on the bottom or the left depending on whether orientation is "horizontal" or "vertical". "topleft" causes the slab to be drawn on the top or the left, and "bottomright" causes the slab to be drawn on the bottom or the right. "both" draws the slab mirrored on both sides (as in a violin plot).
- scale: What proportion of the region allocated to this geom to use to draw the slab. If scale = 1, slabs that use the maximum range will just touch each other. Default is 0.9 to leave some space between adjacent slabs. For a comprehensive discussion and examples of slab scaling and normalization, see the thickness scale article.
- justification: Justification of the interval relative to the slab, where 0 indicates bottom/left justification and 1 indicates top/right justification (depending on orientation). If justification is NULL (the default), then it is set automatically based on the value of side: when side is "top"/"right" justification is set to 0, when side is "bottom"/"left" justification is set to 1, and when side is "both" justification is set to 0.5.
- datatype: When using composite geoms directly without a stat (e.g. geom_slabinterval()), datatype is used to indicate which part of the geom a row in the data targets: rows with datatype = "slab" target the slab portion of the geometry and rows with datatype = "interval" target the interval portion of the geometry. This is set automatically when using ggdist stats.
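A brief sketch of the side and scale aesthetics set as fixed values on the layer. The data frame df is invented for the example, and the specific scale value is arbitrary.

```r
library(ggplot2)
library(ggdist)

# Hypothetical samples for three groups
set.seed(4)
df <- data.frame(
  group = rep(c("a", "b", "c"), each = 100),
  value = rnorm(300, mean = rep(c(0, 1, 2), each = 100))
)

base <- ggplot(df, aes(x = value, y = group))

# Dots mirrored on both sides of each group's baseline, as in a violin plot,
# using 80% of the space allocated to each group
p_both <- base + stat_dots(side = "both", scale = 0.8)

# Dots drawn below each group's baseline (horizontal orientation)
p_bottom <- base + stat_dots(side = "bottom")
```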
Interval-specific aesthetics
- xmin: Left end of the interval sub-geometry (if orientation = "horizontal").
- xmax: Right end of the interval sub-geometry (if orientation = "horizontal").
- ymin: Lower end of the interval sub-geometry (if orientation = "vertical").
- ymax: Upper end of the interval sub-geometry (if orientation = "vertical").
Point-specific aesthetics
- shape: Shape type used to draw the point sub-geometry.
Color aesthetics
- colour: (or color) The color of the interval and point sub-geometries. Use the slab_color, interval_color, or point_color aesthetics (below) to set sub-geometry colors separately.
- fill: The fill color of the slab and point sub-geometries. Use the slab_fill or point_fill aesthetics (below) to set sub-geometry colors separately.
- alpha: The opacity of the slab, interval, and point sub-geometries. Use the slab_alpha, interval_alpha, or point_alpha aesthetics (below) to set sub-geometry colors separately.
- colour_ramp: (or color_ramp) A secondary scale that modifies the color scale to "ramp" to another color. See scale_colour_ramp() for examples.
- fill_ramp: A secondary scale that modifies the fill scale to "ramp" to another color. See scale_fill_ramp() for examples.
Line aesthetics
- linewidth: Width of the line used to draw the interval (except with geom_slab(): then it is the width of the slab). With composite geometries including an interval and slab, use slab_linewidth to set the line width of the slab (see below). For the interval, raw linewidth values are transformed according to the interval_size_domain and interval_size_range parameters of the geom (see above).
- size: Determines the size of the point. If linewidth is not provided, size also determines the width of the line used to draw the interval (this allows line width and point size to be modified together by setting only size and not linewidth). Raw size values are transformed according to the interval_size_domain, interval_size_range, and fatten_point parameters of the geom (see above). Use the point_size aesthetic (below) to set sub-geometry size directly without applying the effects of interval_size_domain, interval_size_range, and fatten_point.
- stroke: Width of the outline around the point sub-geometry.
- linetype: Type of line (e.g., "solid", "dashed", etc) used to draw the interval and the outline of the slab (if it is visible). Use the slab_linetype or interval_linetype aesthetics (below) to set sub-geometry line types separately.
Slab-specific color and line override aesthetics
- slab_fill: Override for fill: the fill color of the slab.
- slab_colour: (or slab_color) Override for colour/color: the outline color of the slab.
- slab_alpha: Override for alpha: the opacity of the slab.
- slab_linewidth: Override for linewidth: the width of the outline of the slab.
- slab_linetype: Override for linetype: the line type of the outline of the slab.
- slab_shape: Override for shape: the shape of the dots used to draw the dotplot slab.
Interval-specific color and line override aesthetics
- interval_colour: (or interval_color) Override for colour/color: the color of the interval.
- interval_alpha: Override for alpha: the opacity of the interval.
- interval_linetype: Override for linetype: the line type of the interval.
Point-specific color and line override aesthetics
- point_fill: Override for fill: the fill color of the point.
- point_colour: (or point_color) Override for colour/color: the outline color of the point.
- point_alpha: Override for alpha: the opacity of the point.
- point_size: Override for size: the size of the point.
Deprecated aesthetics
- slab_size: Use slab_linewidth.
- interval_size: Use interval_linewidth.
Other aesthetics (these work as in standard geoms)
- width
- height
- group
See examples of some of these aesthetics in action in vignette("dotsinterval"). Learn more about the sub-geom override aesthetics (like interval_color) in the scales documentation. Learn more about basic ggplot aesthetics in vignette("ggplot2-specs").
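The sub-geometry override aesthetics can be sketched as below. Because stat_dots() hides the point and interval, the example uses stat_dotsinterval() so that all three sub-geometries are visible. The data frame df and the particular colors are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggdist)

# Hypothetical samples for two groups
set.seed(5)
df <- data.frame(
  group = rep(c("a", "b"), each = 200),
  value = c(rnorm(200, 0), rnorm(200, 1))
)

# Style the dots (slab), interval, and point separately
p_overrides <- ggplot(df, aes(x = value, y = group)) +
  stat_dotsinterval(
    slab_fill = "skyblue",      # fill of the dots
    slab_color = "steelblue",   # outline of the dots
    interval_color = "gray30",  # color of the interval line
    point_color = "black",      # outline color of the point summary
    point_size = 3              # point size, set directly
  )
```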
References
Kay, M., Kola, T., Hullman, J. R., & Munson, S. A. (2016). When (ish) is My Bus? User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems. Conference on Human Factors in Computing Systems - CHI '16, 5092–5103. doi:10.1145/2858036.2858558.
Fernandes, M., Walls, L., Munson, S., Hullman, J., & Kay, M. (2018). Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making. Conference on Human Factors in Computing Systems - CHI '18. doi:10.1145/3173574.3173718.
See Also
See geom_dots() for the geom underlying this stat.
See vignette("dotsinterval") for a variety of examples of use.
Other dotsinterval stats: stat_dotsinterval(), stat_mcse_dots()
Examples
library(dplyr)
library(ggplot2)
library(ggdist)  # provides stat_dots() and theme_ggdist()
library(distributional)

theme_set(theme_ggdist())

# ON SAMPLE DATA
set.seed(12345)
tibble(
  x = rep(1:10, 100),
  y = rnorm(1000, x)
) %>%
  ggplot(aes(x = x, y = y)) +
  stat_dots()

# ON ANALYTICAL DISTRIBUTIONS
# Vectorized distribution types, like distributional::dist_normal()
# and posterior::rvar(), can be used with the `xdist` / `ydist` aesthetics
tibble(
  x = 1:10,
  sd = seq(1, 3, length.out = 10)
) %>%
  ggplot(aes(x = x, ydist = dist_normal(x, sd))) +
  stat_dots(quantiles = 50)