stat_mcse_dots {ggdist}                R Documentation

Blurry MCSE dot plot (stat)

Description

Variant of stat_dots() for creating blurry dotplots of quantiles. Uses posterior::mcse_quantile() to calculate the Monte Carlo Standard Error of each quantile computed for the dotplot, yielding an se computed variable that is by default mapped onto the sd aesthetic of geom_blur_dots().
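The calculation described above can be sketched outside of a plot. This is a minimal illustration, not the internal implementation of the stat: the choice of 10 quantiles and the use of ppoints() to pick evenly spaced probabilities are assumptions made for the example.

```r
library(posterior)

set.seed(1234)
x <- rnorm(1000)  # stand-in for a vector of posterior draws

# 10 evenly spaced probabilities: 0.05, 0.15, ..., 0.95 (an assumption for
# this sketch; the stat chooses its own probabilities from `quantiles`)
probs <- ppoints(10, a = 1/2)

# the quantiles that would become dot positions
q <- unname(quantile(x, probs = probs))

# Monte Carlo Standard Error of each of those quantiles
se <- unname(mcse_quantile(x, probs = probs))

# one row per dot: its position and the uncertainty used to blur it
data.frame(prob = probs, quantile = q, se = se)
```

Quantiles in the tails are estimated from fewer nearby draws, so their se tends to be larger, which is why dots at the edges of the plot usually appear blurrier.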

Usage

stat_mcse_dots(
  mapping = NULL,
  data = NULL,
  geom = "blur_dots",
  position = "identity",
  ...,
  quantiles = NA,
  orientation = NA,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

Arguments

mapping

Set of aesthetic mappings created by aes(). If specified and inherit.aes = TRUE (the default), it is combined with the default mapping at the top level of the plot. You must supply mapping if there is no plot mapping.

data

The data to be displayed in this layer. There are three options:

If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().

A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.

A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data. A function can be created from a formula (e.g. ~ head(.x, 10)).

geom

Use to override the default connection between stat_mcse_dots() and geom_blur_dots().

position

Position adjustment, either as a string, or the result of a call to a position adjustment function. Setting this equal to "dodge" (position_dodge()) or "dodgejust" (position_dodgejust()) can be useful if you have overlapping geometries.

...

Other arguments passed to layer(). These are often aesthetics, used to set an aesthetic to a fixed value, like colour = "red" or linewidth = 3 (see Aesthetics, below). They may also be parameters to the paired geom/stat. When paired with the default geom, geom_blur_dots(), these include:

blur

Blur function to apply to dots. One of:

  • A function that takes a numeric vector of distances from the dot center, the dot radius, and the standard deviation of the blur and returns a vector of opacities in [0, 1], such as blur_gaussian() or blur_interval().

  • A string indicating what blur function to use, as the suffix to a function name starting with blur_; e.g. "gaussian" (the default) applies blur_gaussian().
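As a brief sketch of the two ways of specifying blur, the following builds the same plot once with a string and once with a function. The data and the object names p_gaussian and p_interval are invented for the example.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
df <- data.frame(x = rnorm(1000))

# string form: "gaussian" resolves to blur_gaussian()
p_gaussian <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(quantiles = 100, blur = "gaussian")

# function form: pass the blur function itself
p_interval <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(quantiles = 100, blur = blur_interval)
```

Printing either object draws the plot; the two differ only in how the uncertainty around each dot is rendered.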

binwidth

The bin width to use for laying out the dots. One of:

  • NA (the default): Dynamically select the bin width based on the size of the plot when drawn. This will pick a binwidth such that the tallest stack of dots is at most scale in height (ideally exactly scale in height, though this is not guaranteed).

  • A length-1 (scalar) numeric or unit object giving the exact bin width.

  • A length-2 (vector) numeric or unit object giving the minimum and maximum desired bin width. The bin width will be dynamically selected within these bounds.

If the value is numeric, it is assumed to be in units of data. The bin width (or its bounds) can also be specified using unit(), which may be useful if it is desired that the dots be a certain point size or a certain percentage of the width/height of the viewport. For example, unit(0.1, "npc") would make dots that are exactly 10% of the viewport size along whichever dimension the dotplot is drawn; unit(c(0, 0.1), "npc") would make dots that are at most 10% of the viewport size (while still ensuring the tallest stack is less than or equal to scale).
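The forms of binwidth described above might be used as follows. This is a sketch with made-up data; the specific widths (0.1 data units, at most 10% of the viewport) are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
df <- data.frame(x = rnorm(1000))

# exact bin width, interpreted in data units because it is a plain numeric
p_exact <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(quantiles = 100, binwidth = 0.1)

# bounds on the bin width: dots are at most 10% of the viewport size,
# with the final width still chosen dynamically within those bounds
p_bounded <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(quantiles = 100, binwidth = unit(c(0, 0.1), "npc"))
```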

dotsize

The width of the dots relative to the binwidth. The default, 1.07, makes dots be just a bit wider than the bin width, which is a manually-tuned parameter that tends to work well with the default circular shape, preventing gaps between bins from appearing to be too large visually (as might arise from dots being precisely the binwidth). If it is desired to have dots be precisely the binwidth, set dotsize = 1.

stackratio

The distance between the center of the dots in the same stack relative to the dot height. The default, 1, makes dots in the same stack just touch each other.

layout

The layout method used for the dots:

  • "bin" (default): places dots on the off-axis at the midpoint of their bins as in the classic Wilkinson dotplot. This maintains the alignment of rows and columns in the dotplot. This layout is slightly different from the classic Wilkinson algorithm in that: (1) it nudges bins slightly to avoid overlapping bins and (2) if the input data are symmetrical it will return a symmetrical layout.

  • "weave": uses the same basic binning approach of "bin", but places dots in the off-axis at their actual positions (unless overlaps = "nudge", in which case overlaps may be nudged out of the way). This maintains the alignment of rows but does not align dots within columns.

  • "hex": uses the same basic binning approach of "bin", but alternates placing dots + binwidth/4 or - binwidth/4 in the off-axis from the bin center. This allows hexagonal packing by setting a stackratio less than 1 (something like 0.9 tends to work).

  • "swarm": uses the "compactswarm" layout from beeswarm::beeswarm(). Does not maintain alignment of rows or columns, but can be more compact and neat looking, especially for sample data (as opposed to quantile dotplots of theoretical distributions, which may look better with "bin", "weave", or "hex").

  • "bar": for discrete distributions, lays out duplicate values in rectangular bars.
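To compare layouts, it can help to build the same plot with different values of layout. The sketch below uses invented data and shows "weave" alongside "hex" with the suggested stackratio of about 0.9; "swarm" is omitted only because it needs the beeswarm package.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
df <- data.frame(x = rnorm(1000))

# rows stay aligned, dots sit at their actual positions within a row
p_weave <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(quantiles = 100, layout = "weave")

# alternating offsets; a stackratio below 1 lets the rows pack hexagonally
p_hex <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(quantiles = 100, layout = "hex", stackratio = 0.9)
```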

overlaps

How to handle overlapping dots or bins in the "bin", "weave", and "hex" layouts (dots never overlap in the "swarm" or "bar" layouts). For the purposes of this argument, dots are only considered to be overlapping if they would be overlapping when dotsize = 1 and stackratio = 1; i.e. if you set those arguments to other values, overlaps may still occur. One of:

  • "keep": leave overlapping dots as they are. Dots may overlap (usually only slightly) in the "bin", "weave", and "hex" layouts.

  • "nudge": nudge overlapping dots out of the way. Overlaps are avoided using a constrained optimization which minimizes the squared distance of dots to their desired positions, subject to the constraint that adjacent dots do not overlap.

smooth

Smoother to apply to dot positions. One of:

  • A function that takes a numeric vector of dot positions and returns a smoothed version of that vector, such as smooth_bounded(), smooth_unbounded(), smooth_discrete(), or smooth_bar().

  • A string indicating what smoother to use, as the suffix to a function name starting with smooth_; e.g. "none" (the default) applies smooth_none(), which simply returns the given vector without applying smoothing.

Smoothing is most effective when the smoother is matched to the support of the distribution; e.g. using smooth_bounded(bounds = ...).
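As an illustration of matching the smoother to the support, the sketch below uses draws from a Beta distribution, which lives on [0, 1], and passes those bounds to smooth_bounded(). The data and the Beta parameters are assumptions chosen for the example.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
# draws from a distribution with known support [0, 1]
df <- data.frame(x = rbeta(1000, 1, 5))

# calling smooth_bounded() without a vector of positions yields a
# partially applied smoother that the geom applies to the dot positions
p_smooth <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(quantiles = 100, smooth = smooth_bounded(bounds = c(0, 1)))
```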

overflow

How to handle overflow of dots beyond the extent of the geom when a minimum binwidth (or an exact binwidth) is supplied. One of:

  • "keep": Keep the overflow, drawing dots outside the geom bounds.

  • "warn": Keep the overflow, but produce a warning suggesting solutions, such as setting binwidth = NA or overflow = "compress".

  • "compress": Compress the layout. Reduces the binwidth to the size necessary to keep the dots within bounds, then adjusts stackratio and dotsize so that the apparent dot size is the user-specified minimum binwidth times the user-specified dotsize.

If you find the default layout has dots that are too small, and you are okay with dots overlapping, consider setting overflow = "compress" and supplying an exact or minimum dot size using binwidth.
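The suggestion above could look like the following sketch. The 3 mm bin width is an arbitrary value picked for illustration; whether it actually triggers overflow depends on the size of the plotting device.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
df <- data.frame(x = rnorm(1000))

# request a fixed apparent dot size and let the layout compress
# (allowing dots to overlap) if the stacks would not otherwise fit
p_compress <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(
    quantiles = 100,
    binwidth = unit(3, "mm"),
    overflow = "compress"
  )
```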

verbose

If TRUE, print out the bin width of the dotplot. Can be useful if you want to start from an automatically-selected bin width and then adjust it manually. Bin width is printed both as data units and as normalized parent coordinates or "npc"s (see unit()). Note that if you just want to scale the selected bin width to fit within a desired area, it is probably easier to use scale than to copy and scale binwidth manually, and if you just want to provide constraints on the bin width, you can pass a length-2 vector to binwidth.

subguide

Sub-guide used to annotate the thickness scale. One of:

  • A function that takes a scale argument giving a ggplot2::Scale object and an orientation argument giving the orientation of the geometry and then returns a grid::grob that will draw the axis annotation, such as subguide_axis() (to draw a traditional axis) or subguide_none() (to draw no annotation). See subguide_axis() for a list of possibilities and examples.

  • A string giving the name of such a function when prefixed with "subguide_"; e.g. "axis" (which applies subguide_axis()) or "none" (which applies subguide_none()).

quantiles

Setting this to a value other than NA will produce a quantile dotplot: that is, a dotplot of quantiles from the sample or distribution (for analytical distributions, the default of NA is taken to mean 100 quantiles). The value of quantiles determines the number of quantiles to plot. See Kay et al. (2016) and Fernandes et al. (2018) for more information on quantile dotplots.

orientation

Whether this geom is drawn horizontally or vertically. One of:

  • NA (default): automatically detect the orientation based on how the aesthetics are assigned. Automatic detection works most of the time.

  • "horizontal" (or "y"): draw horizontally, using the y aesthetic to identify different groups. For each group, uses the x, xmin, xmax, and thickness aesthetics to draw points, intervals, and slabs.

  • "vertical" (or "x"): draw vertically, using the x aesthetic to identify different groups. For each group, uses the y, ymin, ymax, and thickness aesthetics to draw points, intervals, and slabs.

For compatibility with the base ggplot naming scheme for orientation, "x" can be used as an alias for "vertical" and "y" as an alias for "horizontal" (ggdist had an orientation parameter before base ggplot did, hence the discrepancy).
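A small sketch of both orientations follows. The grouped data frame and its column names (group, value) are invented for the example.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
df <- data.frame(
  group = rep(c("a", "b"), each = 500),
  value = c(rnorm(500), rnorm(500, mean = 2))
)

# horizontal: y identifies the groups, x carries the values;
# orientation is left at NA and detected from the aesthetics
p_horizontal <- ggplot(df, aes(x = value, y = group)) +
  stat_mcse_dots(quantiles = 50)

# vertical: x identifies the groups, y carries the values;
# orientation is set explicitly here, "x" would be an equivalent alias
p_vertical <- ggplot(df, aes(x = group, y = value)) +
  stat_mcse_dots(quantiles = 50, orientation = "vertical")
```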

na.rm

If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.

show.legend

logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes. It can also be a named logical vector to finely select the aesthetics to display.

inherit.aes

If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders().

Details

The dots family of stats and geoms are similar to geom_dotplot() but with a number of differences:

Stats and geoms in this family include:

stat_dots() and stat_dotsinterval(), when used with the quantiles argument, are particularly useful for constructing quantile dotplots, which can be an effective way to communicate uncertainty using a frequency framing that may be easier for laypeople to understand (Kay et al. 2016, Fernandes et al. 2018).

To visualize sample data, such as a data distribution, samples from a bootstrap distribution, or a Bayesian posterior, you can supply samples to the x or y aesthetic.

To visualize analytical distributions, you can use the xdist or ydist aesthetic. For historical reasons, you can also use dist to specify the distribution, though this is not recommended as it does not work as well with orientation detection. These aesthetics can be used as follows:
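As one possible illustration of the xdist aesthetic, the sketch below describes two Normal distributions with dist_normal() from the distributional package and draws them with stat_dots(), a member of the same family. The data frame and its column names (group, mean, sd) are assumptions made for the example.

```r
library(ggplot2)
library(ggdist)
library(distributional)

# one row per distribution, parameterized by mean and sd
df <- data.frame(
  group = c("a", "b"),
  mean = c(0, 2),
  sd = c(1, 0.5)
)

# xdist takes a distribution object; y identifies the groups,
# so the dotplots are drawn horizontally
p_dist <- ggplot(df, aes(y = group, xdist = dist_normal(mean, sd))) +
  stat_dots(quantiles = 50)
```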

Value

A ggplot2::Stat representing a blurry MCSE dot geometry which can be added to a ggplot() object.

Computed Variables

The following variables are computed by this stat and made available for use in aesthetic specifications (aes()) using the after_stat() function or the after_stat argument of stage():
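The Description names one such variable, se, which is mapped onto the sd aesthetic by default. The sketch below writes that mapping out explicitly and then modifies it; doubling se is purely an illustration of after_stat(), not a recommended setting.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
df <- data.frame(x = rnorm(1000))

# the default behavior, spelled out: blur each dot by its MCSE
p_default <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(aes(sd = after_stat(se)), quantiles = 100)

# exaggerate the blur to make the uncertainty easier to see (illustrative only)
p_wide <- ggplot(df, aes(x = x)) +
  stat_mcse_dots(aes(sd = after_stat(2 * se)), quantiles = 100)
```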

Aesthetics

The dots+interval stats and geoms have a wide variety of aesthetics that control the appearance of their three sub-geometries: the dots (aka the slab), the point, and the interval.

These stats support the following aesthetics:

In addition, in their default configuration (paired with geom_blur_dots()) the following aesthetics are supported by the underlying geom:

Dots-specific (aka Slab-specific) aesthetics

Interval-specific aesthetics

Color aesthetics

Line aesthetics

Slab-specific color and line override aesthetics

Interval-specific color and line override aesthetics

Point-specific color and line override aesthetics

Deprecated aesthetics

Other aesthetics (these work as in standard geoms)

See examples of some of these aesthetics in action in vignette("dotsinterval"). Learn more about the sub-geom override aesthetics (like interval_color) in the scales documentation. Learn more about basic ggplot aesthetics in vignette("ggplot2-specs").

References

Kay, M., Kola, T., Hullman, J. R., & Munson, S. A. (2016). When (ish) is My Bus? User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems. Conference on Human Factors in Computing Systems - CHI '16, 5092–5103. doi:10.1145/2858036.2858558.

Fernandes, M., Walls, L., Munson, S., Hullman, J., & Kay, M. (2018). Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making. Conference on Human Factors in Computing Systems - CHI '18. doi:10.1145/3173574.3173718.

See Also

See geom_blur_dots() for the geom underlying this stat. See vignette("dotsinterval") for a variety of examples of use.

Other dotsinterval stats: stat_dots(), stat_dotsinterval()

Examples


library(dplyr)
library(ggplot2)

theme_set(theme_ggdist())

set.seed(1234)
data.frame(x = rnorm(1000)) %>%
  ggplot(aes(x = x)) +
  stat_mcse_dots(quantiles = 100, layout = "weave")


[Package ggdist version 3.3.2 Index]