pp_plot {esvis}    R Documentation

Produces the paired probability plot for two groups

Description

The paired probability (PP) plot maps the cumulative probability of obtaining a specific score in each of two groups against one another. The area under the curve (AUC) corresponds to the probability that a randomly selected observation from the x-axis group will have a higher score than a randomly selected observation from the y-axis group. This function extends the basic PP plot by allowing multiple curves and faceting to facilitate a variety of comparisons. Because the plotting is built on top of ggplot2, the returned plot can be customized further with additional ggplot2 layers, as illustrated in the examples.
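The relationship between the PP curve and the AUC can be checked by hand. The following is a minimal base R sketch on simulated data; the group names, sample sizes, and distributions are assumptions made for illustration, and it is not necessarily how esvis computes the curve internally.

```r
# Simulated scores for two hypothetical groups (not an esvis data set)
set.seed(42)
ref   <- rnorm(200, mean = 105, sd = 10)  # x-axis (reference) group
focal <- rnorm(150, mean = 100, sd = 10)  # y-axis (focal) group

# Evaluate each group's empirical CDF at every observed score;
# the leading 0 anchors the curve at the origin
scores  <- sort(unique(c(ref, focal)))
p_ref   <- c(0, ecdf(ref)(scores))    # x coordinates of the PP curve
p_focal <- c(0, ecdf(focal)(scores))  # y coordinates of the PP curve

# Trapezoidal area under the PP curve
auc <- sum(diff(p_ref) * (head(p_focal, -1) + tail(p_focal, -1)) / 2)

# Direct estimate: share of (ref, focal) pairs in which ref scores higher
p_higher <- mean(outer(ref, focal, ">"))

# The two quantities agree up to floating-point error
all.equal(auc, p_higher)
```

Because the simulated scores are continuous, there are no ties and the two quantities coincide. With tied scores, the trapezoidal area additionally counts each tied pair as one half.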

Usage

pp_plot(
  data,
  formula,
  ref_group = NULL,
  cuts = NULL,
  cut_labels = TRUE,
  cut_label_x = 0.02,
  cut_label_size = 3,
  lines = TRUE,
  linetype = "solid",
  linewidth = 1.1,
  shade = TRUE,
  shade_alpha = 0.2,
  refline = TRUE,
  refline_col = "gray40",
  refline_type = "dashed",
  refline_width = 1.1
)

Arguments

data

The data frame to be plotted

formula

A formula of the type out ~ group, where out is the outcome variable and group is the grouping variable. The grouping variable can include any number of groups. Additional variables can be included with + to produce separate plots by a secondary or tertiary variable of interest (e.g., out ~ group + characteristic1 + characteristic2). No more than two additional characteristics can be supplied at this time.

ref_group

Optional character vector (of length 1) naming the reference group. Defaults to the group with the highest mean score.

cuts

Integer. Optional vector (or single number) of scores used to annotate the plot. If supplied, line segments will extend from the corresponding x and y axes and meet at the PP curve.

cut_labels

Logical. Should the reference lines corresponding to cuts be labeled? Defaults to TRUE.

cut_label_x

The x-axis location of the cut labels. Defaults to 0.02.

cut_label_size

The size of the cut labels. Defaults to 3.

lines

Logical. Should the PP lines be plotted? Defaults to TRUE.

linetype

The linetype for the PP lines. Defaults to "solid".

linewidth

The width of the PP lines. Defaults to 1.1 (just marginally larger than the default ggplot2 lines).

shade

Logical. Should the area under the curve be shaded? Defaults to TRUE.

shade_alpha

Transparency of the shading. Defaults to 0.2.

refline

Logical. Should a diagonal reference line be plotted, representing the value at which no difference is observed between the reference and focal distributions? Defaults to TRUE.

refline_col

Color of the reference line. Defaults to "gray40", a dark gray.

refline_type

The linetype for the reference line. Defaults to "dashed".

refline_width

The width of the reference line. Defaults to 1.1, matching the default width of the PP lines.
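As a sketch of how several of these arguments combine: the argument names come from the Usage section, the benchmarks data and the "Non-ELL" level come from the Examples section, and the specific values chosen here are arbitrary and only for illustration.

```r
library(esvis)

# Compare math scores across ELL groups, with one panel per season.
# "Non-ELL" is assumed to be a level of `ell`, as in the Examples section.
p <- pp_plot(
  benchmarks,
  math ~ ell + season,
  ref_group = "Non-ELL",
  shade = FALSE,            # draw the PP lines only, without shading
  linewidth = 0.8,          # thinner PP lines than the 1.1 default
  refline_col = "black",
  refline_type = "dotted"
)
p
```

Turning off the shading can make overlapping curves easier to read when several groups are compared against the same reference group.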

Value

A ggplot2 object displaying the specified PP plot.

Examples

# PP plot examining differences by condition
pp_plot(star, math ~ condition)

# The sample size gets very small in the above within cells (e.g., wild 
# changes within the "other" group in particular). Overall, the effect doesn't
# seem to change much by condition.

# Look at something a little more interesting
## Not run: 
pp_plot(benchmarks, math ~ ell + season + frl)

## End(Not run)
# Add some cut scores
pp_plot(benchmarks, math ~ ell, cuts = c(190, 210, 215))

## Make another interesting plot. Use ggplot to customize
## Not run: 
library(tidyr)
library(ggplot2)
benchmarks %>% 
  gather(subject, score, reading, math) %>% 
  pp_plot(score ~ ell + subject + season,
          ref_group = "Non-ELL") +
  scale_fill_brewer(name = "ELL Status", palette = "Pastel2") +
  scale_color_brewer(name = "ELL Status", palette = "Pastel2") +
  labs(title = "Differences among English Language Learning Groups",
       subtitle = "Note crossing of reference line") +
  theme_minimal()

## End(Not run)


[Package esvis version 0.3.1 Index]