lineplot {BRcal}    R Documentation

Lineplot for LLO-adjusted Probability Predictions

Description

Function to visualize how predicted probabilities change under MLE-recalibration and boldness-recalibration.

Usage

lineplot(
  x = NULL,
  y = NULL,
  t_levels = NULL,
  df = NULL,
  Pmc = 0.5,
  event = 1,
  return_df = FALSE,
  epsilon = .Machine$double.eps,
  title = "Line Plot",
  ylab = "Probability",
  xlab = "Posterior Model Probability",
  ylim = c(0, 1),
  breaks = seq(0, 1, by = 0.2),
  thin_to = NULL,
  thin_percent = NULL,
  thin_by = NULL,
  seed = 0,
  optim_options = NULL,
  nloptr_options = NULL,
  ggpoint_options = list(alpha = 0.35, size = 1.5, show.legend = FALSE),
  ggline_options = list(alpha = 0.25, linewidth = 0.5, show.legend = FALSE)
)

Arguments

x

a numeric vector of predicted probabilities of an event. Must only contain values in [0,1].

y

a vector of outcomes corresponding to probabilities in x. Must only contain two unique values (one for "events" and one for "non-events"). By default, this function expects a vector of 0s (non-events) and 1s (events).

t_levels

Vector of desired level(s) of calibration at which to show boldness-recalibrated probabilities, in addition to the MLE-recalibrated probabilities.

df

A dataframe returned by a previous call to lineplot() with return_df = TRUE, specially formatted for use in this function. Only used for faster plotting when making minor cosmetic changes to a previous call.

Pmc

The prior model probability for the calibrated model M_c.

event

Value in y that represents an "event". Default value is 1.

return_df

Logical. If TRUE, the dataframe used to build this plot will be returned.

epsilon

Amount by which probabilities are pushed away from the 0 and 1 boundaries for numerical stability. If a value in x is less than epsilon, it is replaced with epsilon. If a value in x is greater than 1 - epsilon, it is replaced with 1 - epsilon.
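The clamping rule for epsilon can be written in one line of base R. This is a minimal sketch, assuming a plain pmin()/pmax() implementation; clamp_probs() is a hypothetical helper for illustration, not a BRcal function.

```r
# Hypothetical helper (not part of BRcal): push probabilities away from
# the 0 and 1 boundaries as described for the epsilon argument.
clamp_probs <- function(x, epsilon = .Machine$double.eps) {
  # Values below epsilon become epsilon; values above 1 - epsilon become 1 - epsilon
  pmin(pmax(x, epsilon), 1 - epsilon)
}

clamp_probs(c(0, 0.25, 1), epsilon = 0.01)
# [1] 0.01 0.25 0.99
```

Clamping matters because log(0) is -Inf in R, so a prediction of exactly 0 or 1 would otherwise make log-likelihood or log-odds computations non-finite.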

title

Plot title.

ylab

Label for y-axis.

xlab

Label for x-axis.

ylim

Vector with bounds for y-axis, must be in [0,1].

breaks

Locations along the y-axis at which to draw horizontal guidelines, passed to scale_y_continuous().

thin_to

When non-null, the observations in (x,y) are randomly sampled without replacement to form a set of size thin_to.

thin_percent

When non-null, the observations in (x,y) are randomly sampled without replacement to form a set that is thin_percent * 100% of the original size of (x,y).

thin_by

When non-null, the observations in (x,y) are thinned by selecting every thin_by-th observation.

seed

Seed for random thinning. Set to NULL for no seed.

optim_options

List of additional arguments to be passed to optim().

nloptr_options

List with options to be passed to nloptr().

ggpoint_options

List with options to be passed to geom_point().

ggline_options

List with options to be passed to geom_line().

Details

This function leverages ggplot() and related functions from the ggplot2 package (Wickham 2016).

The goal of this function is to visualize how predicted probabilities change under different recalibration parameters. By default this function only shows how the original probabilities change after MLE recalibration. Argument t_levels can be used to specify a vector of levels of boldness-recalibration to visualize in addition to MLE recalibration.

The x-axis shows the posterior model probability of each set of probabilities, but note that these posterior model probabilities are not arranged in ascending or descending order. Instead, the sets follow the order in which one might use the BRcal package: first looking at the original predictions, then maximizing calibration via MLE recalibration, then examining how far the predictions can be spread out while maintaining calibration via boldness-recalibration.
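For a concrete picture of the x-axis quantity, one standard way to obtain a posterior model probability is to approximate each model's marginal likelihood by exp(-BIC/2) and apply Bayes' rule with the prior Pmc. The sketch below assumes that approximation, treating the calibrated model M_c as having no free parameters and the uncalibrated LLO model as having two (delta and gamma). post_mc() is a hypothetical helper and may not match BRcal's exact computation.

```r
# Hypothetical helper (not part of BRcal): posterior probability of the
# calibrated model M_c under an assumed BIC approximation.
#   loglik_c: log-likelihood of y at the original predictions (M_c)
#   loglik_u: maximized log-likelihood under the uncalibrated model (M_u)
#   n:        number of observations
#   Pmc:      prior model probability of M_c
post_mc <- function(loglik_c, loglik_u, n, Pmc = 0.5, k_u = 2) {
  bic_c <- -2 * loglik_c                # M_c: no free parameters
  bic_u <- k_u * log(n) - 2 * loglik_u  # M_u: k_u free parameters
  # Bayes factor of M_u versus M_c, approximated by exp(-(BIC_u - BIC_c) / 2)
  bf_uc <- exp(-(bic_u - bic_c) / 2)
  # Bayes' rule with prior odds (1 - Pmc) / Pmc
  1 / (1 + bf_uc * (1 - Pmc) / Pmc)
}

# If recalibration does not improve the fit at all (equal log-likelihoods),
# the BIC penalty favors M_c, giving n / (n + 1) when Pmc = 0.5.
post_mc(loglik_c = -60, loglik_u = -60, n = 100)
# [1] 0.990099
```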

Value

If return_df = TRUE, a list with the following elements is returned:

plot

A ggplot object showing how the predicted probabilities change under MLE recalibration and the specified levels of boldness-recalibration.

df

Dataframe used to create plot, specially formatted for use in lineplot().

Otherwise just the ggplot object of the plot is returned.

Reusing underlying dataframe via return_df

While this function is typically fast for moderate sample sizes, it calls optim() under the hood for MLE recalibration and nloptr() once for each level of boldness-recalibration, and either step can become a time bottleneck. With this in mind, users can specify return_df=TRUE to return the underlying dataframe used to build the resulting lineplot. Users can then pass this dataframe to df in subsequent calls of lineplot() to skip these calls to optim() and nloptr() while making cosmetic changes to the plot.
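To give a sense of what the optim() step involves, here is a minimal sketch of MLE recalibration under the linear log odds (LLO) adjustment of Guthrie and Franck (2024), LLO(x; delta, gamma) = delta * x^gamma / (delta * x^gamma + (1 - x)^gamma). The helpers llo() and mle_llo() are hypothetical; the log(delta) parameterization, the starting values, and the default Nelder-Mead method are assumptions, and BRcal's own objective and optim() settings may differ.

```r
# Hypothetical helper: linear log odds (LLO) adjustment of probabilities x
llo <- function(x, delta, gamma) {
  delta * x^gamma / (delta * x^gamma + (1 - x)^gamma)
}

# Hypothetical helper: maximum likelihood estimates of (delta, gamma)
mle_llo <- function(x, y, epsilon = .Machine$double.eps) {
  x <- pmin(pmax(x, epsilon), 1 - epsilon)  # keep log-likelihood finite
  # Negative Bernoulli log-likelihood; delta = exp(par[1]) stays positive
  nll <- function(par) {
    p <- llo(x, exp(par[1]), par[2])
    -sum(dbinom(y, size = 1, prob = p, log = TRUE))
  }
  fit <- optim(par = c(0, 1), fn = nll)     # start at delta = 1, gamma = 1
  c(delta = exp(fit$par[1]), gamma = fit$par[2])
}

set.seed(28)
x <- runif(100)
y <- rbinom(100, 1, x)
est <- mle_llo(x, y)
# MLE-recalibrated probabilities, one per original prediction
x_mle <- llo(x, est[["delta"]], est[["gamma"]])
```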

When return_df=TRUE, both the plot and the dataframe are returned in a list. The dataframe contains 6 columns:

Essentially, each set of probabilities (original, MLE-, and each level of boldness-recalibration) and outcomes are "stacked" on top of each other. The id tells the plotting function how to connect (with a line) the same observation as it changes from the original set to MLE- or boldness-recalibration.
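As a toy illustration of this stacked layout, the base R sketch below stacks two sets of probabilities for the same three observations. The column names probs, outcome, set, and id are hypothetical choices for this sketch and are not claimed to match the columns of the dataframe returned by lineplot(); the "adjusted" values are made up.

```r
# Toy data: three observations and their outcomes
x <- c(0.20, 0.70, 0.90)       # original predictions
y <- c(0, 1, 1)                # outcomes
x_adj <- c(0.15, 0.75, 0.95)   # made-up adjusted predictions, for illustration

# Stack the sets on top of each other; id links each observation across sets
stacked <- rbind(
  data.frame(probs = x,     outcome = y, set = "Original", id = seq_along(x)),
  data.frame(probs = x_adj, outcome = y, set = "Adjusted", id = seq_along(x))
)

# All rows for observation 2: one per set, which a line would connect
stacked[stacked$id == 2, ]
```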

Thinning

Another strategy to save time when plotting is to thin the data that are plotted. When sample sizes are large, the plot can become overcrowded and slow to render. We provide three options for thinning: thin_to, thin_percent, and thin_by. By default, all three of these settings are NULL, meaning no thinning is performed. Users can only specify one thinning strategy at a time. Care should be taken in selecting a thinning approach based on the nature of your data and problem. Note that MLE recalibration and boldness-recalibration are always performed using the full data set.
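The three thinning strategies can be pictured as rules for choosing which observation indices to keep. This is a minimal sketch, not BRcal's implementation: thin_indices() is a hypothetical helper, and rounding thin_percent * n with round() is an assumption.

```r
# Hypothetical helper (not BRcal's code): indices of the observations kept
# under each thinning strategy. Rounding via round() is an assumption.
thin_indices <- function(n, thin_to = NULL, thin_percent = NULL,
                         thin_by = NULL, seed = 0) {
  n_set <- sum(!is.null(thin_to), !is.null(thin_percent), !is.null(thin_by))
  if (n_set > 1) stop("Only one thinning strategy may be specified at a time.")
  if (!is.null(seed)) set.seed(seed)          # seed = NULL: no seed is set
  if (!is.null(thin_to)) {
    sort(sample.int(n, size = thin_to))       # random, without replacement
  } else if (!is.null(thin_percent)) {
    sort(sample.int(n, size = round(thin_percent * n)))
  } else if (!is.null(thin_by)) {
    seq(1, n, by = thin_by)                   # every thin_by-th observation
  } else {
    seq_len(n)                                # default: no thinning
  }
}

# The kept indices would subset the data for plotting only; recalibration
# still uses all n observations.
length(thin_indices(100, thin_percent = 0.53))  # 53
thin_indices(10, thin_by = 3)                   # 1 4 7 10
```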

Passing additional arguments to geom_point() and geom_line()

To make cosmetic changes to the points and lines plotted, users can pass a list of any desired arguments of geom_point() and geom_line() to ggpoint_options and ggline_options, respectively. These will overwrite everything passed to geom_point() or geom_line() except any aesthetic arguments in aes().
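The general ggplot2 pattern behind such option lists can be sketched with do.call(): the mappings are fixed in aes(), and each list is spliced into the corresponding geom call as constant parameters. This is an assumption about the mechanism for illustration only, not lineplot()'s actual source; the toy data and column names are hypothetical, and the linewidth parameter requires ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical stacked data: two sets of probabilities for three observations
df <- data.frame(
  probs = c(0.20, 0.70, 0.90, 0.15, 0.75, 0.95),
  set   = factor(rep(c("Original", "Adjusted"), each = 3),
                 levels = c("Original", "Adjusted")),
  id    = rep(1:3, times = 2)
)

# Example cosmetic options a user might supply
ggpoint_options <- list(size = 3, shape = 1)
ggline_options  <- list(linewidth = 2, alpha = 0.1)

# aes() fixes the mappings; do.call() passes each options list to its geom
p <- ggplot(df, aes(x = set, y = probs, group = id)) +
  do.call(geom_line, ggline_options) +
  do.call(geom_point, ggpoint_options)
p
```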

References

Guthrie, A. P., and Franck, C. T. (2024) Boldness-Recalibration for Binary Event Predictions, The American Statistician 1-17.

Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Examples


set.seed(28)
# Simulate 100 predicted probabilities
x <- runif(100)
# Simulate 100 binary event outcomes using x
y <- rbinom(100, 1, x)  # By construction, x is well calibrated.

# Lineplot showing the change in probabilities from original to MLE-recalibration
# to specified levels of boldness-recalibration via t_levels
# Return a list with dataframe used to construct plot with return_df=TRUE
lp1 <- lineplot(x, y, t_levels=c(0.98, 0.95), return_df=TRUE)
lp1$plot

# Reusing the previous dataframe to save calculation time
lineplot(df=lp1$df)

# Adjust geom_point cosmetics via ggpoint_options
# Increase point size and change to open circles (shape = 1)
lineplot(df=lp1$df, ggpoint_options=list(size=3, shape=1))

# Adjust geom_line cosmetics via ggline_options
# Increase line width and change transparency
lineplot(df=lp1$df, ggline_options=list(linewidth=2, alpha=0.1))

# Thinning down to 75 randomly selected observations
lineplot(df=lp1$df, thin_to=75)

# Thinning down to 53% of the data
lineplot(df=lp1$df, thin_percent=0.53)

# Thinning down to every 3rd observation
lineplot(df=lp1$df, thin_by=3)

# Setting a different seed for thinning
lineplot(df=lp1$df, thin_percent=0.53, seed=47)

# Setting NO seed for thinning (plot will be different every time)
lineplot(df=lp1$df, thin_to=75, seed=NULL)


[Package BRcal version 0.0.4 Index]