lineplot {BRcal} | R Documentation |
Lineplot for LLO-adjusted Probability Predictions
Description
Function to visualize how predicted probabilities change under MLE-recalibration and boldness-recalibration.
Usage
lineplot(
x = NULL,
y = NULL,
t_levels = NULL,
df = NULL,
Pmc = 0.5,
event = 1,
return_df = FALSE,
epsilon = .Machine$double.eps,
title = "Line Plot",
ylab = "Probability",
xlab = "Posterior Model Probability",
ylim = c(0, 1),
breaks = seq(0, 1, by = 0.2),
thin_to = NULL,
thin_percent = NULL,
thin_by = NULL,
seed = 0,
optim_options = NULL,
nloptr_options = NULL,
ggpoint_options = list(alpha = 0.35, size = 1.5, show.legend = FALSE),
ggline_options = list(alpha = 0.25, linewidth = 0.5, show.legend = FALSE)
)
Arguments
x |
a numeric vector of predicted probabilities of an event. Must only contain values in [0,1]. |
y |
a vector of outcomes corresponding to probabilities in |
t_levels |
Vector of desired level(s) of calibration at which to boldness-recalibrate and plot the resulting probabilities. |
df |
Dataframe returned by previous call to lineplot() specially formatted for use in this function. Only used for faster plotting when making minor cosmetic changes to a previous call. |
Pmc |
The prior model probability for the calibrated model |
event |
Value in |
return_df |
Logical. If |
epsilon |
Amount by which probabilities are pushed away from 0 or 1
boundary for numerical stability. If a value in |
title |
Plot title. |
ylab |
Label for y-axis. |
xlab |
Label for x-axis. |
ylim |
Vector with bounds for y-axis, must be in [0,1]. |
breaks |
Locations along y-axis at which to draw horizontal guidelines,
passed to |
thin_to |
When non-null, the observations in (x,y) are randomly sampled
without replacement to form a set of size |
thin_percent |
When non-null, the observations in (x,y) are randomly
sampled without replacement to form a set that is |
thin_by |
When non-null, the observations in (x,y) are thinned by
selecting every |
seed |
Seed for random thinning. Set to NULL for no seed. |
optim_options |
List of additional arguments to be passed to optim(). |
nloptr_options |
List with options to be passed to |
ggpoint_options |
List with options to be passed to |
ggline_options |
List with options to be passed to |
Details
This function leverages ggplot() and related functions from the ggplot2
package (Wickham, 2016).
The goal of this function is to visualize how predicted probabilities change
under different recalibration parameters. By default, this function only shows
how the original probabilities change after MLE recalibration. Argument
t_levels can be used to specify a vector of levels of boldness-recalibration
to visualize in addition to MLE recalibration.
While the x-axis shows the posterior model probability of each set of
probabilities, note that these posterior model probabilities are not in
ascending or descending order. Instead, the sets follow the order in which one
might use the BRcal package: first looking at the original predictions, then
maximizing calibration, then examining how far the predictions can be spread
out while maintaining calibration with boldness-recalibration.
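To make "change under different recalibration parameters" concrete, here is a minimal base-R sketch of a linear-in-log-odds (LLO) adjustment, the transformation named in this page's title. The helper name llo_adjust and the parameter values are hypothetical and chosen only for illustration. The sketch does not estimate anything, whereas lineplot() obtains its parameters through optim() and nloptr().

```r
# Hypothetical helper for illustration; not BRcal's own code.
# LLO adjustment: shifts (delta > 0) and scales (gamma) probabilities
# on the log-odds scale, then maps back to [0, 1].
llo_adjust <- function(x, delta, gamma) {
  (delta * x^gamma) / (delta * x^gamma + (1 - x)^gamma)
}

x <- c(0.1, 0.3, 0.5, 0.7, 0.9)

# delta = 1 and gamma = 1 leave the probabilities unchanged
same <- llo_adjust(x, delta = 1, gamma = 1)

# gamma > 1 spreads predictions away from 0.5 (bolder)
bolder <- llo_adjust(x, delta = 1, gamma = 2)

# gamma < 1 pulls predictions toward 0.5 (less bold)
milder <- llo_adjust(x, delta = 1, gamma = 0.5)

print(round(cbind(x, same, bolder, milder), 3))
```

Each row of the printed matrix corresponds to one observation followed across sets, which is what a single line in the lineplot traces.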
Value
If return_df = TRUE, a list with the following attributes is returned:
plot |
A |
df |
Dataframe used to create |
Otherwise just the ggplot object of the plot is returned.
Reusing underlying dataframe via return_df
While this function does not typically come with a large burden on time
under moderate sample sizes, there is still a call to optim() under the
hood for MLE recalibration and a call to nloptr() for each level of
boldness-recalibration that could cause a bottleneck on time. With this in
mind, users can specify return_df=TRUE to return the underlying dataframe
used to build the resulting lineplot. Then, users can pass this dataframe
to df in subsequent calls of lineplot to circumvent these calls to optim
and nloptr and make cosmetic changes to the plot.
When return_df=TRUE, both the plot and the dataframe are returned in a
list. The dataframe contains 6 columns:
- probs: the values of each predicted probability under each set
- outcome: the corresponding outcome for each predicted probability
- post: the posterior model probability of the set as a whole
- id: the id of each individual probability used for mapping observations between sets
- set: the set to which the probability belongs
- label: the label used for the x-axis in the lineplot
Essentially, each set of probabilities (original, MLE-, and each level of
boldness-recalibration) and outcomes are "stacked" on top of each other.
The id tells the plotting function how to connect (with a line) the same
observation as it changes from the original set to MLE- or
boldness-recalibration.
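The "stacked" layout can be sketched with a toy dataframe. Only the six column names come from this page. The set names, labels, column types, and probability values below are illustrative assumptions, and post is left as NA because no posterior model probability is computed here. This is not output from lineplot().

```r
# Illustrative values only; not produced by lineplot().
x     <- c(0.2, 0.6, 0.9)    # original predicted probabilities
y     <- c(0, 1, 1)          # outcomes
x_adj <- c(0.1, 0.7, 0.95)   # placeholder "recalibrated" probabilities

original <- data.frame(probs = x, outcome = y, post = NA_real_,
                       id = seq_along(x), set = "Original",
                       label = "Original")
recal <- data.frame(probs = x_adj, outcome = y, post = NA_real_,
                    id = seq_along(x), set = "MLE",
                    label = "MLE")

# Stack the sets on top of each other: one row per observation per set
stacked <- rbind(original, recal)

# Rows sharing an id are the same observation under different sets;
# these are the points a single line would connect.
obs2 <- stacked[stacked$id == 2, c("set", "probs")]

print(stacked)
print(obs2)
```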
Thinning
Another strategy to save time when plotting is to thin the amount of data
plotted. When sample sizes are large, the plot can become overcrowded and
slow to plot. We provide three options for thinning: thin_to, thin_percent,
and thin_by. By default, all three of these settings are set to NULL, meaning
no thinning is performed. Users can only specify one thinning strategy at a
time. Care should be taken in selecting a thinning approach based on the
nature of your data and problem. Note that MLE recalibration and
boldness-recalibration will be done using the full set.
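The three thinning strategies can be sketched in base R as index selections. This is a rough illustration of the ideas described above, not the package's internal implementation. In particular, the rounding used for the percentage and the choice to start "every 3rd" at the first observation are assumptions.

```r
set.seed(0)
n <- 100
x <- runif(n)
y <- rbinom(n, 1, x)

# thin_to-style: random sample of a fixed size, without replacement
keep_to <- sort(sample(n, size = 75))

# thin_percent-style: random sample of a given fraction of the data
# (rounding to a whole number of rows is an assumption)
keep_pct <- sort(sample(n, size = round(0.53 * n)))

# thin_by-style: deterministic, every 3rd observation
# (starting at the first observation is an assumption)
keep_by <- seq(1, n, by = 3)

# Apply one set of indices to x and y together so pairs stay aligned
thinned <- data.frame(x = x[keep_by], y = y[keep_by])

print(c(thin_to = length(keep_to),
        thin_percent = length(keep_pct),
        thin_by = length(keep_by)))
```

Random thinning keeps an unbiased subset but changes with the seed, while every-k-th thinning is reproducible without a seed but depends on how the data are ordered.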
Passing additional arguments to geom_point() and geom_line()
To make cosmetic changes to the points and lines plotted, users can pass a
list of any desired arguments of geom_point() and geom_line() to
ggpoint_options and ggline_options, respectively. These will overwrite
everything passed to geom_point() or geom_line() except any aesthetic
arguments in aes().
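A list supplied to ggpoint_options replaces that argument's default value as a whole, so settings such as alpha = 0.35 may be dropped when only size is given. This assumes lineplot() does not merge a partial list with its defaults internally, which this page does not state. One way to change a single setting while keeping the others is to start from the defaults shown in Usage and update them with modifyList(). The object names below are hypothetical.

```r
# Defaults copied from the Usage section of this page
default_point <- list(alpha = 0.35, size = 1.5, show.legend = FALSE)

# Override size and add shape; alpha and show.legend are kept
point_opts <- utils::modifyList(default_point, list(size = 3, shape = 1))

str(point_opts)

# The merged list could then be passed on, e.g. (requires BRcal and a
# dataframe from an earlier call with return_df = TRUE):
# lineplot(df = lp1$df, ggpoint_options = point_opts)
```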
References
Guthrie, A. P., and Franck, C. T. (2024) Boldness-Recalibration for Binary Event Predictions, The American Statistician 1-17.
Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Examples
set.seed(28)
# Simulate 100 predicted probabilities
x <- runif(100)
# Simulate 100 binary event outcomes using x
y <- rbinom(100, 1, x) # By construction, x is well calibrated.
# Lineplot showing the change in probabilities from the original set to
# MLE-recalibration to the levels of boldness-recalibration given in t_levels
# With return_df=TRUE, return a list with the plot and the dataframe used to build it
lp1 <- lineplot(x, y, t_levels=c(0.98, 0.95), return_df=TRUE)
lp1$plot
# Reusing the previous dataframe to save calculation time
lineplot(df=lp1$df)
# Adjust geom_point cosmetics via ggpoint_options
# Increase point size and change shape to crosses (shape = 4)
lineplot(df=lp1$df, ggpoint_options=list(size=3, shape=4))
# Adjust geom_line cosmetics via ggline_options
# Increase line width and change transparency
lineplot(df=lp1$df, ggline_options=list(linewidth=2, alpha=0.1))
# Thinning down to 75 randomly selected observations
lineplot(df=lp1$df, thin_to=75)
# Thinning down to 53% of the data
lineplot(df=lp1$df, thin_percent=0.53)
# Thinning down to every 3rd observation
lineplot(df=lp1$df, thin_by=3)
# Setting a different seed for thinning
lineplot(df=lp1$df, thin_percent=0.53, seed=47)
# Setting NO seed for thinning (plot will be different every time)
lineplot(df=lp1$df, thin_to=75, seed=NULL)