calibration_simplex {CalSim}        R Documentation
Calibration Simplex
Description
Generates an object of class calibration_simplex
which can be used to assess the calibration
of ternary probability forecasts. The calibration simplex can be seen as a generalization of the reliability diagram
for binary probability forecasts. For details on the interpretation of the calibration simplex, see Wilks (2013). Be
aware that some minor changes have been made compared to the calibration simplex as suggested by Wilks (2013) (see the Note below).
As a somewhat experimental feature, multinomial p-values can be used for uncertainty quantification, that is, as a tool to judge whether the observed discrepancies may be merely coincidental or whether the predictions may in fact be miscalibrated; see Resin (2020, Section 4.2).
Usage
calibration_simplex(n, p1, p2, p3, obs, test_stat, percentagewise)
## Default S3 method:
calibration_simplex(
n = 10,
p1 = NULL,
p2 = NULL,
p3 = NULL,
obs = NULL,
test_stat = "LLR",
percentagewise = FALSE
)
Arguments
n: A natural number, from which the number of bins (n_bins) is computed. Defaults to 10.

p1: A vector containing the forecast probabilities for the first (1) category, e.g. below-normal.

p2: A vector containing the forecast probabilities for the second (2) category, e.g. near-normal.

p3: A vector containing the forecast probabilities for the third (3) category, e.g. above-normal.

obs: A vector containing the observed outcomes, with categories encoded as 1 (e.g. below-normal), 2 (e.g. near-normal) and 3 (e.g. above-normal).

test_stat: A string indicating which test statistic is used for the multinomial test in each bin. Options are "LLR" (log-likelihood ratio; default), "Chisq" (Pearson's chi-square) and "Prob" (probability mass statistic). See Details; a usage sketch follows this list.

percentagewise: Logical, specifying whether probabilities are given percentagewise (summing to 100) or not (summing to 1).
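As a usage sketch, the test statistic can be changed without altering the rest of the call. The object names cs_llr, cs_chisq and cs_prob are illustrative, and the vectors p1, p3 and obs0 are assumed to be those from the Examples section below:

cs_llr = calibration_simplex(p1 = p1, p3 = p3, obs = obs0)  # default: test_stat = "LLR"
cs_chisq = calibration_simplex(p1 = p1, p3 = p3, obs = obs0, test_stat = "Chisq")
cs_prob = calibration_simplex(p1 = p1, p3 = p3, obs = obs0, test_stat = "Prob")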
Details
Only two of the three forecast probability vectors (p1, p2 and p3) need to be specified, as illustrated in the sketch below.
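Since the three probabilities sum to 1 (or to 100 when percentagewise = TRUE), the omitted vector is implied. As a minimal sketch, assuming hypothetical vectors p1, p2 and p3 (summing to 1 elementwise) and outcomes obs of matching length, the following calls are interchangeable:

calibration_simplex(p1 = p1, p2 = p2, obs = obs)
calibration_simplex(p1 = p1, p3 = p3, obs = obs)
calibration_simplex(p1 = p1, p2 = p2, p3 = p3, obs = obs)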
The p-values are based on multinomial tests comparing the observed outcome frequencies within a bin
with the average forecast probabilities within that bin, as outlined in Resin (2020, Section 4.2).
The p-values are exact and do not rely on asymptotics; however, it is assumed that the true
distribution (under the hypothesis of forecast calibration) within each bin
is well approximated by the multinomial distribution. If n is small, the
approximation may be poor, resulting in unreliable p-values. P-values less than 0.0001 are not
exact but merely indicate a value less than 0.0001.
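To make the test construction concrete, here is a minimal, self-contained sketch of an exact multinomial p-value for a single bin with three categories, using the LLR statistic. The function exact_multinom_pval and the example numbers are purely illustrative (they are not part of CalSim), this follows only the outline in Resin (2020, Section 4.2), and the full enumeration is practical only for small bin counts:

# Illustrative sketch, NOT CalSim internals: exact multinomial p-value for
# one bin, comparing observed counts x against the bin-average forecast
# probabilities p via the log-likelihood ratio (LLR) statistic.
exact_multinom_pval = function(x, p) {
  stopifnot(length(x) == 3, length(p) == 3, isTRUE(all.equal(sum(p), 1)))
  N = sum(x)
  llr = function(y) {                    # 2 * log-likelihood ratio; 0*log(0) = 0
    i = y > 0
    2 * sum(y[i] * log(y[i] / (N * p[i])))
  }
  t_obs = llr(x)
  pval = 0
  for (a in 0:N) for (b in 0:(N - a)) {  # enumerate all outcomes of N trials
    y = c(a, b, N - a - b)
    if (llr(y) >= t_obs - 1e-12)         # at least as extreme as observed
      pval = pval + dmultinom(y, prob = p)
  }
  pval
}
exact_multinom_pval(x = c(7, 2, 1), p = c(0.5, 0.3, 0.2))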
Value
An object of class "calibration_simplex", i.e., a list containing:

n: As input by the user, or the default.

n_bins: Number of bins, computed from n.

n_obs: Total number of observations (forecast-outcome pairs).

freq: Vector of length n_bins containing the number of forecast-outcome pairs within each bin.

cond_rel_freq: Matrix containing the observed outcome frequencies within each bin.

cond_ave_prob: Matrix containing the average forecast probabilities within each bin.

pvals: Exact multinomial p-values within each bin. See Details.
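The components can be inspected directly. A brief sketch, again assuming the vectors p1, p3 and obs0 from the Examples section below:

cs = calibration_simplex(p1 = p1, p3 = p3, obs = obs0)
cs$n_obs            # total number of forecast-outcome pairs
cs$freq             # number of pairs in each bin
round(cs$pvals, 4)  # exact multinomial p-values, one per bin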
Note
In contrast to the calibration simplex proposed by Wilks (2013), the simplex has been
mirrored at the diagonal through the bottom-left hexagon. The miscalibration error is by default calculated
exactly (in each bin, as the difference of the relative frequencies of each class and the
average forecast probabilities) instead of approximately (using Wilks' original formula).
Approximate errors can be used by setting true_error = FALSE
when calling plot.calibration_simplex, as in the example below.
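For instance, with calsim1 as constructed in the Examples section, the approximate (Wilks-style) errors can be displayed via:

plot(calsim1, true_error = FALSE)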
References
Wilks, D. S. (2013). The Calibration Simplex: A Generalization of the Reliability Diagram for Three-Category Probability Forecasts. Weather and Forecasting, 28, 1210-1218.
Resin, J. (2020). A Simple Algorithm for Exact Multinomial Tests. Preprint, https://arxiv.org/abs/2008.12682.
See Also
plot.calibration_simplex
Examples
attach(ternary_forecast_example) # see also documentation of the sample data
# ?ternary_forecast_example
# Calibrated forecast sample
calsim0 = calibration_simplex(p1 = p1, p3 = p3, obs = obs0)
plot(calsim0, use_pvals = TRUE) # with multinomial p-values
# Overconfident forecast sample
calsim1 = calibration_simplex(p1 = p1, p3 = p3, obs = obs1)
plot(calsim1)
# Underconfident forecast sample
calsim2 = calibration_simplex(p1 = p1, p3 = p3, obs = obs2)
plot(calsim2, use_pvals = TRUE) # with multinomial p-values
# Unconditionally biased forecast sample
calsim3 = calibration_simplex(p1 = p1, p3 = p3, obs = obs3)
plot(calsim3)
# Using a different number of bins
calsim = calibration_simplex(n = 4, p1 = p1, p3 = p3, obs = obs3)
plot(calsim)

calsim = calibration_simplex(n = 13, p1 = p1, p3 = p3, obs = obs3)
plot(calsim,                # using some additional plotting parameters:
     error_scale = 0.5,     # errors are less pronounced (smaller shifts)
     min_bin_freq = 100,    # dots are plotted only for bins which contain
                            # at least 100 forecast-outcome pairs
     category_labels = c("below-normal", "near-normal", "above-normal"),
     main = "Sample calibration simplex")
detach(ternary_forecast_example)