calibration_simplex {CalSim}        R Documentation

Calibration Simplex

Description


Generates an object of class calibration_simplex which can be used to assess the calibration of ternary probability forecasts. The calibration simplex can be seen as a generalization of the reliability diagram for binary probability forecasts. For details on the interpretation of the calibration simplex, see Wilks (2013). Be aware that some minor changes have been made compared to the calibration simplex as suggested by Wilks (2013) (see note below).

As a somewhat experimental feature, multinomial p-values can be used for uncertainty quantification, that is, as a tool to judge whether the observed discrepancies may be merely coincidental or whether the predictions may in fact be miscalibrated, see Resin (2020, Section 4.2).


Usage

calibration_simplex(n, p1, p2, p3, obs, test_stat, percentagewise)

## Default S3 method:
calibration_simplex(
  n = 10,
  p1 = NULL,
  p2 = NULL,
  p3 = NULL,
  obs = NULL,
  test_stat = "LLR",
  percentagewise = FALSE
)



Arguments

n

A natural number.


p1

A vector containing the forecasted probabilities for the first (1) category, e.g. below-normal.


p2

A vector containing the forecasted probabilities for the second (2) category, e.g. near-normal.


p3

A vector containing the forecasted probabilities for the third (3) category, e.g. above-normal.


obs

A vector containing the observed outcomes, where categories are encoded as 1 (e.g. below-normal), 2 (e.g. near-normal) and 3 (e.g. above-normal).


test_stat

A string indicating which test statistic is to be used for the multinomial test in each bin. Options are "LLR" (log-likelihood ratio; default), "Chisq" (Pearson's chi-square) and "Prob" (probability mass statistic). See details.


percentagewise

Logical, specifying whether probabilities are given percentagewise (summing to 100) or not (summing to 1).


Details

Only two of the three forecast probability vectors (p1, p2 and p3) need to be specified.
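Since the three category probabilities of a ternary forecast sum to 1 (or to 100 when percentagewise = TRUE), any omitted vector is implied by the other two. A minimal illustration (the numeric values are made up for demonstration):

```r
# The category probabilities of a ternary forecast sum to 1, so any
# omitted vector is determined by the remaining two:
p1 <- c(0.5, 0.2, 0.1)   # probability of category 1 (e.g. below-normal)
p3 <- c(0.3, 0.5, 0.6)   # probability of category 3 (e.g. above-normal)
p2 <- 1 - p1 - p3        # implied probability of category 2 (e.g. near-normal)
```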

The p-values are based on multinomial tests comparing the observed frequencies within a bin with the average forecast probabilities within the bin, as outlined in Resin (2020, Section 4.2). The p-values are exact and do not rely on asymptotics; however, it is assumed that the true distribution (under the hypothesis of forecast calibration) within each bin is well approximated by the multinomial distribution. If n is small, the approximation may be poor, resulting in unreliable p-values. p-values less than 0.0001 are not exact but merely indicate a value less than 0.0001.
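The following sketch illustrates how such an exact multinomial p-value can be computed for a single bin with the LLR statistic. It is not the package's internal implementation; the function name exact_multinomial_pvalue and the brute-force enumeration are illustrative assumptions. The idea is to enumerate all outcome triples for a bin, compute the LLR statistic for each, and sum the multinomial probabilities of all outcomes at least as extreme as the observed one:

```r
# Illustrative only: exact multinomial p-value for one bin.
# 'counts' are the observed category frequencies in the bin; 'probs' are the
# average forecast probabilities there (the hypothesized multinomial probabilities).
exact_multinomial_pvalue <- function(counts, probs) {
  n <- sum(counts)
  # Enumerate all outcomes (x1, x2, x3) with x1 + x2 + x3 = n
  grid <- expand.grid(x1 = 0:n, x2 = 0:n)
  grid <- grid[grid$x1 + grid$x2 <= n, ]
  grid$x3 <- n - grid$x1 - grid$x2
  # LLR statistic: 2 * sum_i x_i * log(x_i / (n * p_i)); terms with x_i = 0 vanish
  llr <- function(x) {
    expected <- n * probs
    2 * sum(ifelse(x > 0, x * log(x / expected), 0))
  }
  t_obs <- llr(counts)
  # Sum the multinomial probabilities of all outcomes at least as extreme
  # (small tolerance guards against floating-point ties)
  p <- 0
  for (i in seq_len(nrow(grid))) {
    x <- as.numeric(grid[i, c("x1", "x2", "x3")])
    if (llr(x) >= t_obs - 1e-12) {
      p <- p + dmultinom(x, size = n, prob = probs)
    }
  }
  p
}
```

Brute-force enumeration grows quickly with the number of observations per bin; Resin (2020) describes a more efficient exact algorithm.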


Value

A list with class "calibration_simplex" containing


As input by user or default.


Computed from n. Number of hexagons.


Total number of observations.


Vector of length n_bins containing the number of observations within each bin.


Matrix containing the observed outcome frequencies within each bin.


Matrix containing the average forecast probabilities within each bin.


Exact multinomial p-values within each bin. See details.

Object of class calibration_simplex.


Note

In contrast to the calibration simplex proposed by Wilks (2013), the simplex has been mirrored at the diagonal through the bottom left hexagon. The miscalibration error is by default calculated precisely (in each bin, as the difference between the relative frequencies of each class and the average forecast probabilities) instead of approximately (using Wilks' original formula). Approximate errors can be used by setting true_error = FALSE when calling plot.calibration_simplex.
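Assuming calsim is an object returned by calibration_simplex, the approximate (Wilks-style) errors can be requested through the plotting method as follows:

```r
# Use Wilks' approximate miscalibration errors instead of the exact ones
plot(calsim, true_error = FALSE)
```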


References

Wilks, D. S. (2013), The Calibration Simplex: A Generalization of the Reliability Diagram for Three-Category Probability Forecasts, Weather and Forecasting, 28, 1210-1218

Resin, J. (2020), A Simple Algorithm for Exact Multinomial Tests, Preprint

See Also




Examples

attach(ternary_forecast_example)   # see also documentation of sample data

# Calibrated forecast sample
calsim0 = calibration_simplex(p1 = p1, p3 = p3, obs = obs0)
plot(calsim0, use_pvals = TRUE) # with multinomial p-values

# Overconfident forecast sample
calsim1 = calibration_simplex(p1 = p1, p3 = p3, obs = obs1)

# Underconfident forecast sample
calsim2 = calibration_simplex(p1 = p1, p3 = p3, obs = obs2)
plot(calsim2, use_pvals = TRUE) # with multinomial p-values

# Unconditionally biased forecast sample
calsim3 = calibration_simplex(p1 = p1, p3 = p3, obs = obs3)

# Using a different number of bins
calsim = calibration_simplex(n = 4, p1 = p1, p3 = p3, obs = obs3)

calsim = calibration_simplex(n = 13, p1 = p1, p3 = p3, obs = obs3)
plot(calsim,               # using some additional plotting parameters:
     error_scale = 0.5,    # errors are less pronounced (smaller shifts)
     min_bin_freq = 100,   # dots are plotted only for bins,
                           # which contain at least 100 forecast-outcome pairs
     category_labels = c("below-normal","near-normal","above-normal"),
     main = "Sample calibration simplex")


[Package CalSim version 0.5.2 Index]