pairwise_comparison {scoringutils}    R Documentation
Do Pairwise Comparisons of Scores
Description
Compute relative scores between different models by making pairwise
comparisons. Pairwise comparisons are a kind of round-robin tournament in which
every combination of two models is compared on the overlapping set of forecasts
that are available for both models.
Internally, the ratio of the mean scores of the two models is computed.
The relative score of a model is then the geometric mean of all mean score
ratios that involve that model. When a baseline is provided, that baseline is
excluded from the relative scores of the individual models (which therefore
differ slightly from relative scores computed without a baseline), and all
relative scores are scaled by (i.e. divided by) the relative score of the
baseline model.
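The computation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the toy data, the columns forecast_id and score, and the helpers ratio_of_means() and relative_score_sketch() are hypothetical. The trivial comparison of a model with itself is left out of the geometric mean, and lower scores are assumed to be better.

```r
# Hypothetical forecast-level scores: one row per model and forecast unit.
# Lower scores are assumed to be better.
toy_scores <- data.frame(
  model = c("A", "A", "A", "B", "B", "B", "C", "C"),
  forecast_id = c(1, 2, 3, 1, 2, 3, 1, 2),
  score = c(2, 4, 6, 1, 2, 3, 4, 4)
)

# Ratio of the mean scores of models a and b, using only the forecasts
# that both models have made (the overlapping set).
ratio_of_means <- function(scores, a, b) {
  sa <- scores[scores$model == a, ]
  sb <- scores[scores$model == b, ]
  common <- intersect(sa$forecast_id, sb$forecast_id)
  mean(sa$score[sa$forecast_id %in% common]) /
    mean(sb$score[sb$forecast_id %in% common])
}

# Relative score: geometric mean of a model's ratios against every other
# model. With a baseline, ratios against the baseline are left out and all
# values are divided by the baseline's own relative score.
relative_score_sketch <- function(scores, baseline = NULL) {
  models <- unique(scores$model)
  comparators <- setdiff(models, baseline)
  theta <- sapply(models, function(m) {
    ratios <- sapply(setdiff(comparators, m),
                     function(o) ratio_of_means(scores, m, o))
    exp(mean(log(ratios)))  # geometric mean
  })
  if (!is.null(baseline)) theta <- theta / theta[[baseline]]
  theta
}

relative_score_sketch(toy_scores)
relative_score_sketch(toy_scores, baseline = "B")
```

For this toy data, model A has a mean score ratio of 2 against B (all three forecasts overlap) but 0.75 against C (only two forecasts overlap), which is why restricting each comparison to the common forecasts matters. With baseline = "B", the sketch returns about 1.73 for A, 1 for B and 3.08 for C.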
Usually, the function input should be unsummarised scores as
produced by score().
Note that the function internally infers the unit of a single forecast by
determining all columns in the input that do not correspond to metrics
computed by score(). Adding unrelated columns will change results in an
unpredictable way.
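To see why unrelated columns matter, here is a minimal sketch of the rule "the forecast unit is every column that is not a metric". The metric list, the helpers guess_forecast_unit() and n_common_units(), and the data are hypothetical; the package's own list of metrics and its way of matching forecasts may differ.

```r
# Hypothetical metric names; the package's actual list may differ.
metric_names <- c("interval_score", "crps", "brier_score", "coverage")

# Assumed rule: the forecast unit is every column that is not a metric.
guess_forecast_unit <- function(df, metrics = metric_names) {
  setdiff(colnames(df), metrics)
}

# Number of forecast units (ignoring "model") that models a and b share.
n_common_units <- function(df, a, b, metrics = metric_names) {
  unit <- setdiff(guess_forecast_unit(df, metrics), "model")
  key <- do.call(paste, c(df[unit], sep = "|"))
  length(intersect(key[df$model == a], key[df$model == b]))
}

toy_df <- data.frame(
  model = c("A", "B"),
  location = c("DE", "DE"),
  target_date = c("2021-01-02", "2021-01-02"),
  crps = c(1.2, 0.8)
)
guess_forecast_unit(toy_df)       # "model" "location" "target_date"
n_common_units(toy_df, "A", "B")  # 1: both models forecast the same unit

# An unrelated column silently becomes part of the forecast unit, and the
# two models no longer share any forecast to compare on.
toy_df$note <- c("checked", "unchecked")
n_common_units(toy_df, "A", "B")  # 0
```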
The code for the pairwise comparisons is inspired by an implementation by
Johannes Bracher.
The implementation of the permutation test follows the function
permutationTest from the surveillance package by Michael Höhle,
Andrea Riebler and Michaela Paul.
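The idea behind a paired permutation test can be sketched as follows. This is a hedged illustration, not a port of surveillance's permutationTest or of this package's code: the helper paired_permutation_test(), the use of the absolute difference of mean scores as the test statistic, the sign-flip scheme and the p-value convention are all assumptions.

```r
# Paired permutation test on two vectors of scores for the same forecasts.
# Swapping the two models' scores within a pair flips the sign of their
# difference, so each permutation draws a random sign for every pair.
paired_permutation_test <- function(score1, score2, n_permutation = 999) {
  stopifnot(length(score1) == length(score2))
  diffs <- score1 - score2
  observed <- abs(mean(diffs))
  permuted <- replicate(n_permutation, {
    signs <- sample(c(-1, 1), length(diffs), replace = TRUE)
    abs(mean(signs * diffs))
  })
  # Two-sided p-value that counts the observed statistic as one permutation
  (1 + sum(permuted >= observed)) / (n_permutation + 1)
}

set.seed(42)
paired_permutation_test(c(3.1, 2.4, 5.0, 1.2), c(2.9, 2.6, 4.1, 1.0))
```

Under this convention the smallest attainable p-value with n_permutation = 999 is 0.001.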
Usage
pairwise_comparison(
  scores,
  by = "model",
  metric = "auto",
  baseline = NULL,
  ...
)
Arguments
scores: A data.table of scores as produced by
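by: Character vector with the names of columns present in the input
data.frame. Pairwise comparisons are computed separately for each group
defined by these columns (other than "model").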
metric: A character vector of length one with the metric to do the
comparison on. The default is "auto", meaning that either "interval_score",
"crps", or "brier_score" will be selected, depending on which is available.
See
baseline: Character vector of length one that denotes the baseline model
against which to compare other models.
...: Additional arguments for the comparison between two models. See
Value
A data.table with the results of the pairwise comparisons (for example the
mean score ratios), which can be visualised with plot_pairwise_comparison().
Author(s)
Nikos Bosse, nikosbosse@gmail.com
Johannes Bracher, johannes.bracher@kit.edu
Examples
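library(scoringutils)
library(ggplot2)

# score every forecast individually (unsummarised scores)
scores <- score(example_quantile)
# pairwise comparisons, done separately for each target type
pairwise <- pairwise_comparison(scores, by = "target_type")
plot_pairwise_comparison(pairwise, type = "mean_scores_ratio") +
  facet_wrap(~target_type)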