RPS {s2dv}    R Documentation

Compute the Ranked Probability Score

Description

The Ranked Probability Score (RPS; Wilks, 2011) is defined as the sum of the squared differences between the cumulative forecast probabilities (computed from the ensemble members) and the corresponding cumulative observations, where each category's observation is 0% if it did not happen and 100% if it happened. It can be used to evaluate the skill of multi-categorical probabilistic forecasts. The RPS ranges between 0 (perfect forecast) and n-1 (worst possible forecast), where n is the number of categories. For a forecast with two categories (the smallest number of categories a probabilistic forecast can have), the RPS equals the Brier Score (BS; Wilks, 2011) and therefore ranges between 0 and 1.
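The definition above can be sketched in base R for a single forecast whose category probabilities are already known. The helper rps_single is hypothetical (it is not part of s2dv) and only illustrates the formula, including the bounds 0 and n-1 and the two-category equivalence with the Brier Score.

```r
# Hypothetical helper (not an s2dv function): RPS of one forecast.
# p_fcst: forecast probability of each category (sums to 1)
# p_obs:  observation per category (1 for the category that occurred, else 0)
rps_single <- function(p_fcst, p_obs) {
  stopifnot(length(p_fcst) == length(p_obs))
  # Sum over categories of squared differences of cumulative probabilities
  sum((cumsum(p_fcst) - cumsum(p_obs))^2)
}

obs <- c(0, 0, 1)                  # upper tercile observed
rps_single(c(0, 0, 1), obs)        # perfect forecast: 0
rps_single(c(1, 0, 0), obs)        # worst forecast with n = 3: 2
rps_single(c(0.2, 0.3, 0.5), obs)  # 0.2^2 + 0.5^2 = 0.29

# Two categories: equal to the Brier Score (0.7 - 1)^2 = 0.09
rps_single(c(0.3, 0.7), c(0, 1))
```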
The function first calculates the probabilities for the forecasts and observations and then uses them to compute the RPS. Alternatively, the probabilities of exp and obs can be provided directly to compute the score. If there is more than one dataset, the RPS is computed for each pair of exp and obs datasets. The fraction of acceptable NAs can be adjusted with na.rm.
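The two-step workflow (probabilities first, then the score) can be sketched at a single grid point with base R only. This is an illustration under stated assumptions, not the s2dv implementation: thresholds come from quantile() with R's default type, forecast thresholds are computed from all members pooled over time, a value equal to a threshold falls into the upper category (findInterval's convention), and the per-time RPS values are averaged over time.

```r
# Sketch of the probabilities-then-RPS workflow at one grid point (base R).
# Assumptions are listed in the lead-in; this is not s2dv's internal code.
set.seed(1)
n_time <- 50
n_memb <- 10
ens <- matrix(rnorm(n_time * n_memb), nrow = n_time)  # time x member
ob  <- rnorm(n_time)                                   # time

prob_thr <- c(1/3, 2/3)
thr_exp <- quantile(ens, prob_thr, names = FALSE)  # forecast climatology
thr_obs <- quantile(ob, prob_thr, names = FALSE)   # observed climatology

# Step 1: fraction of values in each category, given absolute thresholds
cat_probs <- function(x, thr) {
  bins <- findInterval(x, thr) + 1  # 1 = lowest category
  tabulate(bins, nbins = length(thr) + 1) / length(x)
}
p_exp <- t(apply(ens, 1, cat_probs, thr = thr_exp))  # time x category
p_obs <- t(sapply(ob, cat_probs, thr = thr_obs))     # time x category, 0/1

# Step 2: RPS per time step from cumulative probabilities, then the time mean
cum_exp <- t(apply(p_exp, 1, cumsum))
cum_obs <- t(apply(p_obs, 1, cumsum))
rps_t <- rowSums((cum_exp - cum_obs)^2)
mean(rps_t)
```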

Usage

RPS(
  exp,
  obs,
  time_dim = "sdate",
  memb_dim = "member",
  cat_dim = NULL,
  dat_dim = NULL,
  prob_thresholds = c(1/3, 2/3),
  indices_for_clim = NULL,
  Fair = FALSE,
  weights = NULL,
  cross.val = FALSE,
  na.rm = FALSE,
  ncores = NULL
)

Arguments

exp

A named numerical array of either the forecasts with at least time and member dimensions, or the probabilities with at least time and category dimensions. The probabilities can be generated by s2dv::GetProbs.

obs

A named numerical array of either the observations with at least a time dimension, or the probabilities with at least time and category dimensions. The probabilities can be generated by s2dv::GetProbs. The dimensions must be the same as those of 'exp' except 'memb_dim' and 'dat_dim'.

time_dim

A character string indicating the name of the time dimension. The default value is 'sdate'.

memb_dim

A character string indicating the name of the member dimension used to compute the probabilities of the forecast. The default value is 'member'. If the data are probabilities, set memb_dim to NULL.

cat_dim

A character string indicating the name of the category dimension, which is needed when exp and obs are probabilities. The default value is NULL, which means that the data are not probabilities.

dat_dim

A character string indicating the name of the dataset dimension. The length of this dimension can differ between 'exp' and 'obs'. The default value is NULL.

prob_thresholds

A numeric vector of the relative thresholds (from 0 to 1) between the categories. The default value is c(1/3, 2/3), which corresponds to tercile equiprobable categories.

indices_for_clim

A vector of the indices to be taken along 'time_dim' for computing the thresholds between the probabilistic categories. If NULL, the whole period is used. The default value is NULL.
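How prob_thresholds and indices_for_clim interact can be sketched as follows: the relative thresholds are turned into absolute values using only the selected time steps. The helper clim_thresholds is hypothetical, and using quantile() with R's default type 7 is an assumption about how the thresholds are derived.

```r
# Hypothetical helper: relative thresholds -> absolute thresholds, using only
# the time steps in indices_for_clim (the whole period when NULL).
# quantile() type 7 (R's default) is an assumption.
clim_thresholds <- function(x, prob_thresholds = c(1/3, 2/3),
                            indices_for_clim = NULL) {
  if (is.null(indices_for_clim)) {
    indices_for_clim <- seq_along(x)  # use the whole period
  }
  quantile(x[indices_for_clim], prob_thresholds, names = FALSE)
}

x <- c(1, 2, 3, 4, 5, 100)
clim_thresholds(x)                                        # terciles, all 6 values
clim_thresholds(x, c(0.25, 0.75), indices_for_clim = 1:5) # quartiles of 1:5: 2 and 4
```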

Fair

A logical indicating whether to compute the FairRPS (the potential RPS that the forecast would have with an infinite ensemble size). The default value is FALSE.
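A sketch of the commonly used fair correction for ensemble-based scores is shown below, assuming equally weighted members. Each cumulative squared term is reduced by cum_f * (1 - cum_f) / (M - 1), where M is the ensemble size; this illustrates the idea and is not copied from s2dv's source. The helper fair_rps_single is hypothetical.

```r
# Hypothetical helper: fair RPS of one forecast from an ensemble of n_memb
# members. cum_f: cumulative member fractions; cum_o: cumulative observations.
# The correction term is the usual finite-ensemble adjustment (an assumption
# about s2dv's exact formula).
fair_rps_single <- function(p_fcst, p_obs, n_memb) {
  cum_f <- cumsum(p_fcst)
  cum_o <- cumsum(p_obs)
  sum((cum_f - cum_o)^2 - cum_f * (1 - cum_f) / (n_memb - 1))
}

# Two members split between two categories; the first category is observed.
p_f <- c(0.5, 0.5)
p_o <- c(1, 0)
sum((cumsum(p_f) - cumsum(p_o))^2)     # standard RPS: 0.25
fair_rps_single(p_f, p_o, n_memb = 2)  # fair RPS: 0
```

Because the correction term is never negative, the fair score is never larger than the standard score for the same ensemble.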

weights

A named numerical array of the weights for the 'exp' probability calculation. If 'dat_dim' is NULL, the dimensions should include 'memb_dim' and 'time_dim'. Otherwise, they should also include 'dat_dim'. The default value is NULL. If consistency between the weighted and unweighted methodologies is desired, the ensemble should have at least 70 members, or span at least 10 time steps and have more than 45 members.
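One plausible reading of weighted category probabilities is sketched below: each member's weight is added to the category it falls in, and the totals are normalised by the sum of the weights. This is an assumption for illustration; s2dv's exact weighted algorithm (including how thresholds are derived from weighted data) is not shown, and weighted_cat_probs is a hypothetical helper.

```r
# Hypothetical helper: weighted category probabilities for one time step.
# Assumes each member contributes its weight to the category it falls in.
weighted_cat_probs <- function(x, w, thr) {
  bins <- findInterval(x, thr) + 1  # 1 = lowest category
  p <- vapply(seq_len(length(thr) + 1),
              function(k) sum(w[bins == k]), numeric(1))
  p / sum(w)  # normalise by the total weight
}

members <- c(-1.2, -0.3, 0.1, 0.8, 1.5)
w <- c(0.1, 0.1, 0.2, 0.3, 0.3)
weighted_cat_probs(members, w, thr = c(-0.5, 0.5))          # 0.1 0.3 0.6
weighted_cat_probs(members, rep(1, 5), thr = c(-0.5, 0.5))  # unweighted: 0.2 0.4 0.4
```

With equal weights the result reduces to the plain member fractions, which is the consistency the text refers to.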

cross.val

A logical indicating whether to compute the thresholds between probabilistic categories in cross-validation. The default value is FALSE.
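Cross-validated thresholds can be sketched as a leave-one-out loop: the thresholds used at each time step are computed from the climatology with that time step removed. Leave-one-out, the quantile() type, and the helper name cv_thresholds are assumptions for illustration.

```r
# Hypothetical helper: leave-one-out thresholds, one row per time step.
cv_thresholds <- function(x, prob_thresholds = c(1/3, 2/3)) {
  rows <- lapply(seq_along(x), function(i) {
    quantile(x[-i], prob_thresholds, names = FALSE)  # climatology without i
  })
  do.call(rbind, rows)  # time x threshold matrix
}

thr <- cv_thresholds(c(1, 2, 3, 4, 5), c(0.25, 0.75))
thr[1, ]  # from c(2, 3, 4, 5): 2.75 4.25
thr[5, ]  # from c(1, 2, 3, 4): 1.75 3.25
```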

na.rm

A logical or numeric value between 0 and 1. If it is numeric, it means the lower limit for the fraction of the non-NA values. 1 is equal to FALSE (no NA is acceptable), 0 is equal to TRUE (all NAs are acceptable). The function returns NA if the fraction of non-NA values in the data is less than na.rm. Otherwise, the RPS will be calculated. The default value is FALSE.
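The na.rm rule can be sketched as a simple check on the fraction of non-NA values, with TRUE mapped to 0 and FALSE mapped to 1 as described above. The helper enough_data is hypothetical; it only decides whether a score would be computed.

```r
# Hypothetical helper: TRUE if the score would be computed under na.rm.
enough_data <- function(x, na.rm = FALSE) {
  min_frac <- if (is.logical(na.rm)) { if (na.rm) 0 else 1 } else na.rm
  mean(!is.na(x)) >= min_frac
}

x <- c(1, NA, 3, 4)   # 75% non-NA
enough_data(x)        # FALSE: no NA accepted
enough_data(x, TRUE)  # TRUE: all NAs accepted
enough_data(x, 0.7)   # TRUE: 0.75 >= 0.7
enough_data(x, 0.8)   # FALSE: 0.75 < 0.8
```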

ncores

An integer indicating the number of cores to use for parallel computation. The default value is NULL.

Value

A numerical array of RPS with dimensions c(nexp, nobs, the remaining dimensions of 'exp' except 'time_dim' and 'memb_dim'). nexp is the number of experiments (i.e., the length of dat_dim in exp), and nobs is the number of observations (i.e., the length of dat_dim in obs). If dat_dim is NULL, nexp and nobs are omitted.
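The c(nexp, nobs) layout can be illustrated with a small base R sketch: one score per (experiment, observation) pair. The datasets here are hypothetical time x category probability matrices, and averaging the per-time RPS over time is an assumption about how 'time_dim' is collapsed.

```r
# Sketch of the nexp x nobs output layout (base R, hypothetical inputs).
mean_rps <- function(p_exp, p_obs) {
  cum_e <- t(apply(p_exp, 1, cumsum))  # time x category, cumulative
  cum_o <- t(apply(p_obs, 1, cumsum))
  mean(rowSums((cum_e - cum_o)^2))     # time mean (assumption)
}

exps <- list(rbind(c(0.2, 0.3, 0.5), c(0.6, 0.3, 0.1)),
             rbind(rep(1/3, 3), rep(1/3, 3)))
obss <- list(rbind(c(0, 0, 1), c(1, 0, 0)),
             rbind(c(0, 1, 0), c(0, 1, 0)))

res <- matrix(NA_real_, nrow = length(exps), ncol = length(obss))
for (i in seq_along(exps)) {
  for (j in seq_along(obss)) {
    res[i, j] <- mean_rps(exps[[i]], obss[[j]])  # one score per pair
  }
}
dim(res)   # 2 2, i.e. c(nexp, nobs)
res[1, 1]  # (0.29 + 0.17) / 2 = 0.23
```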

References

Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7

Examples

library(s2dv)
# Use synthetic data (3 * 2 * 10 * 50 = 3000 values for exp, 3 * 2 * 50 = 300 for obs)
exp <- array(rnorm(3000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
obs <- array(rnorm(300), dim = c(lat = 3, lon = 2, sdate = 50))
res <- RPS(exp = exp, obs = obs)
# Use probabilities as inputs
exp_probs <- GetProbs(exp, time_dim = 'sdate', memb_dim = 'member')
obs_probs <- GetProbs(obs, time_dim = 'sdate', memb_dim = NULL)
res2 <- RPS(exp = exp_probs, obs = obs_probs, memb_dim = NULL, cat_dim = 'bin')



[Package s2dv version 2.0.0 Index]