R: Compute the Ranked Probability Skill Score

RPSS {s2dv}

R Documentation

Compute the Ranked Probability Skill Score

Description

The Ranked Probability Skill Score (RPSS; Wilks, 2011) is the skill score based on the Ranked Probability Score (RPS; Wilks, 2011). It can be used to assess whether a forecast improves or worsens on a reference forecast. The RPSS ranges from minus infinity to 1. A positive RPSS indicates that the forecast has higher skill than the reference forecast, while a negative value means that it has lower skill.
Examples of reference forecasts are the climatological forecast (same probabilities for all categories for all time steps), persistence, a previous model version, or another model. The RPSS is computed as RPSS = 1 - RPS_exp / RPS_ref, where RPS_exp and RPS_ref are the RPS of the forecast and of the reference forecast, respectively. The statistical significance is obtained with a Random Walk test at the significance level given by 'alpha' (DelSole and Tippett, 2016).
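As an illustration of the formula above, the following base R sketch computes the RPS from category probabilities and then the RPSS against a climatological reference. The helper `rps_from_probs` and the toy probabilities are hypothetical and only meant to show the arithmetic; this is not the code used inside `RPSS()`.

```r
# Hypothetical helper: mean RPS over time steps.
# probs_fc, probs_obs: matrices with one row per category and one column per time step.
rps_from_probs <- function(probs_fc, probs_obs) {
  # Cumulative probabilities along the category dimension
  cum_fc  <- apply(probs_fc, 2, cumsum)
  cum_obs <- apply(probs_obs, 2, cumsum)
  # Squared differences summed over categories, then averaged over time
  mean(colSums((cum_fc - cum_obs)^2))
}

# Toy data (assumed values): 3 tercile categories, 4 time steps.
# matrix() fills column by column, so each group of three values is one time step.
probs_exp <- matrix(c(0.6, 0.3, 0.1,
                      0.2, 0.5, 0.3,
                      0.1, 0.2, 0.7,
                      0.5, 0.4, 0.1), nrow = 3)
probs_obs <- matrix(c(1, 0, 0,
                      0, 1, 0,
                      0, 0, 1,
                      1, 0, 0), nrow = 3)
# Climatological reference: equal probability for every category and time step
probs_clim <- matrix(1/3, nrow = 3, ncol = 4)

rps_exp <- rps_from_probs(probs_exp, probs_obs)
rps_ref <- rps_from_probs(probs_clim, probs_obs)
rpss <- 1 - rps_exp / rps_ref
```

With these toy values the forecast has a lower RPS than climatology, so `rpss` is positive.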
The function accepts either the ensemble members or the probabilities of each dataset as inputs. If there is more than one dataset, the RPSS is computed for each pair of exp and obs data. The fraction of NAs in the data is examined before the calculation. If the fraction of non-NA values is lower than the threshold set by the parameter na.rm, NA is returned directly. NAs are counted per pair, which means that only the time steps at which all the datasets have values count as non-NA.
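The per-pair NA check can be sketched in base R for a single pair of time series. The helper `enough_data` is hypothetical, and the use of `>=` at the threshold is an assumption based on the description of 'na.rm' as a lower limit; it is not taken from the s2dv source.

```r
# Hypothetical helper illustrating the per-pair NA check for one grid point.
# exp_ts and obs_ts are time series of equal length; na.rm follows the convention
# described for the 'na.rm' argument (FALSE is treated as 1, TRUE as 0).
enough_data <- function(exp_ts, obs_ts, na.rm = FALSE) {
  f_na <- if (is.logical(na.rm)) as.numeric(!na.rm) else na.rm
  # A time step counts as valid only if every dataset has a value there
  good <- !is.na(exp_ts) & !is.na(obs_ts)
  # Assumption: the score is computed when the valid fraction reaches the limit
  mean(good) >= f_na
}

exp_ts <- c(0.2, NA, 1.1, -0.4, 0.7)
obs_ts <- c(0.1, 0.3, NA, -0.2, 0.5)

ok_strict <- enough_data(exp_ts, obs_ts)               # no NA tolerated
ok_half   <- enough_data(exp_ts, obs_ts, na.rm = 0.5)  # at least half the steps valid
ok_most   <- enough_data(exp_ts, obs_ts, na.rm = 0.8)  # at least 80% of the steps valid
```

Here three of the five time steps are valid in both series, so only the 0.5 threshold is met.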

Usage

RPSS(
  exp,
  obs,
  ref = NULL,
  time_dim = "sdate",
  memb_dim = "member",
  cat_dim = NULL,
  dat_dim = NULL,
  prob_thresholds = c(1/3, 2/3),
  indices_for_clim = NULL,
  Fair = FALSE,
  weights_exp = NULL,
  weights_ref = NULL,
  cross.val = FALSE,
  na.rm = FALSE,
  sig_method.type = "two.sided.approx",
  alpha = 0.05,
  ncores = NULL
)

Arguments

`exp`	A named numerical array of either the forecast with at least time and member dimensions, or the probabilities with at least time and category dimensions. The probabilities can be generated by `s2dv::GetProbs`.
`obs`	A named numerical array of either the observation with at least a time dimension, or the probabilities with at least time and category dimensions. The probabilities can be generated by `s2dv::GetProbs`. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'.
`ref`	A named numerical array of either the reference forecast with at least time and member dimensions, or the probabilities with at least time and category dimensions. The probabilities can be generated by `s2dv::GetProbs`. The dimensions must be the same as 'exp' except 'memb_dim' and 'dat_dim'. If there is only one reference dataset, it should not have a dataset dimension. If there is a corresponding reference for each experiment, the dataset dimension must have the same length as in 'exp'. If 'ref' is NULL, the climatological forecast is used as the reference forecast. The default value is NULL.
`time_dim`	A character string indicating the name of the time dimension. The default value is 'sdate'.
`memb_dim`	A character string indicating the name of the member dimension to compute the probabilities of the forecast and the reference forecast. The default value is 'member'. If the data are probabilities, set memb_dim as NULL.
`cat_dim`	A character string indicating the name of the category dimension that is needed when exp, obs, and ref are probabilities. The default value is NULL, which means that the data are not probabilities.
`dat_dim`	A character string indicating the name of dataset dimension. The length of this dimension can be different between 'exp' and 'obs'. The default value is NULL.
`prob_thresholds`	A numeric vector of the relative thresholds (from 0 to 1) between the categories. The default value is c(1/3, 2/3), which corresponds to tercile equiprobable categories.
`indices_for_clim`	A vector of the indices to be taken along 'time_dim' for computing the thresholds between the probabilistic categories. If NULL, the whole period is used. The default value is NULL.
`Fair`	A logical indicating whether to compute the FairRPSS (the potential RPSS that the forecast would have with an infinite ensemble size). The default value is FALSE.
`weights_exp`	A named numerical array of the forecast ensemble weights for probability calculation. The dimensions should include 'memb_dim', 'time_dim' and, if there are multiple datasets, 'dat_dim'. All dimension lengths must be equal to the 'exp' dimension lengths. The default value is NULL, which means no weighting is applied. The ensemble should have at least 70 members or span at least 10 time steps and have more than 45 members if consistency between the weighted and unweighted methodologies is desired.
`weights_ref`	Same as 'weights_exp' but for the reference forecast.
`cross.val`	A logical indicating whether to compute the thresholds between probabilistic categories in cross-validation. The default value is FALSE.
`na.rm`	A logical or numeric value between 0 and 1. If it is numeric, it means the lower limit for the fraction of the non-NA values. 1 is equal to FALSE (no NA is acceptable), 0 is equal to TRUE (all NAs are acceptable). The function returns NA if the fraction of non-NA values in the data is less than na.rm. Otherwise, the RPSS will be calculated. The default value is FALSE.
`sig_method.type`	A character string indicating the test type of the significance method. Check `RandomWalkTest()` parameter `test.type` for details. The default is 'two.sided.approx', which is the default of `RandomWalkTest()`.
`alpha`	A numeric value indicating the significance level to be used in the statistical significance test. The default value is 0.05.
`ncores`	An integer indicating the number of cores to use for parallel computation. The default value is NULL.
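
To illustrate how 'prob_thresholds' relates ensemble members to category probabilities, the base R sketch below derives tercile probabilities for one time series of ensemble forecasts. The helper `probs_from_members` is hypothetical; pooling all members and time steps to obtain the thresholds, and the default `quantile()` definition, are assumptions that may differ from what `s2dv::GetProbs` does.

```r
# Hypothetical helper: category probabilities from ensemble members.
# ens: matrix with one row per member and one column per time step.
probs_from_members <- function(ens, prob_thresholds = c(1/3, 2/3)) {
  # Absolute thresholds from the pooled sample (assumption: all members, all steps)
  thr <- quantile(as.vector(ens), probs = prob_thresholds, na.rm = TRUE)
  breaks <- c(-Inf, thr, Inf)
  # Fraction of members falling in each category, per time step
  apply(ens, 2, function(x) {
    as.vector(table(cut(x, breaks = breaks))) / length(x)
  })
}

set.seed(1)
ens <- matrix(rnorm(10 * 50), nrow = 10)  # 10 members, 50 time steps
probs <- probs_from_members(ens)
dim(probs)  # one row per category, one column per time step
```

Each column of `probs` sums to 1, and the number of rows is `length(prob_thresholds) + 1`.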

Value

`$rpss`	A numerical array of RPSS with dimensions c(nexp, nobs, the remaining dimensions of 'exp' except the 'time_dim' and 'memb_dim' dimensions). nexp is the number of experiments (i.e., dat_dim in exp), and nobs is the number of observations (i.e., dat_dim in obs). If dat_dim is NULL, nexp and nobs are omitted.
`$sign`	A logical array of the statistical significance of the RPSS with the same dimensions as $rpss.
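
The idea behind the significance flag can be sketched as a sign test on the per-time-step scores: under the null hypothesis of equal skill, the forecast beats the reference in about half of the time steps. The helper `sign_test` below is hypothetical and uses an exact two-sided binomial test from base R; it is a simplified stand-in and not the implementation in `RandomWalkTest()`.

```r
# Hypothetical helper: exact two-sided sign test on per-time-step RPS values.
# Returns TRUE when the difference in skill is significant at level alpha.
sign_test <- function(rps_exp, rps_ref, alpha = 0.05) {
  wins <- sum(rps_exp < rps_ref)  # lower RPS is better
  n <- sum(rps_exp != rps_ref)    # ties are discarded
  binom.test(wins, n, p = 0.5, alternative = "two.sided")$p.value < alpha
}

# Toy scores (assumed values) for 50 time steps
rps_ref <- rep(0.5, 50)
rps_exp_good <- c(rep(0.3, 40), rep(0.7, 10))  # better in 40 of 50 steps
rps_exp_even <- c(rep(0.3, 25), rep(0.7, 25))  # better in 25 of 50 steps

sig_good <- sign_test(rps_exp_good, rps_ref)
sig_even <- sign_test(rps_exp_even, rps_ref)
```

In this toy case only the forecast that wins in 40 of 50 steps is flagged as significantly different from the reference.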

References

Wilks, 2011; https://doi.org/10.1016/B978-0-12-385022-5.00008-7
DelSole and Tippett, 2016; https://doi.org/10.1175/MWR-D-15-0218.1

Examples

set.seed(1)
exp <- array(rnorm(3000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
set.seed(2)
obs <- array(rnorm(300), dim = c(lat = 3, lon = 2, sdate = 50))
set.seed(3)
ref <- array(rnorm(3000), dim = c(lat = 3, lon = 2, member = 10, sdate = 50))
weights <- sapply(1:dim(exp)['sdate'], function(i) {
            n <- abs(rnorm(10))
            n/sum(n)
          })
dim(weights) <- c(member = 10, sdate = 50)
# Use data as input
res <- RPSS(exp = exp, obs = obs) ## climatology as reference forecast
res <- RPSS(exp = exp, obs = obs, ref = ref) ## ref as reference forecast
res <- RPSS(exp = exp, obs = obs, ref = ref, weights_exp = weights, weights_ref = weights)
res <- RPSS(exp = exp, obs = obs, alpha = 0.01, sig_method.type = 'two.sided')

# Use probs as input
exp_probs <- GetProbs(exp, memb_dim = 'member')
obs_probs <- GetProbs(obs, memb_dim = NULL)
ref_probs <- GetProbs(ref, memb_dim = 'member')
res <- RPSS(exp = exp_probs, obs = obs_probs, ref = ref_probs, memb_dim = NULL, 
           cat_dim = 'bin')

[Package s2dv version 2.0.0 Index]