veriApply {easyVerification}

R Documentation

Apply Verification Metrics to Large Datasets

Description

This wrapper applies verification metrics to arrays of forecast ensembles and verifying observations. Various array-based data formats are supported. Additionally, continuous forecasts (and observations) are transformed to category forecasts using user-defined absolute thresholds or percentiles of the long-term climatology (see details).

Usage

veriApply(
  verifun,
  fcst,
  obs,
  fcst.ref = NULL,
  tdim = length(dim(fcst)) - 1,
  ensdim = length(dim(fcst)),
  prob = NULL,
  threshold = NULL,
  strategy = "none",
  na.rm = FALSE,
  fracmin = 0.8,
  nmin = NULL,
  parallel = FALSE,
  maxncpus = 16,
  ncpus = NULL,
  ...
)

Arguments

`verifun`	Name of function to compute verification metric (score, skill score)
`fcst`	array of forecast values (at least 2-dimensional)
`obs`	array or vector of verifying observations
`fcst.ref`	array of forecast values for the reference forecast (skill scores only)
`tdim`	index of dimension with the different forecasts
`ensdim`	index of dimension with the different ensemble members
`prob`	probability threshold for category forecasts (see below)
`threshold`	absolute threshold for category forecasts (see below)
`strategy`	type of out-of-sample reference forecasts, or a named list with arguments as in `indRef`, or a list of indices for each forecast instance
`na.rm`	logical, should incomplete forecasts be used?
`fracmin`	minimum fraction of non-missing forecasts required for a forecast to be evaluated. Used to determine `nmin` when `is.null(nmin)`
`nmin`	minimum number of non-missing forecasts required for a forecast to be evaluated. If both `nmin` and `fracmin` are set, `nmin` takes precedence
`parallel`	logical, should parallel execution of verification be used (see below)?
`maxncpus`	upper bound for self-selected number of CPUs
`ncpus`	number of CPUs used in parallel computation; a self-selected number of CPUs is used when `is.null(ncpus)` (the default).
`...`	additional arguments passed to `verifun`
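
The interplay of `fracmin` and `nmin` described above can be sketched in base R. This is an illustration only, not the package's internal code: the vector `obs` and the names `n.valid` and `enough` are hypothetical, and deriving `nmin` as `fracmin` times the number of forecasts (without any rounding) is an assumption.

```r
# Illustration only; names and the exact derivation of nmin are assumptions.
fracmin <- 0.8
nmin <- NULL
# hypothetical series of 10 verifying observations, 9 of them present
obs <- c(1.2, NA, 0.4, 0.9, 0.6, 1.1, 0.3, 0.8, 1.5, 0.7)
n.valid <- sum(!is.na(obs))
# assumption: when nmin is not set, it follows from fracmin
if (is.null(nmin)) nmin <- fracmin * length(obs)
# evaluate only if enough values are present
enough <- n.valid >= nmin
enough
```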

List of functions to be called

The selection of verification functions supplied with this package and as part of SpecsVerification can be enquired using ls(pos='package:easyVerification') and ls(pos='package:SpecsVerification') respectively. Please note, however, that only some of the functions provided as part of SpecsVerification can be used with veriApply. Functions that can be used include for example the (fair) ranked probability score EnsRps, FairRps, and its skill score EnsRpss, FairRpss, or the continuous ranked probability score EnsCrps, etc.

Conversion to category forecasts

To automatically convert continuous forecasts into category forecasts, absolute (threshold) or relative thresholds (prob) have to be supplied. For some scores and skill scores (e.g. the ROC area and skill score), the results are returned as a list with one element per category, with the categories in ascending order. That is, if prob = 1:2/3 for tercile forecasts, cat1 corresponds to the lower tercile, cat2 to the middle, and cat3 to the upper tercile.
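
How relative thresholds such as prob = 1:2/3 divide a continuous series into ordered categories can be sketched in base R. This is an illustration of the idea only, not the conversion code used by veriApply; the series `x` is hypothetical, and the treatment of values that fall exactly on a boundary is an assumption.

```r
# Illustration only: base R, not the package's internal conversion.
set.seed(1)
x <- rnorm(30)                       # hypothetical continuous series at one location
prob <- 1:2 / 3                      # relative thresholds for tercile categories
bounds <- quantile(x, probs = prob)  # category boundaries from the series itself
# findInterval() gives 0, 1, 2 for values below, between, and above the
# two boundaries; adding 1 yields categories 1 (lower) to 3 (upper)
category <- findInterval(x, bounds) + 1
table(category)
```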

Absolute and relative thresholds can be supplied in various formats. If a vector of thresholds is supplied with the threshold argument, the same threshold is applied to all forecasts (e.g. lead times, spatial locations). If a vector of relative thresholds is supplied using prob, the category boundaries to be applied are computed separately for each space-time location. Relative boundaries specified using prob are computed separately for the observations and forecasts, but jointly for all available ensemble members.
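
The statement that relative boundaries are computed per location, separately for forecasts and observations but jointly over ensemble members, might look as follows in base R. The array shapes and the names `fcst.bounds` and `obs.bounds` are hypothetical; the package may compute the boundaries differently in detail.

```r
# Illustration only; array shapes and names are assumptions.
set.seed(2)
# hypothetical forecasts: 4 locations x 20 forecast instances x 5 members
fcst <- array(rnorm(4 * 20 * 5), dim = c(4, 20, 5))
# hypothetical observations: 4 locations x 20 forecast instances
obs <- array(rnorm(4 * 20), dim = c(4, 20))
prob <- 1:2 / 3
# forecasts: pool all instances and ensemble members at each location
fcst.bounds <- t(apply(fcst, 1, quantile, probs = prob))
# observations: boundaries from the observed series at each location
obs.bounds <- t(apply(obs, 1, quantile, probs = prob))
dim(fcst.bounds)  # one row per location, one column per boundary
```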

Location specific thresholds can also be supplied. If the thresholds are supplied as a matrix, the number of rows has to correspond to the number of forecast space-time locations (i.e. same length as length(fcst)/prod(dim(fcst)[c(tdim, ensdim)])). Alternatively, but equivalently, the thresholds can also be supplied with the dimensionality corresponding to the obs array with the difference that the forecast dimension in obs contains the category boundaries (absolute or relative) and thus may differ in length.
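
As a small worked example of the row count given above, the following base R sketch computes the number of space-time locations for a hypothetical forecast array and builds a threshold matrix of matching shape. The array dimensions and threshold values are made up for illustration, and how rows map to individual locations is not shown here.

```r
# Illustration only; dimensions and threshold values are hypothetical.
fcst <- array(0, dim = c(10, 5, 20, 51))  # e.g. lon x lat x forecasts x members
tdim <- length(dim(fcst)) - 1             # default: second to last dimension
ensdim <- length(dim(fcst))               # default: last dimension
# number of space-time locations, as in the formula above
nloc <- length(fcst) / prod(dim(fcst)[c(tdim, ensdim)])
# one row per location, one column per category boundary
threshold <- matrix(c(-0.5, 0.5), nrow = nloc, ncol = 2, byrow = TRUE)
dim(threshold)
```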

Out-of-sample reference forecasts

strategy specifies the set-up of the climatological reference forecast for skill scores if no explicit reference forecast is provided. The default is strategy = "none", that is, all available observations are used as equiprobable members of a reference forecast. Alternatively, strategy = "crossval" can be used for leave-one-out cross-validated reference forecasts, or strategy = "forward" for a forward protocol (see indRef).

Alternatively, a list with named parameters corresponding to the input arguments of indRef can be supplied for more fine-grained control over standard cases. Finally, a list with the observation indices to be used for each forecast can also be supplied (see generateRef).
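
The idea of a list with observation indices per forecast can be sketched in base R for the leave-one-out case. The name `crossval.ind` is hypothetical, and whether this list matches the output of indRef element for element is an assumption; the forward protocol is not shown.

```r
# Illustration only: leave-one-out indices built by hand in base R.
nfcst <- 5  # hypothetical number of forecast instances
# for forecast i, use all observations except the i-th as reference members
crossval.ind <- lapply(seq_len(nfcst), function(i) setdiff(seq_len(nfcst), i))
crossval.ind[[2]]
```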

Parallel processing

Parallel processing is enabled using the parallel package. Parallel verification uses ncpus FORK clusters or, if ncpus is not specified, one less than the auto-detected number of cores. The maximum number of cores used for parallel processing with auto-detection of the number of available cores can be set with the maxncpus argument.
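
The selection rule described above can be sketched with the parallel package. This is not the package's own code: the lower bound of one CPU and the handling of an undetectable core count are assumptions added to keep the sketch safe on any machine.

```r
# Illustration only; the guards for NA and for single-core machines are assumptions.
library(parallel)
maxncpus <- 16
ncpus <- NULL
if (is.null(ncpus)) {
  ncores <- detectCores()          # may return NA on some platforms
  if (is.na(ncores)) ncores <- 1
  # one less than the detected cores, capped at maxncpus, at least 1
  ncpus <- max(1, min(ncores - 1, maxncpus))
}
ncpus
```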

Progress bars are available for non-parallel computation of the verification metrics. Please note, however, that the progress bar only reflects the time needed to compute the actual verification metrics; the re-arrangement of input and output is not included.

Note

If the forecasts and observations are only available as category probabilities (or ensemble counts as used in SpecsVerification) rather than as continuous numeric variables, veriApply cannot be used; the atomic verification functions for category forecasts have to be applied directly instead.

Out-of-sample reference forecasts are not fully supported for categorical forecasts defined on the distribution of forecast values (e.g. using the argument prob). Whereas only the years specified in strategy are used for the reference forecasts, the probability thresholds for the reference forecasts are defined on the collection of years specified in strategy.

Examples

tm <- toyarray()
f.me <- veriApply("EnsMe", tm$fcst, tm$obs)

## find more examples and instructions in the vignette
## Not run: 
devtools::install_github("MeteoSwiss/easyVerification", build_vignettes = TRUE)
library("easyVerification")
vignette("easyVerification")

## End(Not run)

[Package easyVerification version 0.4.5 Index]