R: Compute error bounds for a fitted regional frequency...

regsimq {lmomRFA}

R Documentation

Compute error bounds for a fitted regional frequency distribution

Description

Computes, using Monte Carlo simulation, relative error bounds for estimated quantiles of a regional frequency distribution fitted by regfit.

Usage

regsimq(qfunc, para, cor = 0, index = NULL, nrec, nrep = 10000,
        fit = "gev", f = c(0.01, 0.1, 0.5, 0.9, 0.99, 0.999),
        boundprob = c(0.05, 0.95), save = TRUE)

Arguments

`qfunc`	List containing the quantile functions for each site. Can also be a single quantile function, which will be used for each site.
`para`	Parameters of the quantile functions at each site. If `qfunc` is a list, `para` must be a list of the same length whose components are numeric vectors, the parameters of the corresponding component of `qfunc`. If `qfunc` is a single quantile function, `para` can be a single vector, containing a single set of parameter values that will be used for each site; a matrix or data frame whose rows each contain the parameter values for one site; or a list of length `length(nrec)` whose components are numeric vectors, each containing the parameter values for one site.
`cor`	Specifies the correlation matrix of the frequency distribution of each site's data. Can be a matrix (which will be rescaled to a correlation matrix if necessary) or a constant (which will be taken as the correlation between each pair of sites).
`index`	Specifies the value of the site-specific scale factor (“index flood”) at each site. Can be: a vector, containing the values at each site; a constant, which will be taken to be the index flood value at each site; or (the default) `NULL`, in which case the index floods at each site will be taken to be the means of the quantile functions implied by `qfunc` and `para`, and will be computed by numerical integration of those quantile functions.
`nrec`	Numeric vector containing the record lengths at each site.
`nrep`	Number of simulated regions.
`fit`	Character string specifying the distribution to be fitted. See “Details” below.
`f`	Vector of probabilities corresponding to the quantiles whose accuracy is to be estimated.
`boundprob`	Vector of probabilities for which error bounds will be computed.
`save`	Logical. Should the simulated values of the ratio of the estimated to the true regional growth curve be saved? These values are needed when `sitequantbounds` is called with its argument `index` present, e.g. to compute error bounds for quantiles at sites other than those whose data were used to fit the regional frequency distribution (e.g., ungauged sites). If this computation is not required, storage can be saved by setting `save` to `FALSE`.

Details

A realization of data from a region is generated as follows. The frequency distributions at sites (specified by arguments qfunc and para) are expressed as Q_i(F)=\mu_i q_i(F) where \mu_i is the site-specific scale factor (“index flood”) and q_i(F) is the at-site growth curve. At each simulation run the at-site growth curves of each site are randomly permuted, and are scaled by the (unpermuted) index flood values for the sites. Data are simulated from these frequency distributions, with inter-site correlation specified by argument cor and record lengths at each site specified by argument nrec. The regional frequency distribution specified by argument fit is then fitted to the simulated data, as in function regfit. The procedure is as described in Hosking and Wallis (1997), Table 6.1, except that the permutation of at-site growth curves is a later modification, intended to give more realistic sets of simulated data. For more details, including exact definitions of quantities computed in the simulation and returned by functions regsimq, regquantbounds, and regsitebounds, see vignette RegSim.

From each realization the sample mean values and estimates of the quantiles of the regional growth curve, for nonexceedance probabilities specified by argument f, are saved.

From the simulated values, for each quantile specified by argument f the relative root mean square error (relative RMSE) is computed as in Hosking and Wallis (1997, eq. (6.15)). Error bounds are also computed, as in Hosking and Wallis (1997, eq. (6.18)) but with bound probabilities specified by argument boundprob rather than the fixed values 0.05 and 0.95 considered by Hosking and Wallis. The error bounds are sample quantiles, across the nrep realizations, of the ratio of the estimated regional growth curve to the true at-site growth curve or of the ratio of the estimated to the true quantiles at individual sites.

For distribution fit there should exist a function to estimate the parameters of the distribution given a set of L-moments. The function should have a name that is the character string "pel" followed by the character string fit. It should accept a single argument, a vector containing L-moments \ell_1, \ell_2, t_3, t_4, etc., and return a vector of distribution parameters.

For distribution fit there should also exist a quantile function, which should have a name that is the character string "qua" followed by the character string fit. It should accept two arguments: a vector of probabilities and a vector containing the parameters of the distribution.

The search path used to find the "pel" and "qua" functions is the same as for arguments supplied to regsimq, i.e. the enclosing frames of the function, followed by the search path specified by search().

The estimation routines and quantile functions in package lmom have the form described here. For example, to use a generalized extreme value distribution set fit to be the string "gev"; then the fitting function pelgev and the quantile function quagev will be used (unless these functions have been masked by another object on the search path).

Value

An object of class "regsimq". This is a list with the following components:

`f`	Vector of probabilities corresponding to the quantiles whose accuracy is to be estimated. A copy of argument `f`.
`boundprob`	Vector of probabilities corresponding to the error bounds. A copy of argument `boundprob`.
`nrep`	Number of simulated regions.
`relbounds.rgc`	Data frame containing the relative RMSE and error bounds for the regional growth curve. It has columns `"f"`, probabilities corresponding to each quantile, `"rel.RMSE"`, relative RMSE of quantiles of regional growth curve, and, for each bound probability in `boundprob`, a column giving the error bound (quantile of the empirical distribution of simulated values of the ratio of the estimated regional growth curve to the true at-site growth curve) for that bound probability.
`relbounds.by.site`	List of `length(nrec)` data frames. Each data frame contains the relative RMSE and error bounds for quantiles at one site, and has the same structure as component `relbounds.rgc` of the return value.
`true.asgc`	If `save` is `TRUE`, a matrix of dimension `length(f)` `\times` `length(nrec)`, containing values of the at-site growth curves (quantile functions at each site, divided by the site-specific scale factors) for quantiles corresponding to probabilities in `f`. If `save` is `FALSE`, list element `true.asgc` is `NULL`.
`sim.rgc`	If `save` is `TRUE`, a matrix of dimension `length(f)` `\times` `nrep`, containing the simulated values of the estimated regional growth curve for quantiles corresponding to probabilities in `f`. If `save` is `FALSE`, list element `sim.rgc` is `NULL`.

Note

Error bounds for the regional growth curve apply to the regional growth curve regarded as an estimator of the growth curve for a randomly chosen site. The growth curve for site i is defined to be q_i(\,.\,)=Q_i(\,.\,)/\mu_i where Q_i(\,.\,) is the site's quantile function and \mu_i is the site-specific scale factor (“index flood”) for site i. For each of the nrep simulated regions, and each probability F in f, the regional growth curve \hat{q}(F) is estimated and the ratios \hat{q}(F)/q_i(F) are calculated. The relative error bounds are empirical quantiles, corresponding to the probabilities in boundprob, of the nrep * length(nrec) values of these ratios obtained from the simulations.

This differs from the approach taken in Hosking and Wallis (1997), Table 6.2 and Fig. 6.2, in which error bounds are computed regarding the estimated regional growth curve as an estimator of the regional average growth curve q^{\rm R}(\,.\,), the harmonic mean of the sites' growth curves (Hosking and Wallis, 1997, p. 102).

When the parent region is homogeneous, with identical frequency distributions at each site (apart from a site-specific scale factor), the two approaches give identical results. For heterogeneous regions the “regard as estimator of randomly chosen site growth curve” approach yields error bounds that are wider, but are more realistic given that the ultimate aim of regional frequency analysis is estimation of quantiles at individual sites.

Author(s)

J. R. M. Hosking jrmhosking@gmail.com

References

Hosking, J. R. M., and Wallis, J. R. (1997). Regional frequency analysis: an approach based on L-moments. Cambridge University Press.

Examples

data(Cascades)              # A regional data set

rmom <- regavlmom(Cascades) # Regional average L-moments

# Fit generalized normal distribution to regional data
rfit <- regfit(Cascades, "gno")

# Set up an artificial region to be simulated:
# -- Same number of sites as Cascades
# -- Same record lengths as Cascades
# -- Same site means as Cascades
# -- L-CV varies linearly across sites, with mean value equal
#    to the regional average L-CV for the Cascades data.
#    'LCVrange' specifies the  range of L-CV across the sites.
# -- L-skewness the same at each site, and equal to the regional
#    average L-skewness for the Cascades data
nsites <- nrow(Cascades)
means <- Cascades$mean
LCVrange <- 0.025
LCVs <- seq(rmom[2]-LCVrange/2, rmom[2]+LCVrange/2, len=nsites)
Lskews<-rep(rmom[3], nsites)

# Each site will have a generalized normal distribution:
# get the parameter values for each site
pp <- t(apply(cbind(means, means*LCVs ,Lskews), 1, pelgno))

# Set correlation between each pair of sites to 0.64, the
# average inter-site correlation for the Cascades data
avcor <- 0.64

# Run the simulation.  To save time, use only 100 replications.
simq <- regsimq(qfunc=quagno, para=pp, cor=avcor, nrec=Cascades$n,
  nrep=100, fit="gno")

# Relative RMSE and error bounds for the regional growth curve
simq$relbounds.rgc

# Relative RMSE and error bounds for quantiles at site 3
simq$relbounds.by.site[[3]]

[Package lmomRFA version 3.6 Index]