fit_gev {climextRemes}R Documentation

Fit a generalized extreme value model to block maxima or minima

Description

Fit a generalized extreme value model, designed specifically for climate data. It includes options for variable weights (useful for local likelihood), as well as for bootstrapping to estimate uncertainties. Results can be returned in terms of parameter values, return values, return periods, return probabilities, and differences in either return values or log return probabilities (i.e., log risk ratios).

Usage

fit_gev(
  y,
  x = NULL,
  locationFun = NULL,
  scaleFun = NULL,
  shapeFun = NULL,
  nReplicates = 1,
  replicateIndex = NULL,
  weights = NULL,
  returnPeriod = NULL,
  returnValue = NULL,
  getParams = FALSE,
  getFit = FALSE,
  xNew = NULL,
  xContrast = NULL,
  maxes = TRUE,
  scaling = 1,
  bootSE = FALSE,
  bootControl = NULL,
  optimArgs = NULL,
  optimControl = NULL,
  missingFlag = NULL,
  initial = NULL,
  logScale = NULL,
  .normalizeX = TRUE,
  .getInputs = FALSE,
  .allowNoInt = TRUE
)

Arguments

y

a numeric vector of observed maxima or minima values. See Details for how the values of y should be ordered if there are multiple replicates and the values of x are identical for all replicates. For better optimization performance, it is recommended that the y have magnitude around one (see Details), for which one can use scaling.

x

a data frame, or object that can be converted to a data frame with columns corresponding to covariate/predictor/feature variables and each row containing the values of the variable for the corresponding observed maximum/minimum. The number of rows should either equal the length of y or (if there is more than one replicate) it can optionally equal the number of observations in a single replicate, in which case the values will be assumed to be the same for all replicates.

locationFun

formula, vector of character strings, or indices describing a linear model (i.e., regression function) for the location parameter using columns from x. x must be supplied if this is anything other than NULL or ~1.

scaleFun

formula, vector of character strings, or indices describing a linear model (i.e., regression function) for the (potentially transformed) scale parameter using columns from x. x must be supplied if this is anything other than NULL or ~1. logScale controls whether this determines the log of the scale or the scale directly.

shapeFun

formula, vector of character strings, or indices describing a linear model (i.e., regression function) for the shape parameter using columns from x. x must be supplied if this is anything other than NULL or ~1.

nReplicates

numeric value indicating the number of replicates.

replicateIndex

numeric vector providing the index of the replicate corresponding to each element of y. Used (and therefore required) only when using bootstrapping with the resampling by replicates based on the by element of bootControl.

weights

a vector providing the weights for each observation. When there is only one replicate or the weights do not vary by replicate, a vector of length equal to the number of observations. When weights vary by replicate, this should be of equal length to y. Likelihood contribution of each observation is multiplied by the corresponding weight.

returnPeriod

numeric value giving the number of blocks for which return values should be calculated. For example a returnPeriod of 20 corresponds to the value of an event that occurs with probability 1/20 in any block and therefore occurs on average every 20 blocks. Often blocks will correspond to years.

returnValue

numeric value giving the value for which return probabilities/periods should be calculated, where the period would be the average number of blocks until the value is exceeded and the probability the probability of exceeding the value in any single block.

getParams

logical indicating whether to return the fitted parameter values and their standard errors; WARNING: parameter values for models with covariates for the scale parameter must interpreted based on the value of logScale.

getFit

logical indicating whether to return the full fitted model (potentially useful for model evaluation and for understanding optimization problems); note that estimated parameters in the fit object for nonstationary models will not generally match the MLE provided when getParams is TRUE because covariates are normalized before fitting and the fit object is based on the normalized covariates. Similarly, parameters will not match if scaling is not 1.

xNew

object of the same form as x, providing covariate/predictor/feature values for which return values/periods/probabilities are desired.

xContrast

object of the same form and dimensions as xNew, providing covariate/predictor/feature values for which to calculate the differences of the return values and/or log return probabilities relative to the values in xNew. This provides a way to estimate the difference in return value or log return probabilities (i.e., log risk ratios).

maxes

logical indicating whether analysis is for block maxima (TRUE) or block minima (FALSE); in the latter case, the function works with the negative of the values, changing the sign of the resulting location parameters

scaling

positive-valued scalar used to scale the data values for more robust optimization performance. When multiplied by the values, it should produce values with magnitude around 1.

bootSE

logical indicating whether to use the bootstrap to estimate standard errors.

bootControl

a list of control parameters for the bootstrapping. See Details.

optimArgs

a list with named components matching exactly any arguments that the user wishes to pass to R's optim function. See help(optim) for details. Of particular note, 'method' can be used to choose the optimization method used for maximizing the log-likelihood to fit the model and control=list(maxit=VALUE) for a user-specified VALUE can be used to increase the number of iterations if the optimization is converging slowly.

optimControl

a list with named components matching exactly any elements that the user wishes to pass as the control argument to R's optim function. See help(optim) for details. Primarily provided for the Python interface because control can also be passed as part of optimArgs.

missingFlag

value to be interpreted as missing values (instead of NA), intended for use in other languages (e.g., Python) calling this function

initial

a list with components named 'location', 'scale', and 'shape' providing initial parameter values, intended for use in speeding up or enabling optimization when the default initial values are resulting in failure of the optimization; note that use of scaling, logScale, and .normalizeX = TRUE cause numerical changes in some of the parameters. For example with logScale = TRUE, initial value(s) for 'scale' should be specified on the log scale.

logScale

logical indicating whether optimization for the scale parameter should be done on the log scale. By default this is FALSE when the scale is not a function of covariates and TRUE when the scale is a function of covariates (to ensure the scale is positive regardless of the regression coefficients).

.normalizeX

logical indicating whether to normalize x values for better numerical performance; default is TRUE.

.getInputs

logical indicating whether to return intermediate objects used in fitting. Defaults to FALSE and intended for internal use only

.allowNoInt

logical indicating whether no-intercept models are allowed. Defaults to TRUE and provided primarily to enable backwards compatibility with versions <= 0.2.2.

Details

This function allows one to fit stationary or nonstationary block maxima/minima models using the generalized extreme value distribution. The function can return parameter estimates, return value/level for a given return period (number of blocks), and return probabilities/periods for a given return value/level. The function provides standard errors based on the usual MLE asymptotics, with delta-method-based standard errors for functionals of the parameters, but also standard errors based on the nonparametric bootstrap, either resampling by block or by replicate or both.

Replicates:

Replicates are repeated datasets, each with the same structure, including the same number of block maxima/minima. The additional observations in multiple replicates could simply be treated as additional blocks without replication (see next paragraph), but when the covariate values and weights are the same across replicates, it is simpler to make use of nReplicates and replicateIndex.

When using multiple replicates (e.g., multiple members of a climate model initial condition ensemble), the standard input format is to append observations for additional replicates to the y argument and indicate the replicate ID for each value via replicateIndex, which would be of the form 1,1,1,...2,2,2,...3,3,3,... etc. The values for each replicate should be grouped together and in the same order within replicate so that x can be correctly matched to the y values when x is only supplied for the first replicate. In other words, y should first contain all the values for the first replicate, then all the values for the second replicate in the same block order as for the first replicate, and so forth. Note that if y is provided as a matrix with the number of rows equal to the number of observations in each replicate and the columns corresponding to replicates, this ordering will occur naturally.

However, if one has different covariate values for different replicates, then one needs to treat the additional replicates as providing additional blocks, with only a single replicate (and nReplicates set to 1). The covariate values can then be included as additional rows in x. Similarly, if there is a varying number of replicates by block, then all block-replicate pairs should be treated as individual blocks with a corresponding row in x (and nReplicates set to 1).

bootControl arguments:

The bootControl argument is a list (or dictionary when calling from Python) that can supply any of the following components:

Optimization failures:

It is not uncommon for maximization of the log-likelihood to fail for extreme value models. Users should carefully check the info element of the return object to ensure that the optimization converged. For better optimization performance, it is recommended that the observations be scaled to have magnitude around one (e.g., converting precipitation from mm to cm). When there is a convergence failure, one can try a different optimization method, more iterations, or different starting values – see optimArgs and initial. In particular, the Nelder-Mead method is used; users may want to try the BFGS method by setting optimArgs = list(method = 'BFGS') (or optimArgs = {'method': 'BFGS'} when calling from Python).

When using the bootstrap, users should check that the number of convergence failures when fitting to the boostrapped datasets is small, as it is not clear how to interpret the bootstrap results when there are convergence failures for some bootstrapped datasets.

Value

The primary outputs of this function are as follows, depending on what is requested via returnPeriod, returnValue, getParams and xContrast:

when returnPeriod is given: for the period given in returnPeriod the return value(s) (returnValue) and its corresponding asymptotic standard error (se_returnValue) and, when bootSE=TRUE, also the bootstrapped standard error (se_returnValue_boot). For nonstationary models, these correspond to the covariate values given in x.

when returnValue is given: for the value given in returnValue, the log exceedance probability (logReturnProb) and the corresponding asymptotic standard error (se_logReturnProb) and, when bootSE=TRUE, also the bootstrapped standard error (se_logReturnProb_boot). This exceedance probability is the probability of exceedance for a single block. Also returned are the log return period (logReturnPeriod) and its corresponding asymptotic standard error (se_logReturnPeriod) and, when bootSE=TRUE, also the bootstrapped standard error (se_logReturnPeriod_boot). For nonstationary models, these correspond to the covariate values given in x. Note that results are on the log scale as probabilities and return times are likely to be closer to normally distributed on the log scale and therefore standard errors are more naturally given on this scale. Confidence intervals for return probabilities/periods can be obtained by exponentiating the interval obtained from plus/minus twice the standard error of the log probabilities/periods.

when getParams=TRUE: the MLE for the model parameters (mle) and corresponding asymptotic standard error (se_mle) and, when bootSE=TRUE, also the bootstrapped standard error (se_mle_boot).

when xContrast is specified for nonstationary models: the difference in return values (returnValueDiff) and its corresponding asymptotic standard error (se_returnValueDiff) and, when bootSE=TRUE, bootstrapped standard error (se_returnValueDiff_boot). These differences correspond to the differences when contrasting each row in x with the corresponding row in xContrast. Also returned are the difference in log return probabilities (i.e., the log risk ratio) (logReturnProbDiff) and its corresponding asymptotic standard error (se_logReturnProbDiff) and, when bootSE=TRUE, bootstrapped standard error (se_logReturnProbDiff_boot).

Author(s)

Christopher J. Paciorek

References

Coles, S. 2001. An Introduction to Statistical Modeling of Extreme Values. Springer.

Paciorek, C.J., D.A. Stone, and M.F. Wehner. 2018. Quantifying uncertainty in the attribution of human influence on severe weather. Weather and Climate Extremes 20:69-80. arXiv preprint <https://arxiv.org/abs/1706.03388>.

Examples

data(Fort, package = 'extRemes')
FortMax <- aggregate(Prec ~ year, data = Fort, max)

# stationary fit
out <- fit_gev(FortMax$Prec, returnPeriod = 20, returnValue = 3.5,
        getParams = TRUE, bootSE = FALSE)

# nonstationary fit with location linear in year
out <- fit_gev(FortMax$Prec, x = data.frame(years = FortMax$year), 
        locationFun = ~years, returnPeriod = 20, returnValue = 3.5,
        getParams = TRUE, xNew = data.frame(years = range(FortMax$year)), bootSE = FALSE)

[Package climextRemes version 0.3.1 Index]