fnormgpd {evmix}R Documentation

MLE Fitting of Normal Bulk and GPD Tail Extreme Value Mixture Model

Description

Maximum likelihood estimation for fitting the extreme value mixture model with normal for bulk distribution upto the threshold and conditional GPD above threshold. With options for profile likelihood estimation for threshold and fixed threshold approach.

Usage

fnormgpd(x, phiu = TRUE, useq = NULL, fixedu = FALSE,
  pvector = NULL, std.err = TRUE, method = "BFGS",
  control = list(maxit = 10000), finitelik = TRUE, ...)

lnormgpd(x, nmean = 0, nsd = 1, u = qnorm(0.9, nmean, nsd),
  sigmau = nsd, xi = 0, phiu = TRUE, log = TRUE)

nlnormgpd(pvector, x, phiu = TRUE, finitelik = FALSE)

proflunormgpd(u, pvector = NULL, x, phiu = TRUE, method = "BFGS",
  control = list(maxit = 10000), finitelik = TRUE, ...)

nlunormgpd(pvector, u, x, phiu = TRUE, finitelik = FALSE)

Arguments

x

vector of sample data

phiu

probability of being above threshold (0, 1) or logical, see Details in help for fnormgpd

useq

vector of thresholds (or scalar) to be considered in profile likelihood or NULL for no profile likelihood

fixedu

logical, should threshold be fixed (at either scalar value in useq, or estimated from maximum of profile likelihood evaluated at sequence of thresholds in useq)

pvector

vector of initial values of parameters or NULL for default values, see below

std.err

logical, should standard errors be calculated

method

optimisation method (see optim)

control

optimisation control list (see optim)

finitelik

logical, should log-likelihood return finite value for invalid parameters

...

optional inputs passed to optim

nmean

scalar normal mean

nsd

scalar normal standard deviation (positive)

u

scalar threshold value

sigmau

scalar scale parameter (positive)

xi

scalar shape parameter

log

logical, if TRUE then log-likelihood rather than likelihood is output

Details

The extreme value mixture model with normal bulk and GPD tail is fitted to the entire dataset using maximum likelihood estimation. The estimated parameters, variance-covariance matrix and their standard errors are automatically output.

The optimisation of the likelihood for these mixture models can be very sensitive to the initial parameter vector (particularly the threshold), as often there are numerous local modes where multiple thresholds give similar fits. This is an inherent feature of such models. Options are provided by the arguments pvector, useq and fixedu to implement various commonly used likelihood inference approaches for such models:

  1. (default) pvector=NULL, useq=NULL and fixedu=FALSE - to set initial value for threshold at 90% quantile along with usual defaults for other parameters as defined in Notes below. Standard likelihood optimisation is used;

  2. pvector=c(nmean, nsd, u, sigmau, xi) - where initial values of all 5 parameters are manually set. Standard likelihood optimisation is used;

  3. useq as vector - to specify a sequence of thresholds at which to evaluate profile likelihood and extract threshold which gives maximum profile likelihood; or

  4. useq as scalar - to specify a single value for threshold to be considered.

In options (3) and (4) the threshold can be treated as:

The latter approach can be used to implement the traditional fixed threshold modelling approach with threshold pre-chosen using, for example, graphical diagnostics. Further, in either such case (3) or (4) the pvector could be:

If the threshold is treated as fixed, then the likelihood is separable between the bulk and tail components. However, in practice we have found black-box optimisation of the combined likelihood works sufficiently well, so is used herein.

The following functions are provided:

The log-likelihood functions are provided for wider usage, e.g. constructing profile likelihood functions.

Defaults values for the parameter vector pvector are given in the fitting fnormgpd and profile likelihood functions proflunormgpd. The parameter vector pvector must be specified in the negative log-likelihood functions nlnormgpd and nlunormgpd. The threshold u must also be specified in the profile likelihood function proflunormgpd and nlunormgpd.

Log-likelihood calculations are carried out in lnormgpd, which takes parameters as inputs in the same form as distribution functions. The negative log-likelihood functions nlnormgpd and nlunormgpd are wrappers for likelihood function lnormgpd designed towards optimisation, i.e. nlnormgpd has vector of all 5 parameters as first input and nlunormgpd has threshold as second input and vector of remaining 4 parameters as first input. The profile likelihood function proflunormgpd has threshold u as the first input, to permit use of sapply function to evaluate profile likelihood over vector of potential thresholds.

The tail fraction phiu is treated separately to the other parameters, to allow for all it's representations. In the fitting fnormgpd and profile likelihood function proflunormgpd it is logical:

In the likelihood functions lnormgpd, nlnormgpd and nlunormgpd it can be logical or numeric:

Missing values (NA and NaN) are assumed to be invalid data so are ignored, which is inconsistent with the evd library which assumes the missing values are below the threshold.

The function lnormgpd carries out the calculations for the log-likelihood directly, which can be exponentiated to give actual likelihood using (log=FALSE).

The default optimisation algorithm is "BFGS", which requires a finite negative log-likelihood function evaluation finitelik=TRUE. For invalid parameters, a zero likelihood is replaced with exp(-1e6). The "BFGS" optimisation algorithms require finite values for likelihood, so any user input for finitelik will be overridden and set to finitelik=TRUE if either of these optimisation methods is chosen.

It will display a warning for non-zero convergence result comes from optim function call or for common indicators of lack of convergence (e.g. any estimated parameters same as initial values).

If the hessian is of reduced rank then the variance covariance (from inverse hessian) and standard error of parameters cannot be calculated, then by default std.err=TRUE and the function will stop. If you want the parameter estimates even if the hessian is of reduced rank (e.g. in a simulation study) then set std.err=FALSE.

Value

Log-likelihood is given by lnormgpd and it's wrappers for negative log-likelihood from nlnormgpd and nlunormgpd. Profile likelihood for single threshold given by proflunormgpd. Fitting function fnormgpd returns a simple list with the following elements

call: optim call
x: data vector x
init: pvector
fixedu: fixed threshold, logical
useq: threshold vector for profile likelihood or scalar for fixed threshold
nllhuseq: profile negative log-likelihood at each threshold in useq
optim: complete optim output
mle: vector of MLE of parameters
cov: variance-covariance matrix of MLE of parameters
se: vector of standard errors of MLE of parameters
rate: phiu to be consistent with evd
nllh: minimum negative log-likelihood
n: total sample size
nmean: MLE of normal mean
nsd: MLE of normal standard deviation
u: threshold (fixed or MLE)
sigmau: MLE of GPD scale
xi: MLE of GPD shape
phiu: MLE of tail fraction (bulk model or parameterised approach)
se.phiu: standard error of MLE of tail fraction

The output list has some duplicate entries and repeats some of the inputs to both provide similar items to those from fpot and increase usability.

Acknowledgments

These functions are deliberately similar in syntax and functionality to the commonly used functions in the ismev and evd packages for which their author's contributions are gratefully acknowledged.

Anna MacDonald and Xin Zhao laid some of the groundwork with programs they wrote for MATLAB.

Clement Lee and Emma Eastoe suggested providing inbuilt profile likelihood estimation for threshold and fixed threshold approach.

Note

Unlike most of the distribution functions for the extreme value mixture models, the MLE fitting only permits single scalar values for each parameter and phiu.

When pvector=NULL then the initial values are:

Avoid setting the starting value for the shape parameter to xi=0 as depending on the optimisation method it may be get stuck.

A default value for the tail fraction phiu=TRUE is given. The lnormgpd also has the usual defaults for the other parameters, but nlnormgpd and nlunormgpd has no defaults.

If the hessian is of reduced rank then the variance covariance (from inverse hessian) and standard error of parameters cannot be calculated, then by default std.err=TRUE and the function will stop. If you want the parameter estimates even if the hessian is of reduced rank (e.g. in a simulation study) then set std.err=FALSE.

Invalid parameter ranges will give 0 for likelihood, log(0)=-Inf for log-likelihood and -log(0)=Inf for negative log-likelihood.

Due to symmetry, the lower tail can be described by GPD by negating the data/quantiles.

Infinite and missing sample values are dropped.

Error checking of the inputs is carried out and will either stop or give warning message as appropriate.

Author(s)

Yang Hu and Carl Scarrott carl.scarrott@canterbury.ac.nz

References

http://www.math.canterbury.ac.nz/~c.scarrott/evmix

http://en.wikipedia.org/wiki/Normal_distribution

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Hu, Y. (2013). Extreme value mixture modelling: An R package and simulation study. MSc (Hons) thesis, University of Canterbury, New Zealand. http://ir.canterbury.ac.nz/simple-search?query=extreme&submit=Go

Hu Y. and Scarrott, C.J. (2018). evmix: An R Package for Extreme Value Mixture Modeling, Threshold Estimation and Boundary Corrected Kernel Density Estimation. Journal of Statistical Software 84(5), 1-27. doi: 10.18637/jss.v084.i05.

Behrens, C.N., Lopes, H.F. and Gamerman, D. (2004). Bayesian analysis of extreme events with threshold estimation. Statistical Modelling. 4(3), 227-244.

See Also

dnorm, fgpd and gpd

Other normgpd: fgng, fhpd, fitmnormgpd, flognormgpd, fnormgpdcon, gngcon, gng, hpdcon, hpd, itmnormgpd, lognormgpdcon, lognormgpd, normgpdcon, normgpd

Other normgpdcon: fgngcon, fhpdcon, flognormgpdcon, fnormgpdcon, gngcon, gng, hpdcon, hpd, normgpdcon, normgpd

Other gng: fgngcon, fgng, fitmgng, gngcon, gng, itmgng, normgpd

Other fnormgpd: normgpd

Examples

## Not run: 
set.seed(1)
par(mfrow = c(2, 1))

x = rnorm(1000)
xx = seq(-4, 4, 0.01)
y = dnorm(xx)

# Bulk model based tail fraction
fit = fnormgpd(x)
hist(x, breaks = 100, freq = FALSE, xlim = c(-4, 4))
lines(xx, y)
with(fit, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi), col="red"))
abline(v = fit$u, col = "red")
  
# Parameterised tail fraction
fit2 = fnormgpd(x, phiu = FALSE)
with(fit2, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi, phiu), col="blue"))
abline(v = fit2$u, col = "blue")
legend("topleft", c("True Density","Bulk Tail Fraction","Parameterised Tail Fraction"),
  col=c("black", "red", "blue"), lty = 1)
  
# Profile likelihood for initial value of threshold and fixed threshold approach
fitu = fnormgpd(x, useq = seq(0, 3, length = 20))
fitfix = fnormgpd(x, useq = seq(0, 3, length = 20), fixedu = TRUE)

hist(x, breaks = 100, freq = FALSE, xlim = c(-4, 4))
lines(xx, y)
with(fit, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi), col="red"))
abline(v = fit$u, col = "red")
with(fitu, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi), col="purple"))
abline(v = fitu$u, col = "purple")
with(fitfix, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi), col="darkgreen"))
abline(v = fitfix$u, col = "darkgreen")
legend("topleft", c("True Density","Default initial value (90% quantile)",
 "Prof. lik. for initial value", "Prof. lik. for fixed threshold"),
 col=c("black", "red", "purple", "darkgreen"), lty = 1)

## End(Not run)
  

[Package evmix version 2.12 Index]