fgpd {evmix} | R Documentation |
MLE Fitting of Generalised Pareto Distribution (GPD)
Description
Maximum likelihood estimation for fitting the GPD with
parameters scale sigmau
and shape xi
to the threshold
exceedances, conditional on being above a threshold u
. Unconditional
likelihood fitting also provided when the probability phiu
of being
above the threshold u
is given.
Usage
fgpd(x, u = 0, phiu = NULL, pvector = NULL, std.err = TRUE,
method = "BFGS", control = list(maxit = 10000), finitelik = TRUE,
...)
lgpd(x, u = 0, sigmau = 1, xi = 0, phiu = 1, log = TRUE)
nlgpd(pvector, x, u = 0, phiu = 1, finitelik = FALSE)
Arguments
x |
vector of sample data |
u |
scalar threshold |
phiu |
probability of being above threshold |
pvector |
vector of initial values of GPD parameters ( |
std.err |
logical, should standard errors be calculated |
method |
optimisation method (see |
control |
optimisation control list (see |
finitelik |
logical, should log-likelihood return finite value for invalid parameters |
... |
optional inputs passed to |
sigmau |
scalar scale parameter (positive) |
xi |
scalar shape parameter |
log |
logical, if |
Details
The GPD is fitted to the exceedances of the threshold u
using
maximum likelihood estimation. The estimated parameters,
variance-covariance matrix and their standard errors are automatically
output.
The log-likelihood and negative log-likelihood are also provided for wider
usage, e.g. constructing your own extreme value mixture model or profile
likelihood functions. The
parameter vector pvector
must be specified in the negative
log-likelihood nlgpd
.
Log-likelihood calculations are carried out in
lgpd
, which takes parameters as inputs in the
same form as distribution functions. The negative log-likelihood is a
wrapper for lgpd
, designed towards making it
useable for optimisation (e.g. parameters are given a vector as first
input).
The default value for the tail fraction phiu
in the fitting function
fgpd
is NULL
, in which case the MLE is calculated
using the sample proportion of exceedances. In this case the standard error for phiu
is
estimated and output as se.phiu
, otherwise it is set to NA
. Consistent with the
evd
library the missing values (NA
and
NaN
) are assumed to be below the threshold in calculating the tail fraction.
Otherwise, in the fitting function fgpd
the tail
fraction phiu
can be specified as any value over (0, 1]
, i.e.
excludes \phi_u=0
, leading to the unconditional log-likelihood being
used for estimation. In this case the standard error will be output as NA
.
In the log-likelihood functions lgpd
and
nlgpd
the tail fraction phiu
cannot be
NULL
but can be over the range [0, 1]
, i.e. which includes
\phi_u=0
.
The value of phiu
does not effect the GPD parameter estimates, only
the value of the likelihood, as:
L(\sigma_u, \xi; u, \phi_u) = (\phi_u ^ {n_u}) L(\sigma_u, \xi; u,
\phi_u=1)
where the GPD has scale \sigma_u
and shape \xi
, the threshold
is u
and nu
is the number of exceedances. A non-unit value for
phiu
simply scales the likelihood and shifts the log-likelihood,
thus the GPD parameter estimates are invariant to phiu
.
The default optimisation algorithm is "BFGS", which requires a finite
negative log-likelihood function evaluation finitelik=TRUE
. For
invalid parameters, a zero likelihood is replaced with exp(-1e6)
.
The "BFGS" optimisation algorithms require finite values for likelihood, so
any user input for finitelik
will be overridden and set to
finitelik=TRUE
if either of these optimisation methods is chosen.
It will display a warning for non-zero convergence result comes from
optim
function call.
If the hessian is of reduced rank then the variance covariance (from
inverse hessian) and standard error of parameters cannot be calculated,
then by default std.err=TRUE
and the function will stop. If you want
the parameter estimates even if the hessian is of reduced rank (e.g. in a
simulation study) then set std.err=FALSE
.
Value
lgpd
gives (log-)likelihood and
nlgpd
gives the negative log-likelihood.
fgpd
returns a simple list with the following
elements
call : | optim call |
x : | data vector x |
init : | pvector |
optim : | complete optim output |
mle : | vector of MLE of parameters |
cov : | variance-covariance matrix of MLE of parameters |
se : | vector of standard errors of MLE of parameters |
rate : | phiu to be consistent with evd |
nllh : | minimum negative log-likelihood |
n : | total sample size |
u : | threshold |
sigmau : | MLE of GPD scale |
xi : | MLE of GPD shape |
phiu : | MLE of tail fraction |
se.phiu : | standard error of MLE of tail fraction (parameterised approach using sample proportion) |
The output list has some duplicate entries and repeats some of the inputs to both
provide similar items to those from fpot
and increase usability.
Acknowledgments
Based on the gpd.fit
and
fpot
functions in the
ismev
and
evd
packages for which their author's contributions are gratefully acknowledged.
They are designed to have similar syntax and functionality to simplify the transition for users of these packages.
Note
Unlike all the distribution functions for the GPD, the MLE fitting only
permits single scalar values for each parameter, phiu
and threshold
u
.
When pvector=NULL
then the initial values are calculated, type
fgpd
to see the default formulae used. The GPD fitting is not very
sensitive to the initial values, so you will rarely have to give
alternatives. Avoid setting the starting value for the shape parameter to
xi=0
as depending on the optimisation method it may be get stuck.
Default values for the threshold u=0
and tail fraction
phiu=NULL
are given in the fitting fpgd
,
in which case the MLE assumes that excesses over the threshold are given,
rather than exceedances.
The usual default of phiu=1
is given in the likelihood functions
lpgd
and nlpgd
.
The lgpd
also has the usual defaults for the
other parameters, but nlgpd
has no defaults.
Infinite sample values are dropped in fitting function
fpgd
, but missing values are used to estimate
phiu
as described above. But in likelihood functions
lpgd
and nlpgd
both
infinite and missing values are ignored.
Error checking of the inputs is carried out and will either stop or give warning message as appropriate.
Author(s)
Yang Hu and Carl Scarrott carl.scarrott@canterbury.ac.nz
References
http://en.wikipedia.org/wiki/Generalized_Pareto_distribution
Hu Y. and Scarrott, C.J. (2018). evmix: An R Package for Extreme Value Mixture Modeling, Threshold Estimation and Boundary Corrected Kernel Density Estimation. Journal of Statistical Software 84(5), 1-27. doi: 10.18637/jss.v084.i05.
See Also
Other gpd: gpd
Other fgpd: gpd
Examples
set.seed(1)
par(mfrow = c(2, 1))
# GPD is conditional model for threshold exceedances
# so tail fraction phiu not relevant when only have exceedances
x = rgpd(1000, u = 10, sigmau = 5, xi = 0.2)
xx = seq(0, 100, 0.1)
hist(x, breaks = 100, freq = FALSE, xlim = c(0, 100))
lines(xx, dgpd(xx, u = 10, sigmau = 5, xi = 0.2))
fit = fgpd(x, u = 10)
lines(xx, dgpd(xx, u = fit$u, sigmau = fit$sigmau, xi = fit$xi), col="red")
# but tail fraction phiu is needed for conditional modelling of population tail
x = rnorm(10000)
xx = seq(-4, 4, 0.01)
hist(x, breaks = 200, freq = FALSE, xlim = c(0, 4))
lines(xx, dnorm(xx), lwd = 2)
fit = fgpd(x, u = 1)
lines(xx, dgpd(xx, u = fit$u, sigmau = fit$sigmau, xi = fit$xi, phiu = fit$phiu),
col = "red", lwd = 2)
legend("topright", c("True Density","Fitted Density"), col=c("black", "red"), lty = 1)