fpsdengpd {evmix}R Documentation

MLE Fitting of P-splines Density Estimate for Bulk and GPD Tail Extreme Value Mixture Model

Description

Maximum likelihood estimation for fitting the extreme value mixture model with P-splines density estimate for bulk distribution upto the threshold and conditional GPD above threshold. With options for profile likelihood estimation for threshold and fixed threshold approach.

Usage

fpsdengpd(x, phiu = TRUE, useq = NULL, fixedu = FALSE,
  pvector = NULL, lambdaseq = NULL, breaks = NULL, xrange = NULL,
  nseg = 10, degree = 3, design.knots = NULL, ord = 2,
  std.err = TRUE, method = "BFGS", control = list(maxit = 10000),
  finitelik = TRUE, ...)

lpsdengpd(x, psdenx, u = NULL, sigmau = NULL, xi = 0, phiu = TRUE,
  bsplinefit = NULL, phib = NULL, log = TRUE)

nlpsdengpd(pvector, x, psdenx, phiu = TRUE, bsplinefit, phib = NULL,
  finitelik = FALSE)

proflupsdengpd(u, pvector, x, psdenx, phiu = TRUE, bsplinefit,
  method = "BFGS", control = list(maxit = 10000), finitelik = TRUE,
  ...)

nlupsdengpd(pvector, u, x, psdenx, phiu = TRUE,
  bsplinefit = bsplinefit, phib = NULL, finitelik = FALSE)

Arguments

x

vector of sample data

phiu

probability of being above threshold (0, 1) or logical, see Details in help for fnormgpd

useq

vector of thresholds (or scalar) to be considered in profile likelihood or NULL for no profile likelihood

fixedu

logical, should threshold be fixed (at either scalar value in useq, or estimated from maximum of profile likelihood evaluated at sequence of thresholds in useq)

pvector

vector of initial values of parameters or NULL for default values, see below

lambdaseq

vector of \lambda's (or scalar) to be considered in profile likelihood. Required.

breaks

histogram breaks (as in hist function)

xrange

vector of minimum and maximum of B-spline (support of density)

nseg

number of segments between knots

degree

degree of B-splines (0 is constant, 1 is linear, etc.)

design.knots

spline knots for splineDesign function

ord

order of difference used in the penalty term

std.err

logical, should standard errors be calculated

method

optimisation method (see optim)

control

optimisation control list (see optim)

finitelik

logical, should log-likelihood return finite value for invalid parameters

...

optional inputs passed to optim

psdenx

P-splines based density estimate for each datapoint in x

u

scalar threshold value

sigmau

scalar scale parameter (positive)

xi

scalar shape parameter

bsplinefit

list output from P-splines density fitting fpsden function

phib

renormalisation constant for bulk model density (1-\phi_u)/H(u), to make it integrate to 1-phiu

log

logical, if TRUE then log-likelihood rather than likelihood is output

Details

The extreme value mixture model with P-splines density estimate for bulk and GPD tail is fitted to the entire dataset. A two-stage maximum likelihood inference approach is taken. The first stage consists fitting of the P-spline density estimator, which is acheived by MLE using the fpsden function. The second stage, conditions on the B-spline coefficients, using MLE for the extreme value mixture model (GPD parameters and threshold, if requested). The estimated parameters, variance-covariance matrix and their standard errors are automatically output.

See help for fnormgpd for details of extreme value mixture models, type help fnormgpd. Only the different features are outlined below for brevity.

As the second stage conditions on the Bs-pline coefficients, the full parameter vector is (u, sigmau, xi) if threshold is also estimated and (sigmau, xi) for profile likelihood or fixed threshold approach.

(Penalized) MLE estimation of the B-Spline coefficients is carried out using Poisson regression based on histogram bin counts. See help for fpsden for details, type help fpsden.

Value

Log-likelihood is given by lpsdengpd and it's wrappers for negative log-likelihood from nlpsdengpd and nlupsdengpd. Profile likelihood for single threshold given by proflupsdengpd. Fitting function fpsdengpd returns a simple list with the following elements

call: optim call
x: data vector x
init: pvector
fixedu: fixed threshold, logical
useq: threshold vector for profile likelihood or scalar for fixed threshold
nllhuseq: profile negative log-likelihood at each threshold in useq
bsplinefit: complete fpsden output
psdenx: P-splines based density estimate for each datapoint in x
xrange: range of support of B-splines
degree: degree of B-splines
nseg: number of internal segments
design.knots: knots used in splineDesign
nbinwidth: scaling factor to convert counts to density
optim: complete optim output
conv: indicator for "possible" convergence
mle: vector of MLE of (GPD and threshold, if relevant) parameters
cov: variance-covariance matrix of MLE of parameters
se: vector of standard errors of MLE of parameters
rate: phiu to be consistent with evd
nllh: minimum negative log-likelihood
n: total sample size
beta: vector of MLE of B-spline coefficients
lambda: Estimated or fixed \lambda
u: threshold (fixed or MLE)
sigmau: MLE of GPD scale
xi: MLE of GPD shape
phiu: MLE of tail fraction (bulk model or parameterised approach)
se.phiu: standard error of MLE of tail fraction

Acknowledgments

See Acknowledgments in fnormgpd, type help fnormgpd.

The Poisson regression and leave-one-out cross-validation functions are based on the code of Eilers and Marx (1996) available from Brian Marx's website http://statweb.lsu.edu/faculty/marx/, which is gratefully acknowledged.

Note

The data are both vectors. Infinite and missing sample values are dropped.

No initial values for the coefficients are needed.

It is advised to specify the range of support xrange, using finite end-points. This is especially important when the support is bounded. By default xrange is simply the range of the input data range(x).

Further, it is advised to always set the histogram bin breaks, expecially if the support is bounded. By default 10*ln(n) equi-spaced bins are defined between xrange.

When pvector=NULL then the initial values are:

Author(s)

Alfadino Akbar and Carl Scarrott carl.scarrott@canterbury.ac.nz

References

http://www.math.canterbury.ac.nz/~c.scarrott/evmix

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

http://en.wikipedia.org/wiki/Cross-validation_(statistics)

http://en.wikipedia.org/wiki/B-spline

http://statweb.lsu.edu/faculty/marx/

Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11(2), 89-121.

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

See Also

fpsden, fnormgpd, fgpd and gpd

Other psden: fpsden, psdengpd, psden

Other psdengpd: psdengpd, psden

Other fpsdengpd: psdengpd

Examples

## Not run: 
set.seed(1)
par(mfrow = c(1, 1))

x = rnorm(1000)
xx = seq(-4, 4, 0.01)
y = dnorm(xx)

# Plenty of histogram bins (100)
breaks = seq(-4, 4, length.out=101)

# P-spline fitting with cubic B-splines, 2nd order penalty and 10 internal segments
# CV search for penalty coefficient. 
fit = fpsdengpd(x, useq = seq(0, 3, 0.1), fixedu = TRUE,
             lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
             xrange = c(-4, 4), nseg = 10, degree = 3, ord = 2)
             
hist(x, freq = FALSE, breaks = breaks, xlim = c(-6, 6))
lines(xx, y, col = "black") # true density

# P-splines+GPD
with(fit, lines(xx, dpsdengpd(xx, beta, nbinwidth, 
                              u = u, sigmau = sigmau, xi = xi, design = design.knots),
                lwd = 2, col = "red"))
abline(v = fit$u, col = "red", lwd = 2, lty = 3)

# P-splines density estimate
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots),
                lwd = 2, col = "blue", lty = 2))

# vertical lines for all knots
with(fit, abline(v = design.knots, col = "red"))

# internal knots
with(fit, abline(v = design.knots[(degree + 2):(length(design.knots) - degree - 1)], col = "blue"))
  
# boundary knots (support of B-splines)
with(fit, abline(v = design.knots[c(degree + 1, length(design.knots) - degree)], col = "green"))

legend("topright", c("True Density","P-spline density","P-spline+GPD"),
  col=c("black", "blue", "red"), lty = c(1, 2, 1))
legend("topleft", c("Internal Knots", "Boundaries", "Extra Knots", "Threshold"),
  col=c("blue", "green", "red", "red"), lty = c(1, 1, 1, 2))

## End(Not run)
  

[Package evmix version 2.12 Index]