R: MLE Fitting of P-splines Density Estimate for Bulk and GPD...

fpsdengpd {evmix}

R Documentation

MLE Fitting of P-splines Density Estimate for Bulk and GPD Tail Extreme Value Mixture Model

Description

Maximum likelihood estimation for fitting the extreme value mixture model with P-splines density estimate for bulk distribution upto the threshold and conditional GPD above threshold. With options for profile likelihood estimation for threshold and fixed threshold approach.

Usage

fpsdengpd(x, phiu = TRUE, useq = NULL, fixedu = FALSE,
  pvector = NULL, lambdaseq = NULL, breaks = NULL, xrange = NULL,
  nseg = 10, degree = 3, design.knots = NULL, ord = 2,
  std.err = TRUE, method = "BFGS", control = list(maxit = 10000),
  finitelik = TRUE, ...)

lpsdengpd(x, psdenx, u = NULL, sigmau = NULL, xi = 0, phiu = TRUE,
  bsplinefit = NULL, phib = NULL, log = TRUE)

nlpsdengpd(pvector, x, psdenx, phiu = TRUE, bsplinefit, phib = NULL,
  finitelik = FALSE)

proflupsdengpd(u, pvector, x, psdenx, phiu = TRUE, bsplinefit,
  method = "BFGS", control = list(maxit = 10000), finitelik = TRUE,
  ...)

nlupsdengpd(pvector, u, x, psdenx, phiu = TRUE,
  bsplinefit = bsplinefit, phib = NULL, finitelik = FALSE)

Arguments

`x`	vector of sample data
`phiu`	probability of being above threshold `(0, 1)` or logical, see Details in help for `fnormgpd`
`useq`	vector of thresholds (or scalar) to be considered in profile likelihood or `NULL` for no profile likelihood
`fixedu`	logical, should threshold be fixed (at either scalar value in `useq`, or estimated from maximum of profile likelihood evaluated at sequence of thresholds in `useq`)
`pvector`	vector of initial values of parameters or `NULL` for default values, see below
`lambdaseq`	vector of `\lambda`'s (or scalar) to be considered in profile likelihood. Required.
`breaks`	histogram breaks (as in `hist` function)
`xrange`	vector of minimum and maximum of B-spline (support of density)
`nseg`	number of segments between knots
`degree`	degree of B-splines (0 is constant, 1 is linear, etc.)
`design.knots`	spline knots for splineDesign function
`ord`	order of difference used in the penalty term
`std.err`	logical, should standard errors be calculated
`method`	optimisation method (see `optim`)
`control`	optimisation control list (see `optim`)
`finitelik`	logical, should log-likelihood return finite value for invalid parameters
`...`	optional inputs passed to `optim`
`psdenx`	P-splines based density estimate for each datapoint in x
`u`	scalar threshold value
`sigmau`	scalar scale parameter (positive)
`xi`	scalar shape parameter
`bsplinefit`	list output from P-splines density fitting `fpsden` function
`phib`	renormalisation constant for bulk model density `(1-\phi_u)/H(u)`, to make it integrate to `1-phiu`
`log`	logical, if `TRUE` then log-likelihood rather than likelihood is output

Details

The extreme value mixture model with P-splines density estimate for bulk and GPD tail is fitted to the entire dataset. A two-stage maximum likelihood inference approach is taken. The first stage consists fitting of the P-spline density estimator, which is acheived by MLE using the fpsden function. The second stage, conditions on the B-spline coefficients, using MLE for the extreme value mixture model (GPD parameters and threshold, if requested). The estimated parameters, variance-covariance matrix and their standard errors are automatically output.

See help for fnormgpd for details of extreme value mixture models, type help fnormgpd. Only the different features are outlined below for brevity.

As the second stage conditions on the Bs-pline coefficients, the full parameter vector is (u, sigmau, xi) if threshold is also estimated and (sigmau, xi) for profile likelihood or fixed threshold approach.

(Penalized) MLE estimation of the B-Spline coefficients is carried out using Poisson regression based on histogram bin counts. See help for fpsden for details, type help fpsden.

Value

Log-likelihood is given by lpsdengpd and it's wrappers for negative log-likelihood from nlpsdengpd and nlupsdengpd. Profile likelihood for single threshold given by proflupsdengpd. Fitting function fpsdengpd returns a simple list with the following elements

`call`:	`optim` call
`x`:	data vector `x`
`init`:	`pvector`
`fixedu`:	fixed threshold, logical
`useq`:	threshold vector for profile likelihood or scalar for fixed threshold
`nllhuseq`:	profile negative log-likelihood at each threshold in useq
`bsplinefit`:	complete `fpsden` output
`psdenx`:	P-splines based density estimate for each datapoint in `x`
`xrange`:	range of support of B-splines
`degree`:	degree of B-splines
`nseg`:	number of internal segments
`design.knots`:	knots used in `splineDesign`
`nbinwidth`:	scaling factor to convert counts to density
`optim`:	complete `optim` output
`conv`:	indicator for "possible" convergence
`mle`:	vector of MLE of (GPD and threshold, if relevant) parameters
`cov`:	variance-covariance matrix of MLE of parameters
`se`:	vector of standard errors of MLE of parameters
`rate`:	`phiu` to be consistent with `evd`
`nllh`:	minimum negative log-likelihood
`n`:	total sample size
`beta`:	vector of MLE of B-spline coefficients
`lambda`:	Estimated or fixed `\lambda`
`u`:	threshold (fixed or MLE)
`sigmau`:	MLE of GPD scale
`xi`:	MLE of GPD shape
`phiu`:	MLE of tail fraction (bulk model or parameterised approach)
`se.phiu`:	standard error of MLE of tail fraction

Acknowledgments

See Acknowledgments in fnormgpd, type help fnormgpd.

The Poisson regression and leave-one-out cross-validation functions are based on the code of Eilers and Marx (1996) available from Brian Marx's website http://statweb.lsu.edu/faculty/marx/, which is gratefully acknowledged.

Note

The data are both vectors. Infinite and missing sample values are dropped.

No initial values for the coefficients are needed.

It is advised to specify the range of support xrange, using finite end-points. This is especially important when the support is bounded. By default xrange is simply the range of the input data range(x).

Further, it is advised to always set the histogram bin breaks, expecially if the support is bounded. By default 10*ln(n) equi-spaced bins are defined between xrange.

When pvector=NULL then the initial values are:

threshold 90% quantile (not relevant for profile likelihood for threshold or fixed threshold approaches);
MLE of GPD parameters above threshold.

Author(s)

Alfadino Akbar and Carl Scarrott carl.scarrott@canterbury.ac.nz

References

http://www.math.canterbury.ac.nz/~c.scarrott/evmix

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

http://en.wikipedia.org/wiki/Cross-validation_(statistics)

http://en.wikipedia.org/wiki/B-spline

http://statweb.lsu.edu/faculty/marx/

Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11(2), 89-121.

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Examples

## Not run: 
set.seed(1)
par(mfrow = c(1, 1))

x = rnorm(1000)
xx = seq(-4, 4, 0.01)
y = dnorm(xx)

# Plenty of histogram bins (100)
breaks = seq(-4, 4, length.out=101)

# P-spline fitting with cubic B-splines, 2nd order penalty and 10 internal segments
# CV search for penalty coefficient. 
fit = fpsdengpd(x, useq = seq(0, 3, 0.1), fixedu = TRUE,
             lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
             xrange = c(-4, 4), nseg = 10, degree = 3, ord = 2)
             
hist(x, freq = FALSE, breaks = breaks, xlim = c(-6, 6))
lines(xx, y, col = "black") # true density

# P-splines+GPD
with(fit, lines(xx, dpsdengpd(xx, beta, nbinwidth, 
                              u = u, sigmau = sigmau, xi = xi, design = design.knots),
                lwd = 2, col = "red"))
abline(v = fit$u, col = "red", lwd = 2, lty = 3)

# P-splines density estimate
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots),
                lwd = 2, col = "blue", lty = 2))

# vertical lines for all knots
with(fit, abline(v = design.knots, col = "red"))

# internal knots
with(fit, abline(v = design.knots[(degree + 2):(length(design.knots) - degree - 1)], col = "blue"))
  
# boundary knots (support of B-splines)
with(fit, abline(v = design.knots[c(degree + 1, length(design.knots) - degree)], col = "green"))

legend("topright", c("True Density","P-spline density","P-spline+GPD"),
  col=c("black", "blue", "red"), lty = c(1, 2, 1))
legend("topleft", c("Internal Knots", "Boundaries", "Extra Knots", "Threshold"),
  col=c("blue", "green", "red", "red"), lty = c(1, 1, 1, 2))

## End(Not run)

[Package evmix version 2.12 Index]