fpsdengpd {evmix} | R Documentation |
MLE Fitting of P-splines Density Estimate for Bulk and GPD Tail Extreme Value Mixture Model
Description
Maximum likelihood estimation for fitting the extreme value mixture model with P-splines density estimate for bulk distribution upto the threshold and conditional GPD above threshold. With options for profile likelihood estimation for threshold and fixed threshold approach.
Usage
fpsdengpd(x, phiu = TRUE, useq = NULL, fixedu = FALSE,
pvector = NULL, lambdaseq = NULL, breaks = NULL, xrange = NULL,
nseg = 10, degree = 3, design.knots = NULL, ord = 2,
std.err = TRUE, method = "BFGS", control = list(maxit = 10000),
finitelik = TRUE, ...)
lpsdengpd(x, psdenx, u = NULL, sigmau = NULL, xi = 0, phiu = TRUE,
bsplinefit = NULL, phib = NULL, log = TRUE)
nlpsdengpd(pvector, x, psdenx, phiu = TRUE, bsplinefit, phib = NULL,
finitelik = FALSE)
proflupsdengpd(u, pvector, x, psdenx, phiu = TRUE, bsplinefit,
method = "BFGS", control = list(maxit = 10000), finitelik = TRUE,
...)
nlupsdengpd(pvector, u, x, psdenx, phiu = TRUE,
bsplinefit = bsplinefit, phib = NULL, finitelik = FALSE)
Arguments
x |
vector of sample data |
phiu |
probability of being above threshold |
useq |
vector of thresholds (or scalar) to be considered in profile likelihood or
|
fixedu |
logical, should threshold be fixed (at either scalar value in |
pvector |
vector of initial values of parameters or |
lambdaseq |
vector of |
breaks |
histogram breaks (as in |
xrange |
vector of minimum and maximum of B-spline (support of density) |
nseg |
number of segments between knots |
degree |
degree of B-splines (0 is constant, 1 is linear, etc.) |
design.knots |
spline knots for splineDesign function |
ord |
order of difference used in the penalty term |
std.err |
logical, should standard errors be calculated |
method |
optimisation method (see |
control |
optimisation control list (see |
finitelik |
logical, should log-likelihood return finite value for invalid parameters |
... |
optional inputs passed to |
psdenx |
P-splines based density estimate for each datapoint in x |
u |
scalar threshold value |
sigmau |
scalar scale parameter (positive) |
xi |
scalar shape parameter |
bsplinefit |
list output from P-splines density fitting |
phib |
renormalisation constant for bulk model density |
log |
logical, if |
Details
The extreme value mixture model with P-splines density estimate for bulk and GPD tail is
fitted to the entire dataset. A two-stage maximum likelihood inference approach is taken. The first
stage consists fitting of the P-spline density estimator, which is acheived by MLE using the
fpsden
function. The second stage, conditions on the B-spline coefficients,
using MLE for the extreme value mixture model (GPD parameters and threshold, if requested). The estimated
parameters, variance-covariance matrix and their standard errors are automatically
output.
See help for fnormgpd
for details of extreme value mixture models,
type help fnormgpd
. Only the different features are outlined below for brevity.
As the second stage conditions on the Bs-pline coefficients, the full parameter vector is
(u
, sigmau
, xi
) if threshold is also estimated and
(sigmau
, xi
) for profile likelihood or fixed threshold approach.
(Penalized) MLE estimation of the B-Spline coefficients is carried out using Poisson regression
based on histogram bin counts. See help for fpsden
for details,
type help fpsden
.
Value
Log-likelihood is given by lpsdengpd
and it's
wrappers for negative log-likelihood from nlpsdengpd
and nlupsdengpd
. Profile likelihood for single
threshold given by proflupsdengpd
. Fitting function
fpsdengpd
returns a simple list with the
following elements
call : | optim call |
x : | data vector x |
init : | pvector |
fixedu : | fixed threshold, logical |
useq : | threshold vector for profile likelihood or scalar for fixed threshold |
nllhuseq : | profile negative log-likelihood at each threshold in useq |
bsplinefit : | complete fpsden output |
psdenx : | P-splines based density estimate for each datapoint in x |
xrange : | range of support of B-splines |
degree : | degree of B-splines |
nseg : | number of internal segments |
design.knots : | knots used in splineDesign |
nbinwidth : | scaling factor to convert counts to density |
optim : | complete optim output |
conv : | indicator for "possible" convergence |
mle : | vector of MLE of (GPD and threshold, if relevant) parameters |
cov : | variance-covariance matrix of MLE of parameters |
se : | vector of standard errors of MLE of parameters |
rate : | phiu to be consistent with evd |
nllh : | minimum negative log-likelihood |
n : | total sample size |
beta : | vector of MLE of B-spline coefficients |
lambda : | Estimated or fixed \lambda |
u : | threshold (fixed or MLE) |
sigmau : | MLE of GPD scale |
xi : | MLE of GPD shape |
phiu : | MLE of tail fraction (bulk model or parameterised approach) |
se.phiu : | standard error of MLE of tail fraction |
Acknowledgments
See Acknowledgments in
fnormgpd
, type help fnormgpd
.
The Poisson regression and leave-one-out cross-validation functions are based on the code of Eilers and Marx (1996) available from Brian Marx's website http://statweb.lsu.edu/faculty/marx/, which is gratefully acknowledged.
Note
The data are both vectors. Infinite and missing sample values are dropped.
No initial values for the coefficients are needed.
It is advised to specify the range of support xrange
, using finite end-points. This is
especially important when the support is bounded. By default xrange
is simply the range of the
input data range(x)
.
Further, it is advised to always set the histogram bin breaks
, expecially if the support is bounded.
By default 10*ln(n)
equi-spaced bins are defined between xrange
.
When pvector=NULL
then the initial values are:
threshold 90% quantile (not relevant for profile likelihood for threshold or fixed threshold approaches);
MLE of GPD parameters above threshold.
Author(s)
Alfadino Akbar and Carl Scarrott carl.scarrott@canterbury.ac.nz
References
http://www.math.canterbury.ac.nz/~c.scarrott/evmix
http://en.wikipedia.org/wiki/Generalized_Pareto_distribution
http://en.wikipedia.org/wiki/Cross-validation_(statistics)
http://en.wikipedia.org/wiki/B-spline
http://statweb.lsu.edu/faculty/marx/
Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science 11(2), 89-121.
Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf
See Also
fpsden
, fnormgpd
,
fgpd
and gpd
Other psden: fpsden
, psdengpd
,
psden
Other psdengpd: psdengpd
, psden
Other fpsdengpd: psdengpd
Examples
## Not run:
set.seed(1)
par(mfrow = c(1, 1))
x = rnorm(1000)
xx = seq(-4, 4, 0.01)
y = dnorm(xx)
# Plenty of histogram bins (100)
breaks = seq(-4, 4, length.out=101)
# P-spline fitting with cubic B-splines, 2nd order penalty and 10 internal segments
# CV search for penalty coefficient.
fit = fpsdengpd(x, useq = seq(0, 3, 0.1), fixedu = TRUE,
lambdaseq = 10^seq(-5, 5, 0.25), breaks = breaks,
xrange = c(-4, 4), nseg = 10, degree = 3, ord = 2)
hist(x, freq = FALSE, breaks = breaks, xlim = c(-6, 6))
lines(xx, y, col = "black") # true density
# P-splines+GPD
with(fit, lines(xx, dpsdengpd(xx, beta, nbinwidth,
u = u, sigmau = sigmau, xi = xi, design = design.knots),
lwd = 2, col = "red"))
abline(v = fit$u, col = "red", lwd = 2, lty = 3)
# P-splines density estimate
with(fit, lines(xx, dpsden(xx, beta, nbinwidth, design = design.knots),
lwd = 2, col = "blue", lty = 2))
# vertical lines for all knots
with(fit, abline(v = design.knots, col = "red"))
# internal knots
with(fit, abline(v = design.knots[(degree + 2):(length(design.knots) - degree - 1)], col = "blue"))
# boundary knots (support of B-splines)
with(fit, abline(v = design.knots[c(degree + 1, length(design.knots) - degree)], col = "green"))
legend("topright", c("True Density","P-spline density","P-spline+GPD"),
col=c("black", "blue", "red"), lty = c(1, 2, 1))
legend("topleft", c("Internal Knots", "Boundaries", "Extra Knots", "Threshold"),
col=c("blue", "green", "red", "red"), lty = c(1, 1, 1, 2))
## End(Not run)