gamGPDfitboot {ismev} | R Documentation |
Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation
Description
gamGPDfit()
fits the parameters of a generalized Pareto
distribution (GPD) depending on covariates in a non- or semiparametric
way.
gamGPDboot()
fits and bootstraps the parameters of a GPD
distribution depending on covariates in a non- or semiparametric
way. Applies the post-blackend bootstrap of Chavez-Demoulin and
Davison (2005).
Usage
gamGPDfit(x, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs,
init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1],
niter=32, include.updates=FALSE, epsxi=1e-05, epsnu=1e-05,
progress=TRUE, verbose=FALSE, ...)
gamGPDboot(x, B, threshold, nexc=NULL, datvar, xiFrhs, nuFrhs,
init=gpd.fit(x[, datvar], threshold=threshold, show=FALSE)$mle[2:1],
niter=32, include.updates=FALSE, epsxi=1e-5, epsnu=1e-5,
boot.progress=TRUE, progress=FALSE, verbose=FALSE, debug=FALSE, ...)
Arguments
x |
data.frame containing the losses (in some component; can be
specified with the argument |
B |
number of bootstrap replications. |
threshold |
threshold of the peaks-over-threshold (POT) method. |
nexc |
number of excesses. This can be used to determine |
datvar |
name of the data column in |
xiFrhs |
right-hand side of the formula for |
nuFrhs |
right-hand side of the formula for |
init |
bivariate vector containing initial values
for |
niter |
maximal number of iterations in the backfitting algorithm. |
include.updates |
|
epsxi |
epsilon for stop criterion for |
epsnu |
epsilon for stop criterion for |
boot.progress |
|
progress |
|
verbose |
|
debug |
|
... |
additional arguments passed to |
Details
gamGPDfit()
fits the parameters \xi
and
\beta
of the generalized Pareto distribution
\mathrm{GPD}(\xi,\beta)
depending on covariates in
a non- or semiparametric way. The distribution function is given by
G_{\xi,\beta}(x)=1-(1+\xi x/\beta)^{-1/\xi},\quad
x\ge0,
for \xi>0
(which is what we assume) and
\beta>0
. Note that \beta
is also denoted by
\sigma
in this package. Estimation of \xi
and \beta
by gamGPDfit()
is done via penalized maximum
likelihood estimation, where the estimators are computed with a
backfitting algorithm. In order to guarantee convergence of this
algorithm, a reparameterization of \beta
in terms of the parameter
\nu
is done via
\beta=\exp(\nu)/(1+\xi).
The parameters \xi
and \nu
(and thus
\beta
) are allowed to depend on covariates (including
time) in a non- or semiparametric way, for example:
\xi=\xi(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\xi}+h_{\xi}(t),
\nu=\nu(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\nu}+h_{\nu}(t),
where \bm{x}
denotes the vector of covariates,
\bm{\alpha}_{\xi}
, \bm{\alpha}_{\nu}
are parameter vectors and h_{\xi}
, h_{\nu}
are
regression splines. For more details, see the references and the source
code.
gamGPDboot()
first fits the GPD parameters via
gamGPDfit()
. It then conducts the post-blackend bootstrap of
Chavez-Demoulin and Davison (2005). To this end, it computes the
residuals, resamples them (B
times), reconstructs the
corresponding excesses, and refits the GPD parameters via
gamGPDfit()
again.
Value
gamGPDfit()
returns a list with the components
xi
:estimated parameters
\xi
;beta
:estimated parameters
\beta
;nu
:estimated parameters
\nu
;se.xi
:standard error for
\xi
((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to\xi
) multiplied by -1;se.nu
:standard error for
\nu
((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to\nu
) multiplied by -1;xi.covar
:(unique) covariates for
\xi
;nu.covar
:(unique) covariates for
\nu
;covar
:available covariate combinations used for fitting
\beta
(\xi
,\nu
);y
:vector of excesses (exceedances minus threshold);
res
:residuals;
MRD
:mean relative distances between for all iterations, calculated between old parameters
(\xi, \nu)
(from the last iteration) and new parameters (currently estimated ones);logL
:log-likelihood at the estimated parameters;
xiObj
:R object of type
gamObject
for estimated\xi
(returned bymgcv::gam()
);nuObj
:R object of type
gamObject
for estimated\nu
(returned bymgcv::gam()
);xiUpdates
:if
include.updates
isTRUE
, updates for\xi
for each iteration. This is a list of R objects of typegamObject
which containsxiObj
as last element;nuUpdates
:if
include.updates
isTRUE
, updates for\nu
for each iteration. This is a list of R objects of typegamObject
which containsnuObj
as last element;
gamGPDboot()
returns a list of length B+1
where the
first component contains the results of
the initial fit via gamGPDfit()
and the other B
components contain the results for each replication of the
post-blackend bootstrap.
Author(s)
Marius Hofert, Valerie Chavez-Demoulin.
References
Chavez-Demoulin, V., and Davison, A. C. (2005), Generalized additive models for sample extremes, Applied Statistics 54(1), 207–222.
Chavez-Demoulin, V., and Hofert, M. (to be submitted), Smooth extremal models fitted by penalized maximum likelihood estimation.
Examples
### Example 1: fitting capability ##############################################
## generate an example data set
years <- 2003:2012 # years
nyears <- length(years)
n <- 250 # sample size for each (different) xi
u <- 200 # threshold
rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD
set.seed(17) # setting seed
xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
xi.true.B <- xi.true.A^2 # true xi for group "B"
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
covar <- rep(c("A","B"), each=n*nyears)
value <- c(lossA, lossB)
x <- data.frame(covar=covar, time=time, value=value)
## fit
eps <- 1e-3 # to decrease the run time for this example
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1,
nuFrhs=~covar+s(time)-1, epsxi=eps, epsnu=eps)
## note: choosing s(..., bs="cr") will fit cubic splines
## grab the fitted values per group and year
xi.fit <- fitted(fit$xiObj)
xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year
xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year
## plot the fitted values of xi and the true ones we simulated from
par(mfrow=c(1,2))
plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
main="Group A", xlab="Year", ylab=expression(xi))
points(years, xi.fit.A, type="l", col="red")
legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
legend=c("true", "fitted"), bty="n")
plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
main="Group B", xlab="Year", ylab=expression(xi))
points(years, xi.fit.B, type="l", col="blue")
legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
legend=c("true", "fitted"), bty="n")
## Not run:
### Example 2: Comparison of (the more general) gamGPDfit() with gpd.fit() ########
set.seed(17) # setting seed
xi.true.A <- rep(0.4, length=nyears)
xi.true.B <- rep(0.8, length=nyears)
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
x <- data.frame(covar=covar, time=time, value=c(lossA, lossB))
## fit with gpd.fit
fit.coles <- gpd.fit(x$value, threshold=u, shl=1, sigl=1, ydat=x)
xi.fit.coles.A <- fit.coles$mle[3]+1*fit.coles$mle[4]
xi.fit.coles.B <- fit.coles$mle[3]+2*fit.coles$mle[4]
## fit with gamGPDfit()
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar, nuFrhs=~covar,
epsxi=eps, epsnu=eps)
xi.fit <- fitted(fit$xiObj)
xi.fit.A <- as.numeric(xi.fit[1]) # fit for group "A"
xi.fit.B <- as.numeric(xi.fit[nyears*n+1]) # fit for group "B"
## comparison
xi.fit.A-xi.fit.coles.A
xi.fit.B-xi.fit.coles.B
## End(Not run) # dontrun