| game {QRM} | R Documentation | 
Smooth Parameter Estimation and Bootstrapping of Generalized Pareto Distributions with Penalized Maximum Likelihood Estimation
Description
gamGPDfit() fits the parameters of a generalized Pareto
distribution (GPD) depending on covariates in a non- or semiparametric
way.
gamGPDboot() fits and bootstraps the parameters of a GPD
distribution depending on covariates in a non- or semiparametric
way. Applies the post-blackend bootstrap of Chavez-Demoulin and
Davison (2005).
Usage
gamGPDfit(x, threshold, nextremes = NULL, datvar, xiFrhs, nuFrhs,
          init = fit.GPD(x[,datvar], threshold = threshold,
                         type = "pwm", verbose = FALSE)$par.ests,
          niter = 32, include.updates = FALSE, eps.xi = 1e-05, eps.nu = 1e-05,
          progress = TRUE, adjust = TRUE, verbose = FALSE, ...)
gamGPDboot(x, B, threshold, nextremes = NULL, datvar, xiFrhs, nuFrhs,
           init = fit.GPD(x[,datvar], threshold = threshold,
                          type = "pwm", verbose = FALSE)$par.ests,
           niter = 32, include.updates = FALSE, eps.xi = 1e-5, eps.nu = 1e-5,
           boot.progress = TRUE, progress = FALSE, adjust = TRUE, verbose = FALSE,
           debug = FALSE, ...)
Arguments
| x | data.frame containing the losses (in some component; can be
specified with the argument  | 
| B | number of bootstrap replications. | 
| threshold | threshold of the peaks-over-threshold (POT) method. | 
| nextremes | number of excesses. This can be used to determine | 
| datvar | name of the data column in  | 
| xiFrhs | right-hand side of the formula for  | 
| nuFrhs | right-hand side of the formula for  | 
| init | bivariate vector containing initial values
for  | 
| niter | maximal number of iterations in the backfitting algorithm. | 
| include.updates | 
 | 
| eps.xi | epsilon for stop criterion for  | 
| eps.nu | epsilon for stop criterion for  | 
| boot.progress | 
 | 
| progress | 
 | 
| adjust | 
 | 
| verbose | 
 | 
| debug | 
 | 
| ... | additional arguments passed to  | 
Details
gamGPDfit() fits the parameters \xi and
\beta of the generalized Pareto distribution
\mathrm{GPD}(\xi,\beta) depending on covariates in
a non- or semiparametric way. The distribution function is given by
G_{\xi,\beta}(x)=1-(1+\xi x/\beta)^{-1/\xi},\quad
    x\ge0,
for \xi>0 (which is what we assume) and
\beta>0. Note that \beta is also denoted by
\sigma in this package. Estimation of \xi
and \beta by gamGPDfit() is done via penalized maximum
likelihood estimation, where the estimators are computed with a
backfitting algorithm. In order to guarantee convergence of this
algorithm, a reparameterization of \beta in terms of the parameter
\nu is done via
\beta=\exp(\nu)/(1+\xi).
The parameters \xi and \nu (and thus
\beta) are allowed to depend on covariates (including
time) in a non- or semiparametric way, for example:
\xi=\xi(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\xi}+h_{\xi}(t),
\nu=\nu(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\nu}+h_{\nu}(t),
where \bm{x} denotes the vector of covariates,
\bm{\alpha}_{\xi}, \bm{\alpha}_{\nu}
are parameter vectors and h_{\xi}, h_{\nu} are
regression splines. For more details, see the references and the source
code.
gamGPDboot() first fits the GPD parameters via
gamGPDfit(). It then conducts the post-blackend bootstrap of
Chavez-Demoulin and Davison (2005). To this end, it computes the
residuals, resamples them (B times), reconstructs the
corresponding excesses, and refits the GPD parameters via
gamGPDfit() again.
Note that if gam() fails in gamGPDfit() or the
fitting or one of the bootstrap replications in gamGPDboot(),
then the output object contains (an) empty (sub)list(s). These
failures typically happen for too small sample sizes.
Value
gamGPDfit() returns either an empty list (list(); in
case at least one of the two gam() calls in the internal
function gamGPDfitUp() fails) or a list with the components
- xi:
- estimated parameters - \xi;
- beta:
- estimated parameters - \beta;
- nu:
- estimated parameters - \nu;
- se.xi:
- standard error for - \xi((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to- \xi) multiplied by -1;
- se.nu:
- standard error for - \nu((possibly adjusted) second-order derivative of the reparameterized log-likelihood with respect to- \nu) multiplied by -1;
- xi.covar:
- (unique) covariates for - \xi;
- nu.covar:
- (unique) covariates for - \nu;
- covar:
- available covariate combinations used for fitting - \beta(- \xi,- \nu);
- y:
- vector of excesses (exceedances minus threshold); 
- res:
- residuals; 
- MRD:
- mean relative distances between for all iterations, calculated between old parameters - (\xi, \nu)(from the last iteration) and new parameters (currently estimated ones);
- logL:
- log-likelihood at the estimated parameters; 
- xiObj:
- R object of type - gamObjectfor estimated- \xi(returned by- mgcv::gam());
- nuObj:
- R object of type - gamObjectfor estimated- \nu(returned by- mgcv::gam());
- xiUpdates:
- if - include.updatesis- TRUE, updates for- \xifor each iteration. This is a list of R objects of type- gamObjectwhich contains- xiObjas last element;
- nuUpdates:
- if - include.updatesis- TRUE, updates for- \nufor each iteration. This is a list of R objects of type- gamObjectwhich contains- nuObjas last element;
gamGPDboot() returns a list of length B+1 where
the first component contains the results of
the initial fit via gamGPDfit() and the other B
components contain the results for each replication of the
post-blackend bootstrap. Components for which gam()
fails (e.g., due to too few data) are given as empty lists (list()).
Author(s)
Marius Hofert, Valerie Chavez-Demoulin.
References
Chavez-Demoulin, V., and Davison, A. C. (2005). Generalized additive models for sample extremes. Applied Statistics 54(1), 207–222.
Chavez-Demoulin, V., Embrechts, P., and Hofert, M. (2014). An extreme value approach for modeling Operational Risk losses depending on covariates. Journal of Risk and Insurance; accepted.
Examples
library(QRM)
## generate an example data set
years <- 2003:2012 # years
nyears <- length(years)
n <- 250 # sample size for each (different) xi
u <- 200 # threshold
rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD
set.seed(17) # setting seed
xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
xi.true.B <- xi.true.A^2 # true xi for group "B"
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
                       function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
covar <- rep(c("A","B"), each=n*nyears)
value <- c(lossA, lossB)
x <- data.frame(covar=covar, time=time, value=value)
## fit
eps <- 1e-3 # to decrease the run time for this example
require(mgcv) # due to s()
fit <- gamGPDfit(x, threshold=u, datvar="value", xiFrhs=~covar+s(time)-1,
                 nuFrhs=~covar+s(time)-1, eps.xi=eps, eps.nu=eps)
## note: choosing s(..., bs="cr") will fit cubic splines
## grab the fitted values per group and year
xi.fit <- fitted(fit$xiObj)
xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # pick fit for each group and year
xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year
## plot the fitted values of xi and the true ones we simulated from
par(mfrow=c(1,2))
plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
     main="Group A", xlab="Year", ylab=expression(xi))
points(years, xi.fit.A, type="l", col="red")
legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
       legend=c("true", "fitted"), bty="n")
plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
     main="Group B", xlab="Year", ylab=expression(xi))
points(years, xi.fit.B, type="l", col="blue")
legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
       legend=c("true", "fitted"), bty="n")