q_gpd {extremeStat} | R Documentation |
GPD quantile of sample
Description
Compute quantile of General Pareto Distribution fitted to sample by peak over threshold (POT) method using threshold from truncation proportion, comparing several R packages doing this
Usage
q_gpd(
x,
probs = c(0.8, 0.9, 0.99),
truncate = 0,
threshold = berryFunctions::quantileMean(x, truncate),
package = "extRemes",
method = NULL,
list = FALSE,
undertruncNA = TRUE,
quiet = FALSE,
ttquiet = quiet,
efquiet = quiet,
...
)
Arguments
x |
Vector with numeric values. NAs are silently ignored. |
probs |
Probabilities of truncated (Peak over threshold) quantile. DEFAULT: c(0.8,0.9,0.99) |
truncate |
Truncation percentage (proportion of sample discarded). DEFAULT: 0 |
threshold |
POT cutoff value. If you want correct percentiles, set this
only via truncate, see Details.
DEFAULT: |
package |
Character string naming package to be used. One of c("lmomco","evir","evd","extRemes","fExtremes","ismev"). DEFAULT: "extRemes" |
method |
|
list |
Return result from the fitting function with the quantiles
added to the list as element |
undertruncNA |
Return NAs for probs below truncate? Highly recommended to leave this at the DEFAULT: TRUE |
quiet |
Should messages from this function be suppressed? DEFAULT: FALSE |
ttquiet |
Should truncation!=threshold messages from this function be suppressed? DEFAULT: quiet |
efquiet |
Should warnings in function calls to the external packages be
suppressed via |
... |
Further arguments passed to the fitting function listed in section Details. |
Details
Depending on the value of "package", this fits the GPD using
lmomco::pargpa
evir::gpd
evd::fpot
extRemes::fevd
fExtremes::gpdFit
ismev::gpd.fit
Renext::Renouv
or Renext::fGPD
The method
defaults (and other possibilities) are
lmomco: none, only L-moments
evir: "pwm" (probability-weighted moments), or "ml" (maximum likelihood)
evd: none, only Maximum-likelihood fitting implemented
extRemes: "MLE", or "GMLE", "Bayesian", "Lmoments"
fExtremes: "pwm", or "mle"
ismev: none, only Maximum-likelihood fitting implemented
Renext: "r" for Renouv
(since distname.y = "gpd", evd::fpot is used),
or 'f' for fGPD
(with minimum POTs added)
The Quantiles are always given with probs
in regard to the full (uncensored) sample.
If e.g. truncate is 0.90, the distribution function is fitted to the top 10% of the sample.
The 95th percentile of the full sample is equivalent to the 50% quantile of
the subsample actually used for fitting.
For computation, the probabilities are internally updated with p2=(p-t)/(1-t)
but labeled with the original p
.
If you truncate 90% of the sample, you cannot compute the 70th percentile anymore,
thus undertruncNA
should be left to TRUE.
If not exported by the packages, the quantile functions are extracted from their source code (Nov 2016).
Value
Named vector of quantile estimates for each value of probs
,
or if(list): list with element q_gpd_quant
and info-elements added.
q_gpd_n_geq is number of values greater than or equal to q_gpd_threshold.
gt is only greater than.
Author(s)
Berry Boessenkool, berry-b@gmx.de, Feb 2016
References
https://stackoverflow.com/q/27524131, https://stats.stackexchange.com/q/129438
See Also
distLquantile
which compares results for all packages
Other related packages (not implemented):
https://cran.r-project.org/package=gPdtest
https://cran.r-project.org/package=actuar
https://cran.r-project.org/package=fitdistrplus
https://cran.r-project.org/package=lmom
Examples
data(annMax)
q_gpd(annMax)
q_gpd(annMax, truncate=0.6)
q_gpd(annMax, truncate=0.85)
q_gpd(annMax, truncate=0.91)
q_gpd(annMax, package="evir")
q_gpd(annMax, package="evir", method="ml")
q_gpd(annMax, package="evd")
q_gpd(annMax, package="extRemes")
q_gpd(annMax, package="extRemes", method="GMLE")
#q_gpd(annMax, package="extRemes", method="Bayesian") # computes a while
q_gpd(annMax, package="extRemes", method="Lmoments")
q_gpd(annMax, package="extRemes", method="nonsense") # NAs
q_gpd(annMax, package="fExtremes") # log warnings
q_gpd(annMax, package="fExtremes", efquiet=TRUE) # silenced warnings
q_gpd(annMax, package="fExtremes", method= "mle")
q_gpd(annMax, package="ismev")
q_gpd(annMax, package="Renext")
q_gpd(annMax, package="Renext", method="f")
berryFunctions::is.error(q_gpd(annMax, package="nonsense"), force=TRUE)
# compare all at once with
d <- distLquantile(annMax); d
# d <- distLquantile(annMax, speed=FALSE); d # for Bayesian also
q_gpd(annMax, truncate=0.85, package="evd") # Note about quantiles
q_gpd(annMax, truncate=0.85, package="evir")
q_gpd(annMax, truncate=0.85, package="evir", quiet=TRUE) # No note
q_gpd(annMax, truncate=0.85, package="evir", undertruncNA=FALSE)
q_gpd(annMax, truncate=0.85, package="evir", list=TRUE)
str( q_gpd(annMax, truncate=0.85, probs=0.6, package="evir", list=TRUE) )# NAs
str( q_gpd(annMax, package="evir", list=TRUE) )
str( q_gpd(annMax, package="evd", list=TRUE) )
str( q_gpd(annMax, package="extRemes", list=TRUE) )
str( q_gpd(annMax, package="fExtremes", list=TRUE) )
str( q_gpd(annMax, package="ismev", list=TRUE) )
str( q_gpd(annMax, package="Renext", list=TRUE) )
q_gpd(annMax, package="evir", truncate=0.9, method="ml") # NAs (MLE fails often)
trunc <- seq(0,0.9,len=500)
library("pbapply")
quant <- pbsapply(trunc, function(tr) q_gpd(annMax, pack="evir", method = "pwm",
truncate=tr, quiet=TRUE))
quant <- pbsapply(trunc, function(tr) q_gpd(annMax, pack="lmomco", truncate=tr, quiet=TRUE))
plot(trunc, quant["99%",], type="l", ylim=c(80,130), las=1)
lines(trunc, quant["90%",])
lines(trunc, quant["80%",])
plot(trunc, quant["RMSE",], type="l", las=1)
## Not run:
## Not run in checks because simulation takes too long
trunc <- seq(0,0.9,len=200)
dlfs <- pblapply(trunc, function(tr) distLfit(annMax, truncate=tr, quiet=TRUE, order=FALSE))
rmses <- sapply(dlfs, function(x) x$gof$RMSE)
plot(trunc, trunc, type="n", ylim=range(rmses,na.rm=TRUE), las=1, ylab="rmse")
cols <- rainbow2(17)[rank(rmses[,1])]
for(i in 1:17) lines(trunc, rmses[i,], col=cols[i])
dlfs2 <- lapply(0:8/10, function(tr) distLfit(annMax, truncate=tr, quiet=TRUE))
pdf("dummy.pdf")
dummy <- sapply(dlfs2, function(x)
{plotLfit(x, cdf=TRUE, main=x$truncate, ylim=0:1, xlim=c(20,135), nbest=1)
title(sub=round(x$gof$RMSE[1],4))
})
dev.off()
# truncation effect
mytruncs <- seq(0, 0.9, len=150)
oo <- options(show.error.messages=FALSE, warn=-1)
myquants <- sapply(mytruncs, function(t) q_gpd(annMax, truncate=t, quiet=TRUE))
options(oo)
plot(1, type="n", ylim=range(myquants, na.rm=TRUE), xlim=c(0,0.9), las=1,
xlab="truncated proportion", ylab="estimated quantiles")
abline(h=quantileMean(annMax, probs=c(0.8,0.9,0.99)))
for(i in 1:3) lines(mytruncs, myquants[i,], col=i)
text(0.3, c(87,97,116), rownames(myquants), col=1:3)
# Underestimation in small samples
# create known population:
dat <- extRemes::revd(1e5, scale=50, shape=-0.02, threshold=30, type="GP")
op <- par(mfrow=c(1,2), mar=c(2,2,1,1))
hist(dat, breaks=50, col="tan")
berryFunctions::logHist(dat, breaks=50, col="tan")
par(op)
# function to estimate empirical and GPD quantiles from subsamples
samsizeeffect <- function(n, nrep=30, probs=0.999, trunc=0.5, Q=c(0.4,0.5,0.6))
{
res <- replicate(nrep, {
subsample <- sample(dat, n)
qGPD <- q_gpd(subsample, probs=probs, truncate=trunc)
qEMP <- berryFunctions::quantileMean(subsample, probs=probs, truncate=trunc)
c(qGPD=qGPD, qEMP=qEMP)})
apply(res, MARGIN=1, berryFunctions::quantileMean, probs=Q)
}
# Run and plot simulations
samplesize <- c(seq(20, 150, 10), seq(200,800, 100))
results <- pbapply::pblapply(samplesize, samsizeeffect)
res <- function(row, col) sapply(results, function(x) x[row,col])
berryFunctions::ciBand(yu=res(3,1),yl=res(1,1),ym=res(2,1),x=samplesize,
main="99.9% Quantile underestimation", xlab="subsample size", ylim=c(200,400), colm=4)
berryFunctions::ciBand(yu=res(3,2),yl=res(1,2),ym=res(2,2),x=samplesize, add=TRUE)
abline(h=berryFunctions::quantileMean(dat, probs=0.999))
text(300, 360, "empirical quantile of full sample")
text(300, 340, "GPD parametric estimate", col=4)
text(300, 300, "empirical quantile estimate", col="green3")
## End(Not run) # end of dontrun