R: Estimate a Confidence Interval for a Single Quantile of a...

qua2ci.simple {lmomco}

R Documentation

Estimate a Confidence Interval for a Single Quantile of a Parent Distribution by a Simple Algorithm

Description

This function estimates the lower and upper limits of a specified confidence interval for an aribitrary quantile value of a specified parent distribution [quantile function Q(F,\theta) with parameters \theta] using Monte Carlo simulation. The quantile value, actually the nonexceedance probability (F for 0 \le F \le 1) of the value, is specified by the user. The user also provides the parameters of the parent distribution (see lmom2par). This function does consider an estimate of the variance-covariance structure of the sample data (for that see qua2ci.cov). The qua2ci.simple is the original implementation and dates close to the initial releases of lmomco and was originally named qua2ci. That name is now deprecated but retained as an alias, which will be removed at some later release.

For nsim simulation runs (ideally a large number), samples of size n are drawn from Q(F,\theta). The L-moments of each simulated sample are computed using lmoms and a distribution of the same type is fit. The F-quantile of the just-fitted distribution is computed and placed into a vector. The process of simulating the sample, computing the L-moments, computing the parameters, and solving for the F-quantile is repeated for the specified number of simulation runs.

To estimate the confidence interval, the L-moments of the vector simulated quantiles are computed. Subsequently, the parameters of a user-specified distribution “error” distribution (edist) are computed. The two quantiles of this error distribution for the specified confidence interval are computed. These two quantiles represent the estimated lower and upper limits for the confidence interval of the parent distribution for samples of size n. The error distribution defaults to the Generalized Normal (see pargno) because this distribution has the Normal as a special case but extends the fit to the 3rd L-moment (\tau_3) for exotic situations in which some asymmetry in the quantile distribution might exist.

Finally, it is often useful to have vectors of lower and upper limits for confidence intervals for a vector of F values. The function genci.simple does just that and uses qua2ci.simple as the computational underpinning.

Usage

qua2ci.simple(f,para,n, level=0.90, edist="gno", nsim=1000, showpar=FALSE,
                        empdist=TRUE, verbose=FALSE, maxlogdiff=6, ...)

Arguments

`f`	Nonexceedance probability (`0 \le F \le 1`) of the quantile for which the confidence interval is needed. This function is not vectorized and therefore only the first value will be used. This is in contrast to the vectorization of `F` in the conceptually similar function `qua2ci.cov`.
`para`	The parameters from `lmom2par` or `vec2par`—these parameters represent the “true” parent.
`n`	The sample size for each Monte Carlo simulation will use.
`level`	The confidence interval (`0 \le` `level` `< 1`). The interval is specified as the size of the interval. The default is 0.90 or the 90th percentile. The function will return the 5th [`(1-0.90)/2`] and 95th [`(1-(1-0.90)/2)`] percentile cumulative probability of the simulated quantile distribution as specified by the nonexceedance probability argument. The arguments `level` and `f` therefore are separate features.
`edist`	The model for the error distribution. Although the Normal (the default) commonly is assumed in error analyses, it need not be, as support for other distributions supported by lmomco is available. The default is the Generalized Normal so the not only is the Normal possible but asymmetry is also accomodated (`lmomgno`). For example, if the L-skew (`\tau_4`) or L-kurtosis (`\tau_4`) values depart considerably from those of the Normal (`\tau_3 = 0` and `\tau_4 = 0.122602`), then the Generalized Normal or some alternative distribution would likely provide more reliable confidence interval estimation.
`nsim`	The number of simulations (replications) for the sample size `n` to perform. Large numbers produce more refined confidence limit estimates at the cost of CPU time. The default is anticipated to be large enough for evaluative-useage without too much computational delay. Larger simulation numbers are recommended.
`showpar`	The parameters of the `edist` for each simulation are printed.
`empdist`	If `TRUE`, then an R `environment` is appended onto the element `empdist` in the returned list, otherwise `empdist` is `NA`.
`verbose`	The verbosity of the operation of the function.
`maxlogdiff`	The maximum permitted difference in log10 space between a simulated quantile and the true value. It is possible that a well fit simulated sample to the parent distribution type provides crazy quantile estimates in the far reaches of either tail. The default value of 6 was chosen based on experience with the Kappa distribution fit to a typical heavy-right tail flood magnitude data set. The concern motivating this feature is that as the number of parameters increases, it seems progressively there is more chance for a distribution tail to swing wildy into regions for which an analyst would not be comfortable with given discipline-specific knowledge. The choice of 6-log cycles is ad hoc at best, and users are encouraged to do their own exploration. If `verbose=TRUE` then a message will be printed when the `maxlogdiff` condition is tripped.
`...`	Additional arguments to pass such as to `lmom2par`.

Value

An R list is returned. The lwr and upr match the nomenclature of qua2ci.cov but because qua2ci.simple is provided the parent, the true value is returned, whereas qua2ci.cov returns the fit.

`lwr`	The lower value of the confidence interval having nonexceedance probability equal to `(1-level)/2`.
`true`	The value returned by `par2qua(f,para)`.
`upr`	The upper value of the confidence interval having nonexceedance probability equal to `1-(1-level)/2`.
`elmoms`	The L-moments from `lmoms` of the distribution of simulated of quantiles.
`epara`	The parameters of the error distribution fit using the `elmoms`.
`empdist`	An R `environment` (see below).
`ifail`	A diagnostic value. A value of zero means that successful exit was made.
`ifailtext`	A descriptive message related to the `ifail` value.
`nsim`	An echoing of the `nsim` argument for the function.
`sim.attempts`	The number of executions of the `while` loop (see Note below).

The empdist element in the returned list is an R environment that contains:

`simquas`	A `nsim`-long vector of the simulated quantiles for `f`.
`empir.dist.lwr`	The lower limit derived from the R `quantile` function for `type=6`, which uses `i/(n+1)`.
`empir.dist.upr`	The upper limit derived from the R `quantile` function for `type=6`, which uses `i/(n+1)`.
`bern.smooth.lwr`	The lower limit estimated by the Bernstein smoother in `dat2bernqua` for `poly.type = "Bernstein"` and `bound.type = "none"`.
`bern.smooth.upr`	The upper limit estimated by the Bernstein smoother in `dat2bernqua` for `poly.type = "Bernstein"` and `bound.type = "none"`.
`epmoms`	The product moments of the simulated quantiles from `pmoms`.

Note

This function relies on a while loop that runs until nsim have successfully completed. Some reasons for an early next in the loop include invalid L-moments by are.lmom.valid of the simluated data or invalid fitted parameters by are.par.valid to simulated L-moments. See the source code for more details.

Author(s)

W.H. Asquith

Examples

## Not run: 
# It is well known that standard deviation (sigma) of the
# sample mean is equal to sigma/sample_size. Let is look at the
# quantile distribution of the median (f=0.5)
mean   <- 0; sigma <- 100
parent <- vec2par(c(mean,sigma), type='nor')
CI     <- qua2ci.simple(0.5, parent, n=10, nsim=20)
# Theoretrical sample mean sigma = 100/10 = 10
# L-moment theory: L-scale * sqrt(pi) = sigma
# Thus, it follows that the quantity
CI$elmoms$lambdas[2]/sqrt(pi)
# approaches 10 as nsim --> Inf.
## End(Not run)

# Another example.
D   <- c(123, 34, 4, 654, 37, 78, 93, 95, 120) # fake sample
lmr <- lmoms(D)    # compute the L-moments of the data
WEI <- parwei(lmr) # estimate Weibull distribution parameters
CI  <- qua2ci.simple(0.75,WEI,20, nsim=20, level=0.95)
# CI contains the estimate 95-percent confidence interval for the
# 75th-percentile of the parent Weibull distribution for 20 sample size 20.
## Not run: 
pdf("Substantial_qua2ci_example.pdf")
level <- 0.90; cilo <- (1-level)/2; cihi <- 1 - cilo
para <- lmom2par(vec2lmom(c(180,50,0.75)), type="gev")
A <- qua2ci.simple(0.98, para, 30, edist="gno", level=level, nsim=3000)
Apara <- A$epara; Aenv <- A$empdist
Bpara <- lmom2par(A$elmoms, type="aep4")

lo <- log10(A$lwr); hi <- log10(A$upr)
xs <- 10^(seq(lo-0.2, hi+0.2, by=0.005))
lo <- A$lwr; hi <- A$upr; xm <- A$true; sbar <- mean(Aenv$simquas)
dd <- density(Aenv$simquas, adjust=0.5)
pk <- max(dd$y, dlmomco(xs, Apara), dlmomco(xs, Bpara))
dx <- dd$x[dd$x >= Aenv$empir.dist.lower & dd$x <= Aenv$empir.dist.upper]
dy <- dd$y[dd$x >= Aenv$empir.dist.lower & dd$x <= Aenv$empir.dist.upper]
dx <- c(dx[1], dx, dx[length(dx)]); dy <- c(0, dy, 0)

plot(c(0), c(0), type="n", xlim=range(xs), ylim=c(0,pk),
                 xlab="X VALUE", ylab="PROBABILITY DENSITY")
polygon(dx, dy, col=8)
lines(xs, dlmomco(xs, Apara)); lines(xs, dlmomco(xs, Bpara), col=2, lwd=2)
lines(dd, lty=2, lwd=2, col=8)
lines(xs, dlmomco(xs, para), col=6); lines(c(xm,xm), c(0,pk), lty=4, lwd=3)
lines(c(lo,lo,NA,hi,hi), c(0,pk,NA,0,pk), lty=2)

xlo <- qlmomco(cilo, Apara); xhi <- qlmomco(cihi, Apara)
points(c(xlo, xhi), c(dlmomco(xlo, Apara), dlmomco(xhi, Apara)), pch=16)
xlo <- qlmomco(cilo, Bpara); xhi <- qlmomco(cihi, Bpara)
points(c(xlo, xhi), c(dlmomco(xlo, Bpara), dlmomco(xhi, Bpara)), pch=16, col=2)
lines(rep(Aenv$empir.dist.lwr, 2), c(0,pk), lty=3, lwd=2, col=3)
lines(rep(Aenv$empir.dist.upr, 2), c(0,pk), lty=3, lwd=2, col=3)
lines(rep(Aenv$bern.smooth.lwr,2), c(0,pk), lty=3, lwd=2, col=4)
lines(rep(Aenv$bern.smooth.upr,2), c(0,pk), lty=3, lwd=2, col=4)
cat(c(  "F(true) = ",             round(plmomco(xm,   Apara), digits=2),
      "; F(mean(sim), edist) = ", round(plmomco(sbar, Apara), digits=2), "\n"), sep="")
dev.off()
## End(Not run)
## Not run: 
ty <- "nor" # try running with "glo" (to get the L-skew "fit", see below)
para <- lmom2par(vec2lmom(c(-180,70,-.5)), type=ty)
f <- 0.99; n <- 41; ns <- 1000; Qtrue <- qlmomco(f, para)
Qsim1 <- replicate(ns, qlmomco(f, lmom2par(lmoms(rlmomco(n, para)), type=ty)))
Qsim2 <- qua2ci.simple(f, para, n, nsim=ns, edist="gno")
Qbar1 <- mean(Qsim1); Qbar2 <- mean(Qsim2$empdist$simquas)
epara <- Qsim2$epara; FT <- plmomco(Qtrue, epara)
F1 <- plmomco(Qbar1, epara); F2 <- plmomco(Qbar2, epara)
cat(c(  "F(true) = ",      round(FT, digits=2),
      "; F(via sim.) = ",  round(F1, digits=2),
      "; F(via edist) = ", round(F2, digits=2), "\n"), sep="")
# The given L-moments are highly skewed, but a Normal distribution is fit so
# L-skew is ignored. The game is deep tail (f=0.99) estimation. The true value of the
# quantile has a percentile on the error distribution 0.48 that is almost exactly 0.5
# (median = mean = symmetrical error distribution).  A test run shows nice behavior:
# F(true) =  0.48; F(via sim.) =  0.49; F(via edist) =  0.5
# But another run with ty <- "glo" (see how 0.36 << [0.52, 0.54]) has
# F(true) =  0.36; F(via sim.) =  0.54; F(via edist) =  0.52
# So as the asymmetry becomes extreme, the error distribution becomes asymmetrical too.
## End(Not run)

[Package lmomco version 2.5.1 Index]