pfactor.bernstein {lmomco} | R Documentation
Estimation of Optimal p-factor of Distributional Support Estimation for Smoothed Quantiles from the Bernstein or Kantorovich Polynomials
Description
Compute the optimal p-factor through numerical integration of the smoothed empirical quantile function to estimate the L-moments of the distribution. This function attempts to report an optimal “p-factor” (author's term) for the given parent distribution in para
by estimating where the error between the given L-moment ratio \tau_r
for r = 3, 4, or 5 crosses zero; the target ratio comes either from the distribution parameter object or from the lmr.dist argument
. The estimated support of the distribution is that shown by Turnbull and Ghosh (2014) and is computed as follows
\biggl(x_{0:n},\: x_{n+1:n}\biggr) = \biggl(x_{1:n} - \frac{(x_{2:n} - x_{1:n})}{(1 - p)^{-2} - 1},\: x_{n:n} + \frac{(x_{n:n} - x_{n-1:n})}{(1 - p)^{-2} - 1}\biggr)\mbox{,}
where p
is the p-factor. The support honors natural bounds if these are given by either fix.lower
or fix.upper
. The polynomial type for the smooth is set by poly.type
. These three arguments are the same as those for dat2bernqua
and lmoms.bernstein
. The stat.type argument sets the statistic used to measure the central tendency of the errors across the nsim
simulations per p
. The function has its own hardwired p-factors to compute, but these can be superseded by the pfactors
argument. The p.lo
and p.hi
arguments are lower and upper bounds used to truncate the p-factors immediately after they are assembled. These bounds serve three purposes: (1) protection against numerical problems at mathematical upper limits (unity), (2) potentially much faster execution if the user already knows the approximate optimal value of the p-factor, and (3) potential use of this function within a direct optimization framework through the R functions optim
or uniroot
. It is strongly suggested to keep plot.em=TRUE
so that the user can inspect the computations.
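The Turnbull and Ghosh (2014) support computation above can be sketched in a few lines of R; the helper tg_support below is illustrative only and not part of lmomco, and its handling of fix.lower and fix.upper simply clamps the estimated support to any natural bounds:

```r
# Sketch (not package code): estimated distribution support for a
# given p-factor per the Turnbull-Ghosh formula shown above.
tg_support <- function(x, p, fix.lower=NULL, fix.upper=NULL) {
  x <- sort(x); n <- length(x)
  d  <- (1 - p)^(-2) - 1                    # common denominator in the formula
  lo <- x[1] - (x[2] - x[1])     / d        # x_{0:n}
  hi <- x[n] + (x[n] - x[n - 1]) / d        # x_{n+1:n}
  if (! is.null(fix.lower)) lo <- max(lo, fix.lower)  # honor natural bounds
  if (! is.null(fix.upper)) hi <- min(hi, fix.upper)
  c(lower=lo, upper=hi)
}
tg_support(c(2, 3, 5, 9), p=0.1)
```

Note that as p approaches unity the denominator grows without bound and the estimated support collapses onto the sample extremes, which is one reason p.hi protects against the upper limit.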
Usage
pfactor.bernstein(para, x=NULL, n=NULL,
bern.control=NULL,
poly.type=c("Bernstein", "Kantorovich"),
stat.type=c("Mean", "Median"),
fix.lower=NULL, fix.upper=NULL,
lmr.dist=NULL, lmr.n=c("3", "4", "5"),
nsim=500, plot.em=TRUE, pfactors=NULL,
p.lo=.Machine$double.eps, p.hi=1)
Arguments
para |
A mandatory “parent” distribution defined by a usual lmomco distribution parameter object. The simulations are based on this distribution. |
x |
An optional vector of data values. |
n |
An optional sample size to run the simulations on. |
bern.control |
An optional R list holding the poly.type, bound.type, stat.type, fix.lower, and fix.upper settings (see the Examples); values in this list supersede the correspondingly named arguments. |
poly.type |
Same argument as for dat2bernqua and lmoms.bernstein. |
stat.type |
The central estimation statistic for each p-factor evaluated. |
fix.lower |
Same argument as for dat2bernqua and lmoms.bernstein. |
fix.upper |
Same argument as for dat2bernqua and lmoms.bernstein. |
lmr.dist |
The value of the targeted L-moment ratio (\tau_3, \tau_4, or \tau_5 according to lmr.n); if NULL, the ratio is computed from para. |
lmr.n |
The L-moment ratio number for p-factor optimization. |
nsim |
The number of simulations to run. Experiments suggest the default is adequate for reasonably small sample sizes; the simulation count can likely be reduced as the sample size increases. |
plot.em |
A logical to trigger the diagnostic plot of the simulated errors and a smooth line through these errors. |
pfactors |
An optional vector of p-factors to loop through for the simulations. The vector computed internally is used if this argument is set to NULL. |
p.lo |
A computational lower boundary to which the assembled p-factors are truncated. |
p.hi |
A computational upper boundary to which the assembled p-factors are truncated. |
Value
An R list
or real
is returned. If pfactors
is a single value, then the single value for the error statistic is returned; otherwise the list described here is returned. If the returned pfactor
is NA
, then likely the smoothed line did not cross zero, which is a reason the user should keep plot.em=TRUE
and inspect the plot; perhaps revisions to the arguments will then become evident. The contents of the list are
pfactor |
The estimated value of the p-factor. |
bounds.type |
The bound type used for the support, e.g. “Carv” (see bound.type in the Examples). |
poly.type |
The given poly.type. |
stat.type |
The given stat.type. |
lmom.type |
A string of the L-moment type: “Tau3”, “Tau4”, “Tau5”. |
fix.lower |
The given fixed lower boundary, which could remain NULL. |
fix.upper |
The given fixed upper boundary, which could remain NULL. |
source |
An attribute identifying the computational source of the L-moments: “pfactor.bernstein”. |
ps |
The p-factors actually evaluated. |
err.stat |
The error statistic computed for each p-factor according to stat.type. |
err.smooth |
The smoothed version of err.stat; this is the smooth line drawn on the diagnostic plot. |
Note
Repeated application of this function for various n
would result in the analyst having vectors of n
and p
(pfactor
) values. The analyst could then fit a regression equation and refine the estimated p(n)
. For example, a dual-logarithmic regression lm(log(p)~log(n))
is suggested.
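The suggested refinement can be sketched as follows; the vectors ns and ps are hypothetical results from repeated pfactor.bernstein() calls at several sample sizes, not values produced by the package:

```r
# Illustrative sketch: refine p as a function of n via the suggested
# dual-logarithmic regression. ns/ps are hypothetical (n, pfactor) pairs.
ns  <- c(20, 40, 80, 160)
ps  <- c(0.18, 0.12, 0.08, 0.055)
fit <- lm(log(ps) ~ log(ns))                   # log-log regression
p.of.n <- function(n) {                        # predict p for a new n
  exp(predict(fit, newdata=data.frame(ns=n)))
}
p.of.n(100)
```

The back-transform exp() is needed because the regression is fitted on the logarithm of p.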
Also, symmetrical data likely see little benefit from optimizing on the symmetry-measuring L-moments Tau3 and Tau5; the analyst might prefer to optimize on peakedness measured by Tau4.
This function is highly experimental and subject to extreme overhaul. Please contact the author if you are an interested party in Bernstein and Kantorovich polynomials.
Author(s)
W.H. Asquith
References
Turnbull, B.C., and Ghosh, S.K., 2014, Unimodal density estimation using Bernstein polynomials. Computational Statistics and Data Analysis, v. 72, pp. 13–29.
See Also
lmoms.bernstein
, dat2bernqua
, lmoms
Examples
## Not run:
pdf("pfactor_exampleB.pdf")
X <- exp(rnorm(200)); para <- parexp(lmoms(X))
# nsim is too small, but makes the following three not take too long
pfactor.bernstein(para, n=20, lmr.n="3", nsim=100, p.lo=.06, p.hi=.3)
pfactor.bernstein(para, n=20, lmr.n="4", nsim=100, p.lo=.06, p.hi=.3)
pfactor.bernstein(para, n=20, lmr.n="5", nsim=100, p.lo=.06, p.hi=.3)
dev.off()
## End(Not run)
## Not run:
# Try intra-sample p-factor optimization from two perspectives. The 3-parameter
# GEV "over fits" the data and provides the parent. Then use Tau3 of the fitted
# GEV for a skewness restraint and then use Tau3 of the data. Then repeat but use
# the apparent "exact" value of Tau3 for the true exponential parent.
pdf("pfactor_exampleB.pdf")
lmr <- vec2lmom(c(60,20)); paraA <- parexp(lmr); n <- 40
tr <- lmorph(par2lmom(paraA))$ratios[3]
X <- rlmomco(n, paraA); para <- pargev(lmoms(X))
F <- seq(0.001,0.999, by=0.001)
plot(qnorm(pp(X, a=0.40)), sort(X), type="n", log="y",
xlab="Standard normal variate", ylab="Quantile",
xlim=qnorm(range(F)), ylim=range(qlmomco(F,paraA)))
lines(qnorm(F), qlmomco(F, paraA), col=8, lwd=2)
lines(qnorm(F), qlmomco(F, para), lty=2)
points(qnorm(pp(X, a=0.40)), sort(X))
# Make sure to fill in the p-factor when needed!
bc <- list(poly.type = "Bernstein", bound.type="Carv",
stat.type="Mean", fix.lower=0, fix.upper=NULL, p=NULL)
kc <- list(poly.type = "Kantorovich", bound.type="Carv",
stat.type="Mean", fix.lower=0, fix.upper=NULL, p=NULL)
# Bernstein
A <- pfactor.bernstein(para, n=n, nsim=100, bern.control=bc)
B <- pfactor.bernstein(para, x=X, n=n, nsim=100, bern.control=bc)
C <- pfactor.bernstein(para, n=n, nsim=100, lmr.dist=tr, bern.control=bc)
D <- pfactor.bernstein(para, x=X, n=n, nsim=100, lmr.dist=tr, bern.control=bc)
plot(qnorm(pp(X, a=0.40)), sort(X), type="n", log="y",
xlab="Standard normal variate", ylab="Quantile",
xlim=qnorm(range(F)), ylim=range(qlmomco(F,paraA)))
lines(qnorm(F), qlmomco(F, paraA), col=8, lwd=2)
lines(qnorm(F), qlmomco(F, para), lty=2)
points(qnorm(pp(X, a=0.40)), sort(X))
bc$p <- A$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=bc), col=2)
bc$p <- B$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=bc), col=3)
bc$p <- C$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=bc), col=2, lty=2)
bc$p <- D$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=bc), col=3, lty=2)
# Kantorovich
A <- pfactor.bernstein(para, n=n, nsim=100, bern.control=kc)
B <- pfactor.bernstein(para, x=X, n=n, nsim=100, bern.control=kc)
C <- pfactor.bernstein(para, n=n, nsim=100, lmr.dist=tr, bern.control=kc)
D <- pfactor.bernstein(para, x=X, n=n, nsim=100, lmr.dist=tr, bern.control=kc)
plot(qnorm(pp(X, a=0.40)), sort(X), type="n", log="y",
xlab="Standard normal variate", ylab="Quantile",
xlim=qnorm(range(F)), ylim=range(qlmomco(F,paraA)))
lines(qnorm(F), qlmomco(F, paraA), col=8, lwd=2)
lines(qnorm(F), qlmomco(F, para), lty=2)
points(qnorm(pp(X, a=0.40)), sort(X))
kc$p <- A$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=kc), col=2)
kc$p <- B$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=kc), col=3)
kc$p <- C$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=kc), col=2, lty=2)
kc$p <- D$pfactor
lines(qnorm(F), dat2bernqua(F,X, bern.control=kc), col=3, lty=2)
dev.off()
## End(Not run)
## Not run:
X <- exp(rnorm(200)); para <- parexp(lmoms(X))
"pfactor.root" <- function(para, p.lo, p.hi, ...) {
afunc <- function(p, para=NULL, x=NULL, ...) {
return(pfactor.bernstein(para=para, x=x, pfactors=p, ...)) }
rt <- uniroot(afunc, c(p.lo, p.hi),
tol=0.001, maxiter=30, nsim=500, para=para, ...)
return(rt)
}
pfactor.root(para, 0.05, 0.15, n=10, lmr.n="4")
pfactor.bernstein(para, n=10, lmr.n="4", nsim=200, p.lo=.05, p.hi=.15)
## End(Not run)