spectralmeas {copBasic} | R Documentation |
Estimation of the Spectral Measure
Description
Kiriliouk et al. (2016, pp. 360–364) describe estimation of the spectral measure of bivariate data. Standardize the bivariate data as and
as in
psepolar
and select a “large” value for the pseudo-polar radius for nonexceedance probability
. Estimate the spectral measure
, which is the limiting distribution of the pseudo-polar angle component
given that the corresponding radial component
is large:
So, is the cumulative distribution function of the spectral measure for angle
. The
can be specified by a nonexceedance probability
for
.
The estimation proceeds as follows:
Step 1: Convert the bivariate data into
by
psepolar
and set the threshold according to “
” (this part involving
does not make sense in Kiriliouk et al. (2016)) where for present implementation in copBasic the
given the
by the user is based on the empirical distribution of the
. The empirical distribution is estimated by the Bernstein empirical distribution function from the lmomco package.
Step 2: Let denote the set of indices that correspond to the observations when
and compute
as the cardinality of
, which simply means the length of the vector
.
Step 3: Use the maximum Euclidean likelihood estimator, which is the third of three methods mentioned by Kiriliouk et al. (2016):
where is an indicator function that is only triggered if
smooth=FALSE
, and following the notation of Kiriliouk et al. (2016), the “3” represents maximum Euclidean likelihood estimation. The are are the weights
where is the sample mean and
is the sample variance of
where Kiriliouk et al. (2016, p. 363) do not show the in the denominator for the variance but copBasic uses it because those authors state that the sample variance is used.
Step 4: A smoothed version of is optionally available by
where is the cumulative distribution function of the Beta distribution for
and where
is a smoothing parameter that can be optimized by cross validation.
Step 5: The spectral density lastly can be computed optionally as
where is the probability density function (pdf) of the Beta distribution. Readers are alerted to the absence of the
indicator function in the definitions of
and
. This is correct and matches Kiriliouk et al. (2016, eqs. 17.21 and 17.22) though this author was confused for a day or so by the indicator function in what is purported to be the core definition of
where
in Kiriliouk et al. (2016, eq. 17.21 and 17.17).
Usage
spectralmeas(u, v=NULL, w=NULL, f=0.90, snv=FALSE,
smooth=FALSE, nu=100, pdf=FALSE, ...)
Arguments
u |
Nonexceedance probability |
v |
Nonexceedance probability |
w |
A vector of polar angle values |
f |
The nonexceedance probability |
snv |
Return the standard normal variate of the |
smooth |
A logical to return |
nu |
The |
pdf |
A logical to return the smoothed probability density |
... |
Additional arguments to pass to the |
Value
An R vector
of ,
, or
is returned.
Note
The purpose of this section is to describe a CPU-intensive study of goodness-of-fit between a Gumbel–Hougaard copula (GHcop
, ) parent and a fitted Hüsler–Reiss copula (
HRcop
, ). Both of these copulas are extreme values and are somewhat similar to each other, so sample sizes necessary for detection of differences should be large. A two-sided Kolmogorov–Smirnov tests (KS test,
ks.test()
) is used to measure significance in the differences between the estimated spectral measure distributions at (the 90th percentile,
) into the right tail.
The true copula will be the having parameter
. The number of simulations per sample size
n
seq(50,1000, by=25)
is nsim = 500
. For each sample size, a sample from the true parent is drawn, and a fit by maximum likelihood (
mleCOP
). The two spectral measure distributions (,
Htru
and ,
Hfit
) are estimated for a uniform variate of the angle W
having length equal to the applicable sample size. The Kolmogorov–Smirnov (KS) test is made between Htru
and Hfit
, and number of p-values less than the (Type II error because alternative hypothesis is rigged as true) and simulation count are returned and written in file
Results.txt
. The sample sizes initially are small and traps of NaN
(abandonment of a simulation run) are made. These traps are needed when the empirical distribution of Htru
or Hfit
degenerates.
Results <- NULL true.par <- 3.3; true.cop <- GHcop; fit.cop <- HRcop; search <- c(0,100) nsim <- 20000; first_time <- TRUE; f <- 0.90; beta <- 0.05 ns <- c(seq(100,1000, by=50), 1250, 1500, 1750, 2000) for(n in ns) { W <- sort(runif(n)); PV <- vector(mode="numeric") for(i in 1:(nsim/(n/2))) { UV <- simCOP(n=n, cop=true.cop, para=true.par, graphics=FALSE) fit.par <- mleCOP(UV, cop= fit.cop, interval=search)$para UVfit <- simCOP(n=n, cop= fit.cop, para=fit.par, graphics=FALSE) Htru <- spectralmeas(UV, w=W, bound.type="Carv", f=f) Hfit <- spectralmeas(UVfit, w=W, bound.type="Carv", f=f) if(length(Htru[! is.nan(Htru)]) != length(Hfit[! is.nan(Hfit)]) | length(Htru[! is.nan(Htru)]) == 0 | length(Hfit[! is.nan(Hfit)]) == 0) { PV[i] <- NA; next } # suppressWarnings() to silence ties warnings from ks.test() KS <- suppressWarnings( stats::ks.test(Htru, Hfit)$p.value ) #plot(FF, H, type="l"); lines(FF, Hfit, col=2); mtext(KS) message("-",i, appendLF=FALSE) PV[i] <- KS } message(":",n) zz <- data.frame(SampleSize=n, NumPVle0p05=sum(PV[! is.na(PV)] <= beta), SimulationCount=length(PV[! is.na(PV)])) if(first_time) { Results <- zz; first_time <- FALSE; next } Results <- rbind(Results, zz) } plot(Results$SampleSize, 100*Results$NumPVle0p05/Results$SimulationCount, type="b", cex=1.1, xlab="Sample size", ylab="Percent simulations with p-value < 0.05")
The Results
show a general increase in the counts of p-value 0.05 as sample size increases. There is variation of course and increasing
nsim
would smooth that considerably. The results show for that the detection of statistically significant differences for extremal
dependency between the
and
are detected at the error rate implied by the specified
.
This range in sample size can be compared to the Kullback–Leibler sample size ():
UV <- simCOP(n=10000, cop=true.cop, para=true.par, graphics=FALSE) fit.par <- mleCOP(UV, cop= fit.cop, interval=search)$para kullCOP(cop1=true.cop, para1=true.par, cop2=fit.cop, para2=fit.par)$KL.sample.size # The Kullback-Leibler (integer) sample size for detection of differences at # alpha=0.05 are n_fg = (742, 809, 815, 826, 915, 973, 1203) for seven runs # Do more to see variation.
where the Kullback–Leilber approach is to measure density departures across the whole domain as opposed to extremal dependency in the right tail as does the spectral measure.
Different runs of the above code will result in different in part because of simulation differences internal to
kullCOP
but also because the has its own slight variation in its fit to the large sample simulation (
) of the parent. However, it seems that
will be on the order of the
for which the KS test on the spectral measure determines statistical significance with similar error rate.
Now if the aforementioned simulation run is repeated for or
f=0.95
, the obviously remains unchanged at about
but the
for which the error rate is about
is
. This sample size is clearly smaller than before and smaller than
, therefore, the analysis of the empirical spectral measure deeper into the tail
requires a smaller sample size to distinguish between the two copula. Though the analysis does not address the question as to whether one or both copula are adequate for the problem at hand. For a final comparison, if the aforementioned simulation run is repeated for
or
f=0.80
, then the for which the error rate is about
is
. Thus as analysis is made further away from the tail into the center of the distribution, the sample size to distinguish between these two similar copula increases substantially.
Author(s)
William Asquith william.asquith@ttu.edu
References
Kiriliouk, Anna, Segers, Johan, Warchoł, Michał, 2016, Nonparameteric estimation of extremal dependence: in Extreme Value Modeling and Risk Analysis, D.K. Dey and Jun Yan eds., Boca Raton, FL, CRC Press, ISBN 978–1–4987–0129–7.
See Also
Examples
## Not run:
UV <- simCOP(n=500, cop=HRcop, para=1.3, graphics=FALSE)
W <- seq(0,1,by=0.005)
Hu <- spectralmeas(UV, w=W)
Hs <- spectralmeas(UV, w=W, smooth=TRUE, nu=100)
plot(W,Hu, type="l", ylab="Spectral Measure H", xlab="Angle")
lines(W, Hs, col=2) #
## End(Not run)
## Not run:
"GAUScop" <- function(u,v, para=NULL, ...) {
if(length(u)==1) u<-rep(u,length(v)); if(length(v)==1) v<-rep(v,length(u))
return(copula::pCopula(matrix(c(u,v), ncol=2), para))
}
GAUSparfn <- function(rho) return(copula::normalCopula(rho, dim = 2))
n <- 2000 # The PSP parent has no upper tail dependency
uv <- simCOP(n=n, cop=PSP, para=NULL, graphics=FALSE)
PLpar <- mleCOP(uv, cop=PLcop, interval=c(0,100))$para
PLuv <- simCOP(n=n, cop=PLcop, para=PLpar, graphics=FALSE)
GApar <- mleCOP(uv, cop=GAUScop, parafn=GAUSparfn, interval=c(-1,1))$para
GAuv <- simCOP(n=n, cop=GAUScop, para=GApar, graphics=FALSE)
GLpar <- mleCOP(uv, cop=GLcop, interval=c(0,100))$para
GLuv <- simCOP(n=n, cop=GLcop, para=GLpar, graphics=FALSE)
FF <- c(0.001,seq(0.005,0.995, by=0.005),0.999); qFF <- qnorm(FF)
f <- 0.90 # Seeking beyond the 90th percentile pseudo-polar radius
PSPh <- spectralmeas( uv, w=FF, f=f, smooth=TRUE, snv=TRUE)
PLh <- spectralmeas(PLuv, w=FF, f=f, smooth=TRUE, snv=TRUE)
GAh <- spectralmeas(GAuv, w=FF, f=f, smooth=TRUE, snv=TRUE)
GLh <- spectralmeas(GLuv, w=FF, f=f, smooth=TRUE, snv=TRUE)
plot(qFF, PSPh, type="l", lwd=2, xlim=c(-3,3), ylim=c(-2,2),
xlab="STANDARD NORMAL VARIATE OF PSEUDO-POLAR ANGLE",
ylab="STANDARD NORMAL VARIATE OF SPECTRAL MEASURE PROBABILITY")
lines(qFF, PLh, col=2) # red line is the Plackett copula
lines(qFF, GAh, col=3) # green line is the Gaussian copula
lines(qFF, GLh, col=4) # blue line is the Galambos copula
# Notice the flat spot and less steep nature of the PSP (black line), which is
# indicative of no to even spreading tail dependency. The Plackett and Gaussian
# copulas show no specific steepening near the middle, which remains indicative
# of no tail dependency with the Plackett being less steep because it has a more
# dispersed copula density at the right tail is approached than the Gaussian.
# The Galambos copula has upper tail dependency, which is seen by
# the mass concentration and steepening of the curve on the plot.
## End(Not run)