vuongCOP {copBasic} | R Documentation |
The Vuong Procedure for Parametric Copula Comparison
Description
Perform the Vuong Procedure following Joe (2014, pp. 257–258). Consider two copula densities and
for two different bivariate copulas
and
having respective parameters
and
that provide the “closest” Kullback–Leibler Divergence from the true copula density
.
The difference of the Kullback–Leibler Divergence (kullCOP
) of the two copulas from the true copula density can be measured for a sample of size and bivariate sample realizations
by
where is referred to in the copBasic package as the “Vuong
” and
is defined as
The variance of can be estimated by
The sample estimate and variance are readily turned into the confidence interval by
where, using the quantile (inverse) function of the t-distribution
for nonexceedance probability
and
degrees of freedom for
being the sample size, the confidence interval is
Joe (2014, p. 258) reports other interval forms based (1) on the Akaike (AIC) correction and (2) on the Schwarz (BIC) correction, which are defined for AIC as
and for BIC as
The AIC and BIC corrections incorporate the number of parameters in the copula and for all else being equal the copula with the fewer parameters is preferable. If the two copulas being compared have equal number of parameters than the AIC and BIC equate to and the same confidence interval because the difference
is zero.
Joe (2014, p. 258) reports that these three intervals can be used for diagnostic inference as follows. If an interval contains 0 (zero), then copulas and
are not considered significantly different. If the interval does not contain 0, then copula
or
is the better fit depending on whether the interval is completely below 0 (thus
better fit) or above 0 (thus
better fit), respectively. Joe (2014) goes on the emphasize that “the procedure compares different [copulas] and assesses whether they provide similar fits to the data. [The procedure] does not assess whether [either copula] is a good enough fit.”
Usage
vuongCOP(u, v=NULL, cop1=NULL, cop2=NULL, para1=NULL, para2=NULL,
alpha=0.05, method=c("D12", "AIC", "BIC"),
the.zero=.Machine$double.eps^0.25, ...)
Arguments
u |
Nonexceedance probability |
v |
Nonexceedance probability |
cop1 |
A copula function corresponding to copula |
para1 |
Vector of parameters or other data structure, if needed, to pass to the copula |
cop2 |
A copula function corresponding to copula |
para2 |
Vector of parameters or other data structure, if needed, to pass to the copula |
alpha |
The |
method |
The interval to evaluate as to position of the respective statistic form the comparison of the two copulas; |
the.zero |
The value for “the zero” of the copula density function. This argument is the argument of the same name for |
... |
Additional arguments to pass to the |
Value
An R list
is returned having the following components:
title |
A descriptive title of the procedure; |
method |
A textual description of the |
result.text |
A textual description of the result of the Vuong Procedure; |
result |
A value 1 if |
p.value |
The two-sided p-values of the Vuong Procedure inclusive of |
D12 |
A named vector of the lower and upper bounds of Vuong |
AIC |
A named vector of the lower and upper bounds of Vuong |
BIC |
A named vector of the lower and upper bounds of Vuong |
parameters |
A named vector of the alpha, sample size, value for the t-distribution quantile |
Note
The vuongCOP
function along with kullCOP
and features of function densityCOPplot
represent collective milestones towards copula inference and diagnostics post fitting of copulas to the usual measures of association such as the Kendall Tau () and Spearman Rho (
) and their copula counterparts
(
tauCOP
) and (
rhoCOP
).
For an example application, imagine a problem of say low springflow risk at “nearby springs” that jointly should converge in the lower tail because drought usually has a strong regional impact. First, it is necessary to form a reflection of the Gumbel–Hougaard copula (;
GHcop
) but parameter estimation using is the same because sample
is invariant to reflection.
"rGHcop" <- function(u,v, ...) { u + v - 1 + GHcop(1-u, 1-v, ...) } set.seed(385) # setting so that reported quantities here are reproducible
The prior code also sets a seed on the pseudo-random number generator so that reported values here are reproducible. The reflected is denoted
.
Second, the copula (
PSP
) is chosen as the parent distribution, and this copula has no parameter. The has lower-tail dependency, which will be important as discussion unfolds. The following two lines of code establish a sample size to be drawn from the
and then simulates a sample from that copula. The color grey is used for the simulated values on the figure produced by
simCOP
, which forms a background example of the joint structure of the copula.
n <- 390 UV <- simCOP(cop=PSP, n=n, col=8, pch=16) # simulate and form the base image
By inspection of the so-produced graphic, it is obvious that there is contraction in the lower-left corner of the plot, which is a geometric representation of tail dependency. The lower-tail dependency thus phenomenalogically says that there is joint interconnect during low springflow conditions—both springs are likely to be at low flow simultaneously. The variable UV
contains the bivariate data as uniform variables (nonexceedance probabilities and
).
The Plackett copula (;
PLACKETTcop
) and the copula are chosen as candidate models of the “unknown” parent. Both
and
copulas use different “measures of association” for their parameter estimation. Next, sample estimates of the copula parameters using Schweizer and Wolff Sigma
. The sample value computations and parameter estimates also are set as shown in the following code:
Wolf <- wolfCOP(para=UV, as.sample=TRUE) # 0.496943 paraPL <- uniroot(function(p) Wolf - wolfCOP(cop=PLACKETTcop, para=p), c(1,30))$root paraGH <- uniroot(function(p) Wolf - wolfCOP(cop=rGHcop, para=p), c(1,30))$root
STEP 1—Compute Kullback–Leibler sample size: The Kullback–Leibler Divergences ( and
) are computed (
kullCOP
) for the evaluation of the sample size as appropriate for distinguishing between the two candidate copulas 95 percent of the time. The Kullback–Leibler sample size () also is computed as the following code illustrates and provides additional commentary.
KL <- replicate(20, kullCOP(cop1=PLcop, para1=paraPL, # CPU intensive cop2=rGHcop, para2=paraGH, n=1E5)$KL.sample.size) print(round(mean(KL))) # n_{fg} = 221 sample size print( range(KL)) # 204 <-- n_{fg} --> 252 sample size range
Depending on the sample coming from the simulation of the parent
copula, the call to
kullCOP
will likely report different values because
. These sample sizes have a range for 20 replications of about
. The result here is
and thus the sample size
should be more than large enough to generally distinguish between the
and
copulas at the respective sample measure of association.
STEP 2—Perform the Vuong Procedure: The Vuong Procedure can now be completed. Now watch the copula and parameter order in the next code for mistakes, the author has purposefully switched order here to draw attention to the need to make sure argument cop1
has the correct parameter(s) for copula 1 (the ). The two calls to
simCOP
are made to graphically superimpose these simulations on top of the parent .
VD <- vuongCOP(UV, cop2=rGHcop, para2=paraGH, cop1=PLcop, para1=paraPL) print(VD) # "Copula 2 better" or rGHcop (Gumbel-Hougaard is better) set.seed(385) # seems harmless enough to reuse the seed to emphasize "fit" TMP <-simCOP(cop=PLcop, para=paraPL,n=n,plot=FALSE,col="red", pch=16,cex=0.5) set.seed(385) # seems harmless enough to reuse the seed to emphasize "fit" TMP <-simCOP(cop=rGHcop,para=paraGH,n=n,plot=FALSE,col="green",pch=16,cex=0.5) rm(TMP) # just cleaning up the workspace.
Further discussion of the Vuong Procedure is informative. Simply speaking, the result is that the (copula 2) has better fit than
(copula 1). The 95-percent confident limits from the procedure for
with p-value
,
, and
are
. This interval does not contain zero and is greater than zero and therefore a conclusion may be drawn that copula 2 has the better fit.
STEP 3—Comparison of lower-tail dependency parameters: What does the tail dependency do for inference? This can be checked by computing the lower-tail dependency parameters (;
taildepCOP
) in the code that follows for each of the three copulas and the empirical copula with acknowledgment that true sample estimators do not quite exist. Numeric focus need only be on the lower tail, but the four graphics are informative.
taildepCOP(cop=PSP, plot=TRUE)$lambdaL # = 1/2 taildepCOP(cop=PLcop, para=paraPL, plot=TRUE)$lambdaL # = ZERO taildepCOP(cop=rGHcop, para=paraGH, plot=TRUE)$lambdaL # = 0.429 taildepCOP(cop=EMPIRcop, para=UV, plot=TRUE)$lambdaL # = 0.328
The important aspect of the graphics by taildepCOP
is that the has lower-tail dependency whereas the
does not. So, based on inspection
is superior given that we known
was the true parent. The empirical estimate of the
through the
EMPIRcop
copula indicates that its lower-tail dependency is closer to that of the relative to
and thus quantitatively by lower-tail dependency the
has a superior fit.
Therefore the has a tail dependency more similar to the true model compared to the
. Hence for this example, the
is clearly a superior fitting model in terms of the Vuong Procedure (fit alone) and the
then is used as a follow up to shown that the
might be “good enough” an approximation to the
. The efficacy of reflecting the
copula into a “new” form as
is demonstrated. Users are strongly encouraged to review the so-produced graphic from the
simCOP
call several listings back for , and lastly, this example is one for which absence of the argument
snv
(standard normal variate [scores]) by simCOP
makes the tail dependency issue for the sample size more prominent in the graphic.
STEP 4—Qualitatively compare using copula density plots: Graphical depiction of copula density contours by the densityCOPplot
function supports the conclusion that the is the superior model relative to the
. The so-produced graphic obviously shows that the
strongly mimics the shape of the parent
.
densityCOPplot(cop=PSP, contour.col=8) # grey is the parent bivariate density densityCOPplot(cop=PLcop, para=paraPL, contour.col="green", ploton=FALSE) densityCOPplot(cop=rGHcop, para=paraGH, contour.col="red", ploton=FALSE)
STEP 5—Compute L-comoments of the data via simulation and estimate the sampling distributions: An open research problem is the what if any role that L-comoments might play in either copula estimation or inference. (There being very little literature on the topic?) Because a measure of association was used for parameter estimation, the L-correlation is uniformative, but a comparison is conceptually useful. The and Spearman Rho of the data
and the L-correlations
are all similar as mandated by the mathematics.
Inference using L-coskew and L-cokurtosis seems possible. The following code listing is CPU intensive. First, the L-correlation, L-coskew, and L-cokurtosis values are computed from the simulated sample by the lcomoms2()
function of the lmomco package. Second and third, the respective sampling distributions of these L-comoments (lcomCOPpv
) for the two copulas are estimated.
UVlmr <- lmomco::lcomoms2(UV, nmom=4) # The sample L-comoments # This execution will result in nonrejection of rGH copula. GHlmr <- lcomCOPpv(n, UVlmr, cop=rGHcop, para=paraGH) # NONREJECTION # LcomType n Mean Lscale Lskew Lkurt sample.est p.value signif # Tau3[12] 390 -0.06952 0.01819 0.04505 0.12024 -0.11188 0.08795 . # Tau3[21] 390 -0.06739 0.02084 0.04104 0.12917 -0.10673 0.14162 - # Tau3[12:21] 390 -0.06845 0.01713 0.04930 0.11696 -0.10931 0.08161 . # Tau4[12] 390 0.04970 0.01682 -0.01635 0.10150 0.04183 0.38996 - # Tau4[21] 390 0.05129 0.01606 -0.06833 0.13798 0.07804 0.17470 - # Tau4[12:21] 390 0.05049 0.01329 -0.02045 0.12001 0.05994 0.35069 - # This execution will result in rejection of Plackett copula. PLlmr <- lcomCOPpv(n, UVlmr, cop=PLACKETTcop, para=paraPL) # REJECT PLACKETT # LcomType n Mean Lscale Lskew Lkurt sample.est p.value signif # Tau3[12] 390 -0.00267 0.02133 0.01556 0.09581 -0.11188 0.00129 ** # Tau3[21] 390 -0.00112 0.02022 -0.00663 0.13338 -0.10673 0.00189 ** # Tau3[12:21] 390 -0.00189 0.01757 0.00906 0.10226 -0.10931 0.00019 *** # Tau4[12] 390 0.00153 0.01652 -0.03320 0.12468 0.04183 0.07924 . # Tau4[21] 390 0.00361 0.01851 -0.01869 0.12052 0.07804 0.00929 ** # Tau4[12:21] 390 0.00257 0.01362 -0.01194 0.10796 0.05994 0.00744 **
Because each copula was fit to a measure of association, the p-values for the L-correlations are all nonsignificant (noninformative because of how the copulas were fit), and therefore p-values quite near to the 50th percentile should be produced. So here, the L-correlation is noninformative on fit even though it might have some role because it is asymmetrical unlike that statistics and
. The results in variable
GHlmr
show no statistically significant entries (all p-values ) for L-coskew and L-cokurtosis—the
copula is not rejected. The results in
PLlmr
show many p-values for both L-coskew and L-cokurtosis—the
copula is rejected. The experimental L-comoment inference shown is consistent with results with the Vuong Procedure.
The Vuong Procedure, however, does not address adequacy of fit—it just evaluates which copula fits better. The inspection of the lower tail dependency results previously shown ( = 0.429) along with the L-coskew and L-cokurtosis of the sample being well within the sample distribution suggests that the
is a adequate mimic of the
copula.
Some open research questions concern the numerical performance of the L-comoments as simulation sample size becomes large and whether or not the L-comoments should be computed on the probabilities . Also should conversion to normal scores be made and if so, should adjustment by the Hazen plotting positions (
for rank
) be made that Joe (2014) repeatedly advocates when standard normal variates (scores) [
for quantile function of standard normal distribution
]? Collectively, Nelsen (2006) and Salvadori et al. (2007) are both silent on the matter of normal score conversion, and in conclusion Nelsen (2006), Salvadori et al. (2007), and Joe (2014) also are all silent on L-comoment applications with copulas.
Author(s)
W.H. Asquith
References
Joe, H., 2014, Dependence modeling with copulas: Boca Raton, CRC Press, 462 p.
Nelsen, R.B., 2006, An introduction to copulas: New York, Springer, 269 p.
Salvadori, G., De Michele, C., Kottegoda, N.T., and Rosso, R., 2007, Extremes in Nature—An approach using copulas: Springer, 289 p.
See Also
densityCOP
, kullCOP
, simCOP
, statTn
, mleCOP
Examples
# See extended code listings and discussion in the Note section
# See Examples in mleCOP() (Last example therein might suggest a problem in the
# implied 95th percentile associated with n_fg above.