R: The Vuong Procedure for Parametric Copula Comparison

vuongCOP {copBasic}

R Documentation

The Vuong Procedure for Parametric Copula Comparison

Description

Perform the Vuong Procedure following Joe (2014, pp. 257–258). Consider two copula densities f_1 = c_1(u,v; \Theta_1) and f_2 = c_2(u,v; \Theta_2) for two different bivariate copulas \mathbf{C}_1(\Theta_1) and \mathbf{C}_2(\Theta_2) having respective parameters \Theta_1 and \Theta_2 that provide the “closest” Kullback–Leibler Divergence from the true copula density g(u,v).

The difference of the Kullback–Leibler Divergence (kullCOP) of the two copulas from the true copula density can be measured for a sample of size n and bivariate sample realizations \{u_i, v_i\} by

\hat{D}_{12} = n^{-1}\sum_{i=1}^n D_i\mbox{,}

where \hat{D}_{12} is referred to in the copBasic package as the “Vuong D” and D_i is defined as

D_i = \log\biggl[\frac{f_1(u_i, v_i; \Theta_2)}{f_2(u_i, v_i; \Theta_1)}\biggr]\mbox{.}

The variance of \hat{D}_{12} can be estimated by

\hat\sigma^2_{12} = (n-1)^{-1}\sum_{i=1}^n (D_i - \hat{D}_{12})^2\mbox{.}

The sample estimate and variance are readily turned into the 100{{\times}}(1 - \alpha) confidence interval by

\hat{D}^{(\mathrm{lo})}_{12} < \hat{D}_{12} < \hat{D}^{(\mathrm{hi})}_{12}\mbox{,}

where, using the quantile (inverse) function of the t-distribution \sim \mathcal{T}^{(-1)}(F; \mathrm{df}{=}(n-2)) for nonexceedance probability F and n-2 degrees of freedom for n being the sample size, the confidence interval is

\hat{D}_{12}-\mathcal{T}^{(-1)}(1-\alpha/2){\times}\hat\sigma_{12}/\sqrt{n} < \hat{D}_{12} < \hat{D}_{12}+\mathcal{T}^{(-1)}(1-\alpha/2){\times}\hat\sigma_{12}/\sqrt{n}\mbox{.}

Joe (2014, p. 258) reports other interval forms based (1) on the Akaike (AIC) correction and (2) on the Schwarz (BIC) correction, which are defined for AIC as

\mathrm{AIC} = \hat{D}_{12} - (2n)^{-1}\log(n)\biggl[\mathrm{dim}(\Theta_2) - \mathrm{dim}(\Theta_1)\biggr]\pm \mathcal{T}^{(-1)}(1-\alpha/2){\times}\hat\sigma_{12}/\sqrt{n}\mbox{,}

and for BIC as

\mathrm{BIC} = \hat{D}_{12} - (2n)^{-1}\log(n)\biggl[\mathrm{dim}(\Theta_2) - \mathrm{dim}(\Theta_1)\biggr]\pm \mathcal{T}^{(-1)}(1-\alpha/2){\times}\hat\sigma_{12}/\sqrt{n}\mbox{.}

The AIC and BIC corrections incorporate the number of parameters in the copula and for all else being equal the copula with the fewer parameters is preferable. If the two copulas being compared have equal number of parameters than the AIC and BIC equate to \hat{D}_{12} and the same confidence interval because the difference [\mathrm{dim}(\Theta_2) - \mathrm{dim}(\Theta_1)] is zero.

Joe (2014, p. 258) reports that these three intervals can be used for diagnostic inference as follows. If an interval contains 0 (zero), then copulas \mathbf{C}_1(\Theta_1) and \mathbf{C}_2(\Theta_2) are not considered significantly different. If the interval does not contain 0, then copula \mathbf{C}_1(\Theta_1) or \mathbf{C}_2(\Theta_2) is the better fit depending on whether the interval is completely below 0 (thus \mathbf{C}_1(\Theta_1) better fit) or above 0 (thus \mathbf{C}_2(\Theta_2) better fit), respectively. Joe (2014) goes on the emphasize that “the procedure compares different [copulas] and assesses whether they provide similar fits to the data. [The procedure] does not assess whether [either copula] is a good enough fit.”

Usage

vuongCOP(u, v=NULL, cop1=NULL, cop2=NULL, para1=NULL, para2=NULL,
                    alpha=0.05, method=c("D12", "AIC", "BIC"),
                    the.zero=.Machine$double.eps^0.25, ...)

Arguments

`u`	Nonexceedance probability `u` in the `X` direction;
`v`	Nonexceedance probability `v` in the `Y` direction and if `NULL` then `u` is treated as a two column R `data.frame`;
`cop1`	A copula function corresponding to copula `f_1` in the Vuong Procedure;
`para1`	Vector of parameters or other data structure, if needed, to pass to the copula `f_1`;
`cop2`	A copula function corresponding to copula `f_2` in the the Vuong Procedure;
`para2`	Vector of parameters or other data structure, if needed, to pass to the copula `f_2`;
`alpha`	The `\alpha` in the Vuong Procedure, which results in the `100{\times}(1 - \alpha)` confidence interval (two sided);
`method`	The interval to evaluate as to position of the respective statistic form the comparison of the two copulas;
`the.zero`	The value for “the zero” of the copula density function. This argument is the argument of the same name for `densityCOP`. The default here is intended to suggest that a tiny nonzero value for density will trap the numerical zero densities; and
`...`	Additional arguments to pass to the `densityCOP` function.

Value

An R list is returned having the following components:

`title`	A descriptive title of the procedure;
`method`	A textual description of the `method` setting;
`result.text`	A textual description of the result of the Vuong Procedure;
`result`	A value 1 if `\mathbf{C}_1(\Theta_1)` is better fit, 2 if copula `\mathbf{C}_2(\Theta_2)` is better fit, and `0` if neither is better (`\hat{D}_{12} = 0`), and `NA` including the likely(?) erroneous situation of `\mathbf{C}_1(\Theta_1) \equiv \mathbf{C}_2(\Theta_2)`;
`p.value`	The two-sided p-values of the Vuong Procedure inclusive of `\mathrm{AIC}` and `\mathrm{BIC}`;
`D12`	A named vector of the lower and upper bounds of Vuong `D` at the respective confidence interval percentage along with `\hat{D}_{12}` and `\sigma^2_{12}`;
`AIC`	A named vector of the lower and upper bounds of Vuong `\mathrm{AIC}` at the respective confidence interval percentage;
`BIC`	A named vector of the lower and upper bounds of Vuong `\mathrm{BIC}` at the respective confidence interval percentage; and
`parameters`	A named vector of the alpha, sample size, value for the t-distribution quantile `qt(1-alpha/2, df=n)`, and `\hat\sigma_{12}`.

Note

The vuongCOP function along with kullCOP and features of function densityCOPplot represent collective milestones towards copula inference and diagnostics post fitting of copulas to the usual measures of association such as the Kendall Tau (\tau_K) and Spearman Rho (\rho_S) and their copula counterparts \tau_\mathbf{C} (tauCOP) and \rho_\mathbf{C} (rhoCOP).

For an example application, imagine a problem of say low springflow risk at “nearby springs” that jointly should converge in the lower tail because drought usually has a strong regional impact. First, it is necessary to form a reflection of the Gumbel–Hougaard copula (\mathbf{GH}(u,v; \Theta_{\mathbf{GH}}); GHcop) but parameter estimation using \tau_\mathbf{C} is the same because sample \hat\tau_K is invariant to reflection.

  "rGHcop" <- function(u,v, ...) { u + v - 1 + GHcop(1-u, 1-v, ...) }
  set.seed(385) # setting so that reported quantities here are reproducible

The prior code also sets a seed on the pseudo-random number generator so that reported values here are reproducible. The reflected \mathbf{GH}(u,v; \Theta_{\mathbf{GH}}) is denoted \mathbf{rGH}(u,v; \Theta_{\mathbf{rGH}}).

Second, the \mathbf{PSP}(u,v) copula (PSP) is chosen as the parent distribution, and this copula has no parameter. The \mathbf{PSP} has lower-tail dependency, which will be important as discussion unfolds. The following two lines of code establish a sample size to be drawn from the \mathbf{PSP} and then simulates a sample from that copula. The color grey is used for the simulated values on the figure produced by simCOP, which forms a background example of the joint structure of the \mathbf{PSP} copula.

  n <- 390
  UV <- simCOP(cop=PSP, n=n, col=8, pch=16) # simulate and form the base image

By inspection of the so-produced graphic, it is obvious that there is contraction in the lower-left corner of the plot, which is a geometric representation of tail dependency. The lower-tail dependency thus phenomenalogically says that there is joint interconnect during low springflow conditions—both springs are likely to be at low flow simultaneously. The variable UV contains the bivariate data as uniform variables (nonexceedance probabilities u and v).

The Plackett copula (\mathbf{PL}(u,v; \Theta_{\mathbf{PL}}); PLACKETTcop) and the \mathbf{rGH}(u,v; \Theta_{\mathbf{rGH}}) copula are chosen as candidate models of the “unknown” parent. Both \mathbf{PL} and \mathbf{rGH} copulas use different “measures of association” for their parameter estimation. Next, sample estimates of the copula parameters using Schweizer and Wolff Sigma \hat\sigma_\mathbf{C}. The sample value computations and parameter estimates also are set as shown in the following code:

  Wolf   <- wolfCOP(para=UV, as.sample=TRUE) # 0.496943
  paraPL <- uniroot(function(p)
                Wolf - wolfCOP(cop=PLACKETTcop, para=p), c(1,30))$root
  paraGH <- uniroot(function(p)
                Wolf - wolfCOP(cop=rGHcop,      para=p), c(1,30))$root

STEP 1—Compute Kullback–Leibler sample size: The Kullback–Leibler Divergences (\mathrm{KL}(f {\mid} g) and \mathrm{KL}(g {\mid} f)) are computed (kullCOP) for the evaluation of the sample size as appropriate for distinguishing between the two candidate copulas 95 percent of the time. The Kullback–Leibler sample size (n_{f\!g}) also is computed as the following code illustrates and provides additional commentary.

  KL <- replicate(20, kullCOP(cop1=PLcop,  para1=paraPL,       # CPU intensive
                              cop2=rGHcop, para2=paraGH, n=1E5)$KL.sample.size)
  print(round(mean(KL))) #         n_{fg} = 221   sample size
  print(     range(KL))  # 204 <-- n_{fg} --> 252 sample size range

Depending on the sample \hat\sigma_\mathbf{C} coming from the simulation of the parent \mathbf{PSP} copula, the call to kullCOP will likely report different n_{f\!g} values because n_{f\!g}(\mathbf{C}_1(\Theta_1), \mathbf{C}_1(\Theta_1). These sample sizes have a range for 20 replications of about n_{f\!g}=204{-}252. The result here is n_{f\!g}=221 and thus the sample size n=390 should be more than large enough to generally distinguish between the \mathbf{PL} and \mathbf{rGH} copulas at the respective sample measure of association.

STEP 2—Perform the Vuong Procedure: The Vuong Procedure can now be completed. Now watch the copula and parameter order in the next code for mistakes, the author has purposefully switched order here to draw attention to the need to make sure argument cop1 has the correct parameter(s) for copula 1 (the \mathbf{PL}). The two calls to simCOP are made to graphically superimpose these simulations on top of the parent \mathbf{PSP}.

  VD <- vuongCOP(UV, cop2=rGHcop, para2=paraGH, cop1=PLcop, para1=paraPL)
  print(VD) # "Copula 2 better" or rGHcop (Gumbel-Hougaard is better)
  set.seed(385) # seems harmless enough to reuse the seed to emphasize "fit"
  TMP <-simCOP(cop=PLcop, para=paraPL,n=n,plot=FALSE,col="red",  pch=16,cex=0.5)
  set.seed(385) # seems harmless enough to reuse the seed to emphasize "fit"
  TMP <-simCOP(cop=rGHcop,para=paraGH,n=n,plot=FALSE,col="green",pch=16,cex=0.5)
  rm(TMP) # just cleaning up the workspace.

Further discussion of the Vuong Procedure is informative. Simply speaking, the result is that the \mathbf{rGH} (copula 2) has better fit than \mathbf{PL} (copula 1). The 95-percent confident limits from the procedure for \hat{D}_{12} = 0.049 with p-value 0.0012, \hat\sigma_{12} = 0.297, and n=390 are 0.0194 < \hat{D}_{12} < 0.0786. This interval does not contain zero and is greater than zero and therefore a conclusion may be drawn that copula 2 has the better fit.

STEP 3—Comparison of lower-tail dependency parameters: What does the tail dependency do for inference? This can be checked by computing the lower-tail dependency parameters (\lambda^L_\mathbf{C}; taildepCOP) in the code that follows for each of the three copulas and the empirical copula with acknowledgment that true sample estimators do not quite exist. Numeric focus need only be on the lower tail, but the four graphics are informative.

  taildepCOP(cop=PSP,                   plot=TRUE)$lambdaL # = 1/2
  taildepCOP(cop=PLcop,    para=paraPL, plot=TRUE)$lambdaL # = ZERO
  taildepCOP(cop=rGHcop,   para=paraGH, plot=TRUE)$lambdaL # = 0.429
  taildepCOP(cop=EMPIRcop, para=UV,     plot=TRUE)$lambdaL # = 0.328

The important aspect of the graphics by taildepCOP is that the \mathbf{rGH} has lower-tail dependency whereas the \mathbf{PL} does not. So, based on inspection \mathbf{rGH} is superior given that we known \mathbf{PSP} was the true parent. The empirical estimate of the \hat\lambda^L_\mathbf{C} = 0.328 through the EMPIRcop copula indicates that its lower-tail dependency is closer to that of the \mathbf{rGH} relative to \mathbf{PL} and thus quantitatively by lower-tail dependency the \mathbf{rGH} has a superior fit.

Therefore the \mathbf{rGH} has a tail dependency more similar to the true model compared to the \mathbf{PL}. Hence for this example, the \mathbf{rGH} is clearly a superior fitting model in terms of the Vuong Procedure (fit alone) and the \lambda^L_\mathbf{C} then is used as a follow up to shown that the \mathbf{rGH} might be “good enough” an approximation to the \mathbf{PSP}. The efficacy of reflecting the \mathbf{GH} copula into a “new” form as \mathbf{rGH} is demonstrated. Users are strongly encouraged to review the so-produced graphic from the simCOP call several listings back for n=390, and lastly, this example is one for which absence of the argument snv (standard normal variate [scores]) by simCOP makes the tail dependency issue for the sample size more prominent in the graphic.

STEP 4—Qualitatively compare using copula density plots: Graphical depiction of copula density contours by the densityCOPplot function supports the conclusion that the \mathbf{rGH} is the superior model relative to the \mathbf{PL}. The so-produced graphic obviously shows that the \mathbf{rGH} strongly mimics the shape of the parent \mathbf{PSP}.

  densityCOPplot(cop=PSP, contour.col=8) # grey is the parent bivariate density
  densityCOPplot(cop=PLcop,  para=paraPL, contour.col="green", ploton=FALSE)
  densityCOPplot(cop=rGHcop, para=paraGH, contour.col="red",   ploton=FALSE)

STEP 5—Compute L-comoments of the data via simulation and estimate the sampling distributions: An open research problem is the what if any role that L-comoments might play in either copula estimation or inference. (There being very little literature on the topic?) Because a measure of association was used for parameter estimation, the L-correlation is uniformative, but a comparison is conceptually useful. The \hat\sigma_\mathbf{C} = 0.4969 and Spearman Rho of the data \hat\rho_S and the L-correlations \hat\rho_S \approx \tau^{[12]}_{2} \approx \tau^{[21]}_{2} \approx 0.497 are all similar as mandated by the mathematics.

Inference using L-coskew and L-cokurtosis seems possible. The following code listing is CPU intensive. First, the L-correlation, L-coskew, and L-cokurtosis values are computed from the simulated sample by the lcomoms2() function of the lmomco package. Second and third, the respective sampling distributions of these L-comoments (lcomCOPpv) for the two copulas are estimated.

  UVlmr <- lmomco::lcomoms2(UV, nmom=4) # The sample L-comoments
  # This execution will result in nonrejection of rGH copula.
  GHlmr <- lcomCOPpv(n, UVlmr, cop=rGHcop,      para=paraGH) # NONREJECTION
  # LcomType      n     Mean  Lscale    Lskew   Lkurt sample.est p.value signif
  #    Tau3[12] 390 -0.06952 0.01819  0.04505 0.12024   -0.11188 0.08795      .
  #    Tau3[21] 390 -0.06739 0.02084  0.04104 0.12917   -0.10673 0.14162      -
  # Tau3[12:21] 390 -0.06845 0.01713  0.04930 0.11696   -0.10931 0.08161      .
  #    Tau4[12] 390  0.04970 0.01682 -0.01635 0.10150    0.04183 0.38996      -
  #    Tau4[21] 390  0.05129 0.01606 -0.06833 0.13798    0.07804 0.17470      -
  # Tau4[12:21] 390  0.05049 0.01329 -0.02045 0.12001    0.05994 0.35069      -

  # This execution will result in rejection of Plackett copula.
  PLlmr <- lcomCOPpv(n, UVlmr, cop=PLACKETTcop, para=paraPL) # REJECT PLACKETT
  #  LcomType     n     Mean  Lscale    Lskew   Lkurt sample.est p.value signif
  #    Tau3[12] 390 -0.00267 0.02133  0.01556 0.09581   -0.11188 0.00129     **
  #    Tau3[21] 390 -0.00112 0.02022 -0.00663 0.13338   -0.10673 0.00189     **
  # Tau3[12:21] 390 -0.00189 0.01757  0.00906 0.10226   -0.10931 0.00019    ***
  #    Tau4[12] 390  0.00153 0.01652 -0.03320 0.12468    0.04183 0.07924      .
  #    Tau4[21] 390  0.00361 0.01851 -0.01869 0.12052    0.07804 0.00929     **
  # Tau4[12:21] 390  0.00257 0.01362 -0.01194 0.10796    0.05994 0.00744     **

Because each copula was fit to a measure of association, the p-values for the L-correlations are all nonsignificant (noninformative because of how the copulas were fit), and therefore p-values quite near to the 50th percentile should be produced. So here, the L-correlation is noninformative on fit even though it might have some role because it is asymmetrical unlike that statistics \tau_K and \rho_S. The results in variable GHlmr show no statistically significant entries (all p-values {>}0.05 = (\alpha=0.1)/2)) for L-coskew and L-cokurtosis—the \mathbf{rGH} copula is not rejected. The results in PLlmr show many p-values {<}0.05 = (\alpha=0.1)/2 for both L-coskew and L-cokurtosis—the \mathbf{PL} copula is rejected. The experimental L-comoment inference shown is consistent with results with the Vuong Procedure.

The Vuong Procedure, however, does not address adequacy of fit—it just evaluates which copula fits better. The inspection of the lower tail dependency results previously shown (\lambda^L_\mathbf{PSP} = 1/2 \approx \lambda^U_\mathbf{rGH} = 0.429) along with the L-coskew and L-cokurtosis of the sample being well within the sample distribution suggests that the \mathbf{rGH} is a adequate mimic of the \mathbf{PSP} copula.

Some open research questions concern the numerical performance of the L-comoments as simulation sample size becomes large and whether or not the L-comoments should be computed on the probabilities \{u, v\}. Also should conversion to normal scores be made and if so, should adjustment by the Hazen plotting positions (u_i = (r_i - 0.5)/n for rank r_i) be made that Joe (2014) repeatedly advocates when standard normal variates (scores) [z_i = \Phi^{(-1)}(u_i) for quantile function of standard normal distribution \Phi(0,1)]? Collectively, Nelsen (2006) and Salvadori et al. (2007) are both silent on the matter of normal score conversion, and in conclusion Nelsen (2006), Salvadori et al. (2007), and Joe (2014) also are all silent on L-comoment applications with copulas.

Author(s)

W.H. Asquith

References

Joe, H., 2014, Dependence modeling with copulas: Boca Raton, CRC Press, 462 p.

Nelsen, R.B., 2006, An introduction to copulas: New York, Springer, 269 p.

Salvadori, G., De Michele, C., Kottegoda, N.T., and Rosso, R., 2007, Extremes in Nature—An approach using copulas: Springer, 289 p.

Examples

# See extended code listings and discussion in the Note section
# See Examples in mleCOP() (Last example therein might suggest a problem in the
# implied 95th percentile associated with n_fg above.

[Package copBasic version 2.2.4 Index]