statTn {copBasic}R Documentation

The Tn Statistic of a Fitted Copula to an Empirical Copula

Description

Compute the Tn(p)T_n(p) statistic of Genest et al. (2011) that is defined as

Tn(p)=i=1nCn(ui,vi)CΘn(ui,vi)p\mbox,T_n(p) = \sum_{i=1}^n \big|\mathbf{C}_n(u_i, v_i) - \mathbf{C}_{\Theta_n}(u_i, v_i)\big|^p\mbox{,}

where Cn(u,v)\mathbf{C}_n(u,v) is the empirical copula, CΘn(u,v)\mathbf{C}_{\Theta_n}(u,v) is the fitted copula with estimated parameters Θn\Theta_n from the sample of size nn. The TnT_n for p=2p = 2 is reported by those authors to be of general purpose and overall performance in large scale simulation studies. The extension here for arbitary exponent pp is made for flexibility. Alternatively the definition could be associated with the statistic Tn(p)1/pT_n(p)^{1/p} in terms of a root 1/p1/p of the summation as shown above.

The TnT_n statistic is obviously a form of deviation between the empirical (nonparametric) and parametric fitted copula. The distribution of this statistic through Monte Carlo simulation could be used for inference. The inference is based on that a chosen parametric model is suitably close to the empirical copula. The Tn(p)T_n(p) statistic has an advantage of being relatively straightforward to understand and explain to stakeholders and decision makers, is attractive for being suitable in a wide variety of circumstances, but intuitively might have limited statistical power in some situations for it looks at whole copula structure and not say at tail dependency. Finally, other goodness-of-fits using the squared differences between Cn(u,v)\mathbf{C}_n(u,v) and CΘm(u,v)\mathbf{C}_{\Theta_m}(u, v) are aicCOP, bicCOP, and rmseCOP.

Usage

statTn(u, v=NULL, cop=NULL, para=NULL, p=2, proot=FALSE, ...)

Arguments

u

Nonexceedance probability uu in the XX direction;

v

Nonexceedance probability vv in the YY direction. If not given, then a second column from argument u is attempted;

cop

A copula function;

para

Vector of parameters or other data structure, if needed, to pass to the copula;

p

The value for pp, and the default follows that of Genest et al. (2011);

proot

A logical controling whether the TnT_n returned be rooted by 1/p1/p, and the default follows that of Genest et al. (2011); and

...

Additional arguments to pass to the copula function and (or) the empirical copula.

Value

The value for TnT_n is returned dependent on the specification of pp and whether rooting of the result is desired.

Note

The Examples section shows a simple computation of the T^n\hat T_n statistic for a sample and a fitted copula to that sample. Ideally statTn would be wrapped in a Monte Carlo process of fitting the apparent “parent” distribution from the sample data, then for some large replication count, generate NN samples of size nn from the parent and from these samples compute the empirical copula and also fit parameter(s) of the chosen copula and repeatedly solve for TnT_n. Given a total of NN values of TnT_n, then the sample TnT_n or T^n\hat{T}_n can be compared to the distribution, and if T^n\hat{T}_n is greater than say the 95th percentile, then the assumed form of the copula could be rejected.

The distTn defined below and is dependent on the copBasic.fitpara.beta function can be used to demonstrate concepts. (The process is complex enough that user-level implementation of distTn in copBasic is not presently (2019) thought appropriate.)

  "distTn" <- function(n, N=1000, statf=NULL,
                       cop=NULL, para=para, interval=NULL, ...) {
      opts <- options(warn=-1)
      message("Estimating Tn distribution: ", appendLF=FALSE)
      Tn <- vector(mode="numeric", N)
      for(i in 1:N) {
         showi <- as.logical(length(grep("0+$", i, perl=TRUE)))
         if(showi) message(i, "-", appendLF=FALSE)
         ruv <- simCOP(n=n, cop=cop, para=para, graphics=FALSE, ...)
         rpara <- copBasic.fitpara.beta(ruv, statf=statf,
                             interval=interval, cop=cop)
         Tn[i] <- ifelse(is.na(rpara), NA, statTn(ruv, cop=cop, para=rpara))
      }
      numNA <- length(Tn[is.na(Tn)])
      message("done: Number of failed parameter estimates=", numNA)
      options(opts)
      return(Tn[! is.na(Tn)])
  }

Let us imagine an n=400n=400 sample size of a Galambos copula (GL(u,v)\mathbf{GL}(u,v); GLcop) and then treat the Plackett copula (PL(u,v)\mathbf{PL}(u,v); PLACKETTcop) as the proper (chosen) model. The estimated parameter by the sample Blomqvist Beta of β^C=0.64\hat\beta_\mathbf{C} = 0.64 using the blomCOP function called from within copBasic.fitpara.beta is then placed in variable para. The β^C\hat\beta_\mathbf{C} is not the most efficient estimator but for purposes here, but it is fast. The parameter for the given seed is estimated as about PL(Θ^=20.75)\mathbf{PL}(\hat\Theta{=}20.75).

  n <- 400 # sample size
  correctCopula <- GLcop; set.seed(1596)
  sampleUV <- simCOP(n=n, cop=correctCopula, para=1.9) # a random sample
  para.correctCopula <- copBasic.fitpara.beta(uv=sampleUV, statf=blomCOP,
                                interval=c(1,5),      cop=correctCopula)
  chosenCopula <- PLACKETTcop
  para <- copBasic.fitpara.beta(uv=sampleUV, statf=blomCOP,
                                interval=c(.001,200), cop=chosenCopula )

Next, compute the sample T^n=0.063\hat T_n = 0.063 from sampleUV. The distribution of the TnT_n is estimated using the distTn function, and an estimate of the T^n\hat T_n p-value is in turn estimated. A large simulation run N=1,000N = 1{,}000 for a sample of size of n=400n = 400 is selected. The distTn function internally will simulated for N-replicates from the assumed parent and estimate the parameter. A computation run yields a p-value of approximately 0.01 (depending upon the seed) and is statistically significant at an alpha of 0.05, and therefore, the PL(Θ=20.75)\mathbf{PL}(\Theta{=}20.75) should be rejected for fitting to these data.

  sampleTn   <- statTn(sampleUV, cop=chosenCopula, para=para)
  Tns        <- distTn(n=n,      cop=chosenCopula, para=para,
                       interval=c(0.001, 100), statf=blomCOP)
  Tns_pvalue <- 1 - sum(Tns <= sampleTn) / length(Tns) # estimate p-value

The demonstration is furthered with a check on the Kullback–Leibler sample size nf ⁣gn_{f\!g} at the 5-percent significance level (alpha = 0.05) by the kullCOP function, which yields 100100. Given the parent copula as GL(Θ=1.9)\mathbf{GL}(\Theta{=}1.9), therefore, it would take approximately 100 samples to distinguish between that copula and a PL(Θ=20.75)\mathbf{PL}(\Theta{=}20.75) where in this case the fit was through the β^C=0.64\hat\beta_\mathbf{C} = 0.64.

  kullCOP(cop1=correctCopula, para1=1.9,
          cop2=chosenCopula,  para2=para)$KL.sample.size # KL sample size = 100
  vuongCOP(sampleUV, cop1=correctCopula, para1=para.correctCopula,
                     cop2=chosenCopula,  para2=para)$message
  # [1] "Copula 1 has better fit than Copula 2 at 100x(1-alpha) level"

The available sample size n=400n = 400 is then about four times larger than nf ⁣gn_{f\!g} so the sample size nn should be sufficient to judge goodness-of-fit. This is a large value but with the sample variability of β^C\hat\beta_\mathbf{C}, it seems that other measures of association such as Spearman Rho (rhoCOP) or Kendall Tau (tauCOP) and others cross-referenced therein might be preferable.

The prior conclusion is supported by the p-value of the T^n\hat T_n being about 0.01, which suggests that the PL(u,v)\mathbf{PL}(u,v) is not a good model of the available sample data in sampleUV. Lastly, these judgments are consistent with the Vuoug Procedure performed by the vuongCOP function, which reports at the 5-percent significance level that “copula number 1”—in this case, the GL(u,v)\mathbf{GL}(u,v)—has the better fit, and this is obviously consistent with the problem setup because the random sample for investigation was drawn from the Galambos coupla (the parent form).

Author(s)

W.H. Asquith

References

Genest, C., Kojadinovic, I., Nešlehová, J., and Yan, J., 2011, A goodness-of-fit test for bivariate extreme-value copulas: Bernoulli, v. 17, no. 1, pp. 253–275.

See Also

aicCOP, bicCOP, rmseCOP, vuongCOP, kullCOP

Examples

## Not run: 
# Example here is just for Tn. For the example below, the PSP copula is quite different
# from the Gumbel-Hougaard copula and thus, the hatTn would be expected to be different
# from those of the Gumbel-Hougaard and certainly not too near to zero.
samUV  <- simCOP(n=60, cop=PSP, graphics=FALSE, seed=1)   # random sample
hatTau <- cor(samUV$U, samUV$V, method="kendall")         # Kendall Tau
hatTn  <- statTn(samUV, cop=GHcop, para=GHcop(tau=hatTau)$para,
                 ctype="bernstein", bernprogress=TRUE)    # 0.03328789
# hatTn in this case is by itself is somewhat uninformative and requires
# Monte Carlo to put an individual value into context.
## End(Not run)

[Package copBasic version 2.2.4 Index]