statTn {copBasic} | R Documentation |
The Tn Statistic of a Fitted Copula to an Empirical Copula
Description
Compute the statistic of Genest et al. (2011) that is defined as
where is the empirical copula,
is the fitted copula with estimated parameters
from the sample of size
. The
for
is reported by those authors to be of general purpose and overall performance in large scale simulation studies. The extension here for arbitary exponent
is made for flexibility. Alternatively the definition could be associated with the statistic
in terms of a root
of the summation as shown above.
The statistic is obviously a form of deviation between the empirical (nonparametric) and parametric fitted copula. The distribution of this statistic through Monte Carlo simulation could be used for inference. The inference is based on that a chosen parametric model is suitably close to the empirical copula. The
statistic has an advantage of being relatively straightforward to understand and explain to stakeholders and decision makers, is attractive for being suitable in a wide variety of circumstances, but intuitively might have limited statistical power in some situations for it looks at whole copula structure and not say at tail dependency. Finally, other goodness-of-fits using the squared differences between
and
are
aicCOP
, bicCOP
, and rmseCOP
.
Usage
statTn(u, v=NULL, cop=NULL, para=NULL, p=2, proot=FALSE, ...)
Arguments
u |
Nonexceedance probability |
v |
Nonexceedance probability |
cop |
A copula function; |
para |
Vector of parameters or other data structure, if needed, to pass to the copula; |
p |
The value for |
proot |
A logical controling whether the |
... |
Additional arguments to pass to the copula function and (or) the empirical copula. |
Value
The value for is returned dependent on the specification of
and whether rooting of the result is desired.
Note
The Examples section shows a simple computation of the statistic for a sample and a fitted copula to that sample. Ideally
statTn
would be wrapped in a Monte Carlo process of fitting the apparent “parent” distribution from the sample data, then for some large replication count, generate samples of size
from the parent and from these samples compute the empirical copula and also fit parameter(s) of the chosen copula and repeatedly solve for
. Given a total of
values of
, then the sample
or
can be compared to the distribution, and if
is greater than say the 95th percentile, then the assumed form of the copula could be rejected.
The distTn
defined below and is dependent on the copBasic.fitpara.beta
function can be used to demonstrate concepts. (The process is complex enough that user-level implementation of distTn
in copBasic is not presently (2019) thought appropriate.)
"distTn" <- function(n, N=1000, statf=NULL, cop=NULL, para=para, interval=NULL, ...) { opts <- options(warn=-1) message("Estimating Tn distribution: ", appendLF=FALSE) Tn <- vector(mode="numeric", N) for(i in 1:N) { showi <- as.logical(length(grep("0+$", i, perl=TRUE))) if(showi) message(i, "-", appendLF=FALSE) ruv <- simCOP(n=n, cop=cop, para=para, graphics=FALSE, ...) rpara <- copBasic.fitpara.beta(ruv, statf=statf, interval=interval, cop=cop) Tn[i] <- ifelse(is.na(rpara), NA, statTn(ruv, cop=cop, para=rpara)) } numNA <- length(Tn[is.na(Tn)]) message("done: Number of failed parameter estimates=", numNA) options(opts) return(Tn[! is.na(Tn)]) }
Let us imagine an sample size of a Galambos copula (
;
GLcop
) and then treat the Plackett copula (;
PLACKETTcop
) as the proper (chosen) model. The estimated parameter by the sample Blomqvist Beta of using the
blomCOP
function called from within copBasic.fitpara.beta
is then placed in variable para
. The is not the most efficient estimator but for purposes here, but it is fast. The parameter for the given seed is estimated as about
.
n <- 400 # sample size correctCopula <- GLcop; set.seed(1596) sampleUV <- simCOP(n=n, cop=correctCopula, para=1.9) # a random sample para.correctCopula <- copBasic.fitpara.beta(uv=sampleUV, statf=blomCOP, interval=c(1,5), cop=correctCopula) chosenCopula <- PLACKETTcop para <- copBasic.fitpara.beta(uv=sampleUV, statf=blomCOP, interval=c(.001,200), cop=chosenCopula )
Next, compute the sample from
sampleUV
. The distribution of the is estimated using the
distTn
function, and an estimate of the p-value is in turn estimated. A large simulation run
for a sample of size of
is selected. The
distTn
function internally will simulated for N
-replicates from the assumed parent and estimate the parameter. A computation run yields a p-value of approximately 0.01 (depending upon the seed) and is statistically significant at an alpha of 0.05, and therefore, the should be rejected for fitting to these data.
sampleTn <- statTn(sampleUV, cop=chosenCopula, para=para) Tns <- distTn(n=n, cop=chosenCopula, para=para, interval=c(0.001, 100), statf=blomCOP) Tns_pvalue <- 1 - sum(Tns <= sampleTn) / length(Tns) # estimate p-value
The demonstration is furthered with a check on the Kullback–Leibler sample size at the 5-percent significance level (alpha = 0.05) by the
kullCOP
function, which yields . Given the parent copula as
, therefore, it would take approximately 100 samples to distinguish between that copula and a
where in this case the fit was through the
.
kullCOP(cop1=correctCopula, para1=1.9, cop2=chosenCopula, para2=para)$KL.sample.size # KL sample size = 100 vuongCOP(sampleUV, cop1=correctCopula, para1=para.correctCopula, cop2=chosenCopula, para2=para)$message # [1] "Copula 1 has better fit than Copula 2 at 100x(1-alpha) level"
The available sample size is then about four times larger than
so the sample size
should be sufficient to judge goodness-of-fit. This is a large value but with the sample variability of
, it seems that other measures of association such as Spearman Rho (
rhoCOP
) or Kendall Tau (tauCOP
) and others cross-referenced therein might be preferable.
The prior conclusion is supported by the p-value of the being about 0.01, which suggests that the
is not a good model of the available sample data in
sampleUV
. Lastly, these judgments are consistent with the Vuoug Procedure performed by the vuongCOP
function, which reports at the 5-percent significance level that “copula number 1”—in this case, the —has the better fit, and this is obviously consistent with the problem setup because the random sample for investigation was drawn from the Galambos coupla (the parent form).
Author(s)
W.H. Asquith
References
Genest, C., Kojadinovic, I., Nešlehová, J., and Yan, J., 2011, A goodness-of-fit test for bivariate extreme-value copulas: Bernoulli, v. 17, no. 1, pp. 253–275.
See Also
aicCOP
, bicCOP
, rmseCOP
, vuongCOP
, kullCOP
Examples
## Not run:
# Example here is just for Tn. For the example below, the PSP copula is quite different
# from the Gumbel-Hougaard copula and thus, the hatTn would be expected to be different
# from those of the Gumbel-Hougaard and certainly not too near to zero.
samUV <- simCOP(n=60, cop=PSP, graphics=FALSE, seed=1) # random sample
hatTau <- cor(samUV$U, samUV$V, method="kendall") # Kendall Tau
hatTn <- statTn(samUV, cop=GHcop, para=GHcop(tau=hatTau)$para,
ctype="bernstein", bernprogress=TRUE) # 0.03328789
# hatTn in this case is by itself is somewhat uninformative and requires
# Monte Carlo to put an individual value into context.
## End(Not run)