R: Goodness-of-fit Tests for Copulas

gofCopula {copula}

R Documentation

Goodness-of-fit Tests for Copulas

Description

By default, the goodness-of-fit tests are based on the empirical process comparing the empirical copula with a parametric estimate of the copula derived under the null hypothesis. The default test statistic, "Sn", is the Cramér-von Mises functional S_n defined in Equation (2) of Genest, Rémillard and Beaudoin (2009). In that case, approximate p-values for the test statistic can be obtained either using a parametric bootstrap (see references two and three) or by means of a faster multiplier approach (see references four and five).

Alternative test statistics can be used, in particular if a parametric bootstrap is employed.

The principal function is gofCopula() which, depending on the value of simulation, calls either gofPB() or gofMB().

Usage

## Generic [and "rotCopula" method] ------ Main function ------
gofCopula(copula, x, ...)
## S4 method for signature 'copula'
gofCopula(copula, x, N = 1000,
          method = c("Sn", "SnB", "SnC", "Rn"),
          estim.method = c("mpl", "ml", "itau", "irho", "itau.mpl"),
          simulation = c("pb", "mult"), test.method = c("family", "single"),
          verbose = interactive(), ties = NA,
          ties.method = c("max", "average", "first", "last", "random", "min"),
          fit.ties.meth = eval(formals(rank)$ties.method), ...)

## (Deprecated) internal 'helper' functions : ---
gofPB(copula, x, N, method = c("Sn", "SnB", "SnC"),
      estim.method = c("mpl", "ml", "itau", "irho", "itau.mpl"),
      trafo.method = if(method == "Sn") "none" else c("cCopula", "htrafo"),
      trafoArgs = list(), test.method = c("family", "single"),
      verbose = interactive(), useR = FALSE, ties = NA,
      ties.method = c("max", "average", "first", "last", "random", "min"),
      fit.ties.meth = eval(formals(rank)$ties.method), ...)

gofMB(copula, x, N, method = c("Sn", "Rn"),
      estim.method = c("mpl", "ml", "itau", "irho"),
      test.method = c("family", "single"), verbose = interactive(),
      useR = FALSE, m = 1/2, zeta.m = 0, b = 1/sqrt(nrow(x)),
      ties.method = c("max", "average", "first", "last", "random", "min"),
      fit.ties.meth = eval(formals(rank)$ties.method), ...)

Arguments

`copula`	object of class `"copula"` representing the hypothesized copula family.
`x`	a data matrix that will be transformed to pseudo-observations using `pobs()`.
`N`	number of bootstrap or multiplier replications to be used to obtain approximate realizations of the test statistic under the null hypothesis.
`method`	a `character` string specifying the goodness-of-fit test statistic to be used. For `simulation = "pb"`, one of `"Sn"`, `"SnB"` or `"SnC"`, with `trafo.method != "none"` if `method != "Sn"`. For `simulation = "mult"`, one of `"Sn"` or `"Rn"`, where the latter is `R_n` from Genest et al. (2013).
`estim.method`	a `character` string specifying the estimation method to be used to estimate the dependence parameter(s); see `fitCopula()`.
`simulation`	a string specifying the resampling method for generating approximate realizations of the test statistic under the null hypothesis; can be either `"pb"` (parametric bootstrap) or `"mult"` (multiplier).
`test.method`	a `character` string specifying the test method to be used. Only in exceptional cases should this be different from the default `test.method = "family"`. If `test.method = "single"`, a test precisely for the provided copula (not its parametric family) is conducted. This makes sense only for specific applications such as testing random number generators.
`verbose`	a logical specifying if progress of the parametric bootstrap should be displayed via `txtProgressBar`.
`...`	for `gofCopula`, additional arguments passed to `gofPB()` or `gofMB()`; for `gofPB()` and `gofMB()`: additional arguments passed to `fitCopula()`. These may notably contain `hideWarnings`, and `optim.method`, `optim.control`, `lower`, or `upper` depending on the `optim.method`.
`trafo.method`	only for the parametric bootstrap (`"pb"`): String specifying the transformation to `U[0,1]^d`; either `"none"` or one of `"cCopula"`, see `cCopula()`, or `"htrafo"`, see `htrafo()`. If `method != "Sn"`, one needs to set `trafo.method != "none"`.
`trafoArgs`	only for the parametric bootstrap. A `list` of optional arguments passed to the transformation method (see `trafo.method` above).
`useR`	logical indicating whether an R or C implementation is used.
`ties.method`	string specifying how ranks should be computed, except for fitting, if there are ties in any of the coordinate samples of `x`; passed to `pobs`.
`fit.ties.meth`	string specifying how ranks should be computed when fitting by maximum pseudo-likelihood if there are ties in any of the coordinate samples of `x`; passed to `pobs`.
`ties`	only for the parametric bootstrap. Logical indicating whether a version of the parametric bootstrap adapted to the presence of ties in any of the coordinate samples of `x` should be used; the default value of `NA` indicates that the presence or absence of ties will be checked for automatically.
`m`, `zeta.m`	only for the multiplier with `method = "Rn"`. `m` is the power and `zeta.m` is the adjustment parameter `\zeta_m` for the denominator of the test statistic.
`b`	only for the multiplier. `b` is the bandwidth required for the estimation of the first-order partial derivatives based on the empirical copula.

Details

If the parametric bootstrap is used, the dependence parameters of the hypothesized copula family can be estimated by any estimation method available for the family, up to a few exceptions. If the multiplier is used, any of the rank-based methods can be used in the bivariate case, but only maximum pseudo-likelihood estimation can be used in the multivariate (multiparameter) case.

The price to pay for the higher computational efficiency of the multiplier is more programming work as certain partial derivatives need to be computed for each hypothesized parametric copula family. When estimation is based on maximization of the pseudo-likelihood, these have been implemented for six copula families thus far: the Clayton, Gumbel-Hougaard, Frank, Plackett, normal and t copula families. For other families, numerical differentiation based on grad() from package numDeriv is used (and a warning message is displayed).

Although the empirical processes involved in the multiplier and the parametric bootstrap-based test are asymptotically equivalent under the null, the finite-sample behavior of the two tests might differ significantly.

Both for the parametric bootstrap and the multiplier, the approximate p-value is computed as

(0.5 +\sum_{b=1}^N\mathbf{1}_{\{T_b\ge T\}})/(N+1),

where T and T_b denote the test statistic and the bootstrapped test statistic, respectively. This ensures that the approximate p-value is a number strictly between 0 and 1, which is sometimes necessary for further processing. See Pesarin (2001) for more details.
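The formula can be checked by hand with a few lines of base R. This is only a sketch: `pval`, `T.obs` and `T.boot` are hypothetical names introduced for illustration, and the numbers are made up rather than produced by gofCopula().

```r
## Approximate p-value (0.5 + #{T_b >= T}) / (N + 1), base R only.
## 'T.obs' is the observed statistic, 'T.boot' the N resampled statistics
## (hypothetical names and made-up values, for illustration).
pval <- function(T.obs, T.boot)
    (0.5 + sum(T.boot >= T.obs)) / (length(T.boot) + 1)

T.obs  <- 0.08
T.boot <- c(0.02, 0.05, 0.11, 0.03, 0.09, 0.01, 0.04, 0.07, 0.12, 0.06)
pval(T.obs, T.boot)  # three of the ten values are >= 0.08: (0.5 + 3) / 11

## Even in the extreme cases the result stays strictly inside (0, 1):
pval( Inf, T.boot)   # 0.5 / 11
pval(-Inf, T.boot)   # 10.5 / 11
```

The 0.5 in the numerator is what keeps the value away from 0 when no resampled statistic reaches the observed one, and dividing by N + 1 keeps it away from 1.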

For the normal and t copulas, several dependence structures can be hypothesized: "ex" for exchangeable, "ar1" for AR(1), "toep" for Toeplitz, and "un" for unstructured (see ellipCopula()). For the t copula, "df.fixed" has to be set to TRUE, which implies that the degrees of freedom are not considered as a parameter to be estimated.
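As a brief sketch of how such hypotheses might be set up, the objects below use the constructors normalCopula() and tCopula() with the dispstr argument. The object names and the simulated data are assumptions made for illustration; the multiplier approach is chosen only because it is the faster of the two.

```r
library(copula)

## Hypothesized three-dimensional families with different structures
## (object names are illustrative only)
nc.ex  <- normalCopula(dim = 3, dispstr = "ex")   # exchangeable
nc.ar1 <- normalCopula(dim = 3, dispstr = "ar1")  # AR(1)
tc.un  <- tCopula(dim = 3, dispstr = "un", df.fixed = TRUE)  # unstructured, df fixed

## Simulated data, as in the Examples section
set.seed(271)
x <- rCopula(60, tCopula(c(0.5, 0.6, 0.7), dim = 3, dispstr = "un"))

## Test the exchangeable normal family with the multiplier approach
res.ex <- gofCopula(nc.ex, x, N = 100, simulation = "mult", verbose = FALSE)
res.ex
```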

The former argument print.every is deprecated and not supported anymore; use verbose instead.

Value

An object of class htest, which is a list whose components include

`statistic`	value of the test statistic.
`p.value`	corresponding approximate p-value.
`parameter`	estimates of the parameters for the hypothesized copula family.
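The components listed above can be extracted with the usual list accessors. The following is a minimal sketch on simulated data; the object name `res` and the small values of n and N are choices made here to keep the run short.

```r
library(copula)

set.seed(271)
x <- rCopula(60, claytonCopula(3))  # simulated data, as in the Examples

## Multiplier approach, chosen here only because it runs quickly
res <- gofCopula(claytonCopula(), x, N = 100, simulation = "mult",
                 verbose = FALSE)

res$statistic  # value of the test statistic
res$p.value    # corresponding approximate p-value
res$parameter  # parameter estimate(s) for the hypothesized family
```

Because the result has class htest, printing `res` uses the standard print method for hypothesis tests.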

Note

These tests were theoretically studied and implemented under the assumption of continuous margins, which implies that ties in the component samples occur with probability zero. The presence of ties in the data might substantially affect the approximate p-values. Through the argument ties, however, the user can select a version of the parametric bootstrap adapted to the presence of ties. No such adaptation exists for the multiplier at the moment.
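A rough illustration of the ties argument follows. The ties are created artificially by rounding simulated data, and `x.ties` and `has.ties` are hypothetical names. The check for ties uses base R only and is not necessarily how the package detects them when `ties = NA`.

```r
library(copula)

set.seed(271)
x <- rCopula(60, claytonCopula(3))
x.ties <- round(x, 1)  # rounding introduces ties (for illustration only)

## Base R check: does any column contain duplicated values?
has.ties <- any(apply(x.ties, 2, anyDuplicated) > 0)
has.ties

## Explicitly request the parametric bootstrap adapted to ties
res.ties <- gofCopula(claytonCopula(), x.ties, N = 100,
                      ties = has.ties, verbose = FALSE)
res.ties$p.value
```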

References

Genest, C., Huang, W., and Dufour, J.-M. (2013). A regularized goodness-of-fit test for copulas. Journal de la Société française de statistique 154, 64–77.

Genest, C. and Rémillard, B. (2008). Validity of the parametric bootstrap for goodness-of-fit testing in semiparametric models. Annales de l'Institut Henri Poincaré: Probabilités et Statistiques 44, 1096–1127.

Genest, C., Rémillard, B., and Beaudoin, D. (2009). Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and Economics 44, 199–214.

Kojadinovic, I., Yan, J., and Holmes, M. (2011). Fast large-sample goodness-of-fit tests for copulas. Statistica Sinica 21, 841–871.

Kojadinovic, I. and Yan, J. (2011). A goodness-of-fit test for multivariate multiparameter copulas based on multiplier central limit theorems. Statistics and Computing 21, 17–30.

Kojadinovic, I. and Yan, J. (2010). Modeling Multivariate Distributions with Continuous Margins Using the copula R Package. Journal of Statistical Software 34(9), 1–20, https://www.jstatsoft.org/v34/i09/.

Kojadinovic, I. (2017). Some copula inference procedures adapted to the presence of ties. Computational Statistics and Data Analysis 112, 24–41, https://arxiv.org/abs/1609.05519.

Pesarin, F. (2001). Multivariate Permutation Tests: With Applications in Biostatistics. Wiley.

Examples

## The following example is available in batch through
## demo(gofCopula)

n <- 200; N <- 1000 # realistic (but too large for interactive use)
n <-  60; N <-  200 # (time (and tree !) saving ...)

## A two-dimensional data example ----------------------------------
set.seed(271)
x <- rCopula(n, claytonCopula(3))

## Does the Gumbel family seem to be a good choice (statistic "Sn")?
gofCopula(gumbelCopula(), x, N=N)
## With "SnC", really s..l..o..w.. --- with "SnB", *EVEN* slower
gofCopula(gumbelCopula(), x, N=N, method = "SnC", trafo.method = "cCopula")
## What about the Clayton family?
gofCopula(claytonCopula(), x, N=N)

## Similar with a different estimation method
gofCopula(gumbelCopula (), x, N=N, estim.method="itau")
gofCopula(claytonCopula(), x, N=N, estim.method="itau")


## A three-dimensional example  ------------------------------------
x <- rCopula(n, tCopula(c(0.5, 0.6, 0.7), dim = 3, dispstr = "un"))

## Does the Gumbel family seem to be a good choice?
g.copula <- gumbelCopula(dim = 3)
gofCopula(g.copula, x, N=N)
## What about the t copula?
t.copula <- tCopula(dim = 3, dispstr = "un", df.fixed = TRUE)
if(FALSE) ## this is *VERY* slow currently
  gofCopula(t.copula, x, N=N)

## The same with a different estimation method
gofCopula(g.copula, x, N=N, estim.method="itau")
if(FALSE) # still really slow
  gofCopula(t.copula, x, N=N, estim.method="itau")

## The same using the multiplier approach
gofCopula(g.copula, x, N=N, simulation="mult")
gofCopula(t.copula, x, N=N, simulation="mult")
if(FALSE) # not yet possible
    gofCopula(t.copula, x, N=N, simulation="mult", estim.method="itau")

[Package copula version 1.1-3 Index]