R: Gof test using the Anderson-Darling test statistic and the...

gofRosenblattGamma {gofCopula}

R Documentation

Gof test using the Anderson-Darling test statistic and the gamma distribution

Description

gofRosenblattGamma contains the RosenblattGamma gof tests for copulae, described in Genest et al. (2009) and Hofert et al. (2014), and compares the empirical copula against a parametric estimate of the copula derived under the null hypothesis. The margins can be estimated by a range of distributions, and the time needed for the estimation can be given. The approximate p-values are computed with a parametric bootstrap, whose computation can be accelerated by enabling the built-in parallel computation. The gof statistics are computed with the function gofTstat from the package copula. Datasets of any dimension above 1 can be passed, and the possible copulae are "normal", "t", "clayton", "gumbel", "frank", "joe", "amh", "galambos", "fgm" and "plackett". The parameter estimation is performed with the pseudo maximum likelihood method. If the estimation fails, inversion of Kendall's tau is used.

Usage

gofRosenblattGamma(
  copula = c("normal", "t", "clayton", "gumbel", "frank", "joe", "amh", "galambos",
    "fgm", "plackett"),
  x,
  param = 0.5,
  param.est = TRUE,
  df = 4,
  df.est = TRUE,
  margins = "ranks",
  flip = 0,
  M = 1000,
  dispstr = "ex",
  lower = NULL,
  upper = NULL,
  seed.active = NULL,
  processes = 1
)

Arguments

`copula`	The copula to test for. Possible are `"normal"`, `"t"`, `"clayton"`, `"gumbel"`, `"frank"`, `"joe"`, `"amh"`, `"galambos"`, `"fgm"` and `"plackett"`.
`x`	A matrix containing the data with rows being observations and columns being variables.
`param`	The copula parameter to use, if it shall not be estimated.
`param.est`	Shall be either `TRUE` or `FALSE`. `TRUE` means that `param` will be estimated.
`df`	Degrees of freedom, if not meant to be estimated. Only necessary if tested for `"t"`-copula.
`df.est`	Indicates if `df` shall be estimated. Has to be either `FALSE` or `TRUE`, where `TRUE` means that it will be estimated.
`margins`	Specifies which estimation method for the margins shall be used. The default is `"ranks"`, which is the standard approach to convert data in such a case. Alternatively the following distributions can be specified: `"beta"`, `"cauchy"`, Chi-squared (`"chisq"`), `"f"`, `"gamma"`, Log normal (`"lnorm"`), Normal (`"norm"`), `"t"`, `"weibull"`, Exponential (`"exp"`). Input can be either one method, e.g. `"ranks"`, which will be used for estimation of all data sequences. Also an individual method for each margin can be specified, e.g. `c("ranks", "norm", "t")` for 3 data sequences. If one does not want to estimate the margins, set it to `NULL`.
`flip`	The control parameter to flip the copula by 90, 180, 270 degrees clockwise. Only applicable for bivariate copula. Default is 0 and possible inputs are 0, 90, 180, 270 and NULL.
`M`	Number of bootstrapping loops.
`dispstr`	A character string specifying the type of the symmetric positive definite matrix characterizing the elliptical copula. Implemented structures are "ex" for exchangeable and "un" for unstructured, see package `copula`.
`lower`	Lower bound for the maximum likelihood estimation of the copula parameter. The constraint is also active in the bootstrapping procedure. The constraint is not active when a switch to inversion of Kendall's tau is necessary. Default `NULL`.
`upper`	Upper bound for the maximum likelihood estimation of the copula parameter. The constraint is also active in the bootstrapping procedure. The constraint is not active when a switch to inversion of Kendall's tau is necessary. Default `NULL`.
`seed.active`	Has to be either an integer or a vector of M+1 integers. If a single integer is given, the seeds for the bootstrapping procedure are simulated from it. If M+1 seeds are provided, these seeds are used in the bootstrapping procedure. Defaults to `NULL`, in which case `R` generates the seeds from the computer runtime. Controlling the seeds is useful for reproducibility of a simulation study to compare the power of the tests or for reproducibility of an empirical study.
`processes`	The number of parallel processes which are performed to speed up the bootstrapping. Shouldn't be higher than the number of logical processors. Please see the details.

Details

As described in Hofert et al. (2014), this test computes the Anderson-Darling test statistic for (supposedly) U[0,1]-distributed (under H_0) random variates obtained via the distribution function of the gamma distribution. The null hypothesis H_0 is

C \in \mathcal{C}_0

with \mathcal{C}_0 as the true class of copulae under H_0.

This test is based on the Rosenblatt probability integral transform, which uses the mapping \mathcal{R}: (0,1)^d \rightarrow (0,1)^d. Following Genest et al. (2009), this transformation decomposes a random vector \mathbf{u} \in [0,1]^d with copula C into mutually independent elements, each uniformly distributed on the unit interval. The mapping provides pseudo observations E_i, given by

E_1 = \mathcal{R}(U_1), \dots, E_n = \mathcal{R}(U_n).

The mapping assigns to every vector \mathbf{u} the vector \mathbf{e} = (e_1, \dots, e_d) with e_1 = u_1 and, for i \in \{2, \dots, d\},

e_i = \frac{\partial^{i-1} C(u_1, \dots, u_i, 1, \dots, 1)}{\partial u_1 \cdots \partial u_{i-1}} / \frac{\partial^{i-1} C(u_1, \dots, u_{i-1}, 1, \dots, 1)}{\partial u_1 \cdots \partial u_{i-1}}.
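
As an illustration of the mapping, the following is a minimal sketch for the bivariate Clayton copula, where the formula for e_2 has a closed form and the denominator equals one. The function name `rosenblatt_clayton` is hypothetical and not part of gofCopula; only base R is used. The data are simulated by conditional inversion, so under the true parameter the transform should return the independent uniform draws that generated them.

```r
# Rosenblatt transform for a bivariate Clayton copula (theta > 0).
# u: n x 2 matrix of pseudo-observations in (0, 1).
# Hypothetical helper for illustration, not a gofCopula function.
rosenblatt_clayton <- function(u, theta) {
  stopifnot(is.matrix(u), ncol(u) == 2, theta > 0)
  e1 <- u[, 1]
  # Partial derivative of C(u1, u2) with respect to u1.
  # For i = 2 the denominator in the general formula is d u1 / d u1 = 1.
  e2 <- u[, 1]^(-theta - 1) *
    (u[, 1]^(-theta) + u[, 2]^(-theta) - 1)^(-1 / theta - 1)
  cbind(e1 = e1, e2 = e2)
}

# Simulate from the Clayton copula by conditional inversion,
# then map the sample back with the Rosenblatt transform.
set.seed(1)
theta <- 2
u1 <- runif(500)
w  <- runif(500)  # independent uniforms driving the second coordinate
u2 <- ((w^(-theta / (1 + theta)) - 1) * u1^(-theta) + 1)^(-1 / theta)
e  <- rosenblatt_clayton(cbind(u1, u2), theta)
```

Here `e[, "e2"]` reproduces `w` up to floating-point error, which is the independence property the test relies on under H_0.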

The Anderson-Darling test statistic of the variates

G(x_j) = \Gamma_d \left( x_j \right)

is computed (via ADGofTest::ad.test), where x_j = \sum_{i=1}^d (- \ln e_{ij}), e_{ij} is the i-th component of the j-th pseudo observation, and \Gamma_d() denotes the distribution function of the gamma distribution with shape parameter d and scale parameter one (being equal to an Erlang(d) distribution function).

With the values G(x_j) sorted in ascending order, the test statistic is then given by

T = -n - \sum_{j=1}^n \frac{2j - 1}{n} [\ln(G(x_j)) + \ln(1 - G(x_{n+1-j}))].
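
The formula for T can be transcribed directly into base R. The sketch below is an illustration under stated assumptions rather than the package's implementation (the documentation above states that ADGofTest::ad.test is used); the function name `ad_gamma_stat` is hypothetical, and the input `e` is assumed to be an n x d matrix of Rosenblatt-transformed pseudo observations with entries in (0, 1).

```r
# Anderson-Darling statistic of the gamma-transformed variates.
# e: n x d matrix of Rosenblatt-transformed pseudo-observations.
# Hypothetical helper for illustration, not a gofCopula function.
ad_gamma_stat <- function(e) {
  stopifnot(is.matrix(e), all(e > 0), all(e < 1))
  n <- nrow(e)
  d <- ncol(e)
  x <- rowSums(-log(e))                       # x_j = sum_i (-ln e_ij)
  g <- sort(pgamma(x, shape = d, rate = 1))   # ordered values G(x_j)
  j <- seq_len(n)
  # rev(g)[j] equals g[n + 1 - j], matching the second log term.
  -n - sum((2 * j - 1) / n * (log(g) + log(1 - rev(g))))
}

# Usage on independent uniforms, i.e. data that satisfy H_0 exactly.
set.seed(42)
e_demo <- matrix(runif(200 * 3), ncol = 3)
t_demo <- ad_gamma_stat(e_demo)
```

For a single observation in one dimension with e = 0.5, G(x) = 0.5 and the statistic reduces to 2 ln(2) - 1, which gives a quick hand check of the transcription.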

The approximate p-value is computed by the formula,

\sum_{b=1}^M \mathbf{I}(|T_b| \geq |T|) / M,

where T and T_b denote the test statistic and the bootstrapped test statistic, respectively.
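
The p-value formula amounts to the share of bootstrapped statistics that are at least as large in absolute value as the observed one. A minimal sketch in base R, with the hypothetical name `boot_pvalue` and made-up numbers used purely for illustration:

```r
# Approximate p-value from M bootstrapped statistics.
# t_obs:  observed test statistic T.
# t_boot: numeric vector of bootstrapped statistics T_1, ..., T_M.
# Hypothetical helper for illustration, not a gofCopula function.
boot_pvalue <- function(t_obs, t_boot) {
  stopifnot(length(t_obs) == 1, length(t_boot) >= 1)
  mean(abs(t_boot) >= abs(t_obs))
}

# Illustrative values only: two of the four bootstrapped statistics
# are at least as large as the observed one.
p_demo <- boot_pvalue(1.2, c(0.5, 1.5, 2.0, 0.9))  # 0.5
```

Because the p-value is a multiple of 1/M, small values of M give only a coarse resolution.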

For small values of M, initializing the parallelisation via processes does not make sense, because registering the parallel processes increases the computation time. Consider enabling parallelisation only for high values of M.

Value

An object of the class gofCOP with the components

`method`	a character which informs about the performed analysis
`copula`	the copula tested for
`margins`	the method used to estimate the margin distribution.
`param.margins`	the parameters of the estimated margin distributions. Only applicable if the margins were not specified as `"ranks"` or `NULL`.
`theta`	dependence parameters of the copulae
`df`	the degrees of freedom of the copula. Only applicable for t-copula.
`res.tests`	a matrix with the p-values and test statistics of the hybrid and the individual tests

References

Christian Genest, Bruno Remillard, David Beaudoin (2009). Goodness-of-fit tests for copulas: A review and a power study. Insurance: Mathematics and Economics, Volume 44, Issue 2, April 2009, Pages 199-213, ISSN 0167-6687. doi: 10.1016/j.insmatheco.2007.10.005

Marius Hofert, Ivan Kojadinovic, Martin Maechler, Jun Yan (2014). copula: Multivariate Dependence with Copulas. R package version 0.999-15. https://cran.r-project.org/package=copula

Examples


data(IndexReturns2D)

gofRosenblattGamma("normal", IndexReturns2D, M = 10)

[Package gofCopula version 0.4-1 Index]