indepTest {copula}

R Documentation

Test Independence of Continuous Random Variables via Empirical Copula

Description

Multivariate independence test based on the empirical copula process, as proposed by Christian Genest and Bruno Rémillard. The test can be seen as composed of three steps: (i) a simulation step, which consists of simulating the distribution of the test statistics under independence for the sample size under consideration; (ii) the test itself, which consists of computing the approximate p-values of the test statistics with respect to the empirical distributions obtained in step (i); and (iii) the display of a graphic, called a dependogram, which helps identify the type of departure from independence, if any. More details can be found in the articles cited in the References section.
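
As a rough illustration of steps (i) and (ii), the following base R sketch simulates a statistic under independence and then computes an approximate p-value for an observed value. It is not the implementation used by the package: the stand-in statistic (absolute Spearman correlation), the helper `stat` and the continuity-corrected p-value formula are all assumptions chosen for brevity.

```r
## Toy illustration of steps (i) and (ii); NOT the copula implementation.
set.seed(1)
n <- 100   # sample size
N <- 200   # number of repetitions under independence

## Stand-in statistic (assumption): absolute Spearman correlation.
stat <- function(u, v) abs(cor(u, v, method = "spearman"))

## Step (i): simulate the statistic under independence for this n.
sim <- replicate(N, stat(runif(n), runif(n)))

## Step (ii): approximate p-value of an observed statistic.
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.5)                  # clearly dependent data
obs <- stat(x, y)
p_val <- (sum(sim >= obs) + 0.5) / (N + 1)   # corrected proportion (assumed form)
p_val
```

The simulated values depend only on `n`, so in this sketch `sim` could be reused for any other data set of the same size.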

Usage

indepTestSim(n, p, m = p, N = 1000, verbose = interactive())
indepTest(x, d, alpha=0.05)
dependogram(test, pvalues = FALSE, print = FALSE)

Arguments

`n`	sample size when simulating the distribution of the test statistics under independence.
`p`	dimension of the data when simulating the distribution of the test statistics under independence.
`m`	maximum cardinality of the subsets of variables for which a test statistic is to be computed. It makes sense to consider `m` much smaller than `p`, especially when `p` is large.
`N`	number of repetitions when simulating under independence.
`verbose`	a logical specifying if progress should be displayed via `txtProgressBar`.
`x`	data frame or data matrix containing realizations (one per line) of the random vector whose independence is to be tested.
`d`	object of class `"indepTestDist"` as returned by the function `indepTestSim()`. It can be regarded as the empirical distribution of the test statistics under independence.
`alpha`	significance level used in the computation of the critical values for the test statistics.
`test`	object of class `"indepTest"` as returned by `indepTest()`.
`pvalues`	logical indicating whether the dependogram should be drawn from the test statistics or from the corresponding p-values.
`print`	logical indicating whether details should be printed.
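
To see why a small `m` helps when `p` is large, it can be useful to count the subsets involved. The sketch below assumes, as in the Examples, that one statistic is computed per subset of {1, ..., p} with cardinality between 2 and `m`; the helper name `n_subsets` is hypothetical and not part of the package.

```r
## Number of subsets of {1, ..., p} with cardinality between 2 and m.
n_subsets <- function(p, m = p) {
  stopifnot(p >= 2, m >= 2, m <= p)
  sum(choose(p, 2:m))
}

n_subsets(5)       # 26
n_subsets(5, 3)    # 20
n_subsets(20)      # 1048555
n_subsets(20, 3)   # 1330
```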

Details

The current (C code) implementation of indepTestSim() uses (RAM) memory of size O(n^2 p), and time O(N n^2 p), which makes it infeasible when n is large.
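
For a feel of how quickly the O(n^2 p) term grows, here is a back-of-the-envelope sketch. It assumes one 8-byte double per entry and a constant factor of 1; neither is stated above, so the numbers indicate orders of magnitude only. The helper `approx_mem_gib` is hypothetical.

```r
## Rough size of n^2 * p doubles (8 bytes each), in GiB.
## Assumption: constant factor 1; the real footprint may differ.
approx_mem_gib <- function(n, p) n^2 * p * 8 / 2^30

approx_mem_gib(100, 5)     # well under 1 MiB
approx_mem_gib(10000, 5)   # about 3.7
approx_mem_gib(50000, 5)   # about 93
```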

See the references below for more details, especially Genest and Rémillard (2004).

The former argument print.every is deprecated and no longer supported; use verbose instead.

Value

The function indepTestSim() returns an object of class "indepTestDist" whose attributes are: sample.size, data.dimension, max.card.subsets, number.repetitons, subsets (list of the subsets for which test statistics have been computed), subsets.binary (subsets in binary 'integer' notation), dist.statistics.independence (a matrix with N rows containing the values of the test statistics for each subset and each repetition) and dist.global.statistic.independence (a vector of length N containing the values of the global Cramér-von Mises test statistic for each repetition; see Genest et al. (2007), p. 175).

The function indepTest() returns an object of class "indepTest" whose attributes are: subsets, statistics, critical.values, pvalues, fisher.pvalue (a p-value resulting from a combination à la Fisher of the subset statistic p-values), tippett.pvalue (a p-value resulting from a combination à la Tippett of the subset statistic p-values), alpha (global significance level of the test), beta (1 - beta is the significance level per statistic), global.statistic (value of the global Cramér-von Mises statistic derived directly from the independence empirical copula process; see Genest et al. (2007), p. 175) and global.statistic.pvalue (corresponding p-value).
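
For readers unfamiliar with the Fisher and Tippett combinations, the sketch below shows their textbook forms for k independent p-values, together with one common way to turn a global level alpha into a per-statistic level. These formulas are assumptions for illustration: the package may calibrate fisher.pvalue, tippett.pvalue and beta differently (for instance against the simulated distribution), and the p-values used here are made up.

```r
## Textbook combinations of k p-values, valid for independent p-values.
pv <- c(0.01, 0.20, 0.50, 0.03)   # hypothetical per-subset p-values
k  <- length(pv)

## Fisher: -2 * sum(log p) is chi-squared with 2k degrees of freedom.
fisher_stat <- -2 * sum(log(pv))
fisher_p    <- pchisq(fisher_stat, df = 2 * k, lower.tail = FALSE)

## Tippett: based on the smallest p-value.
tippett_p <- 1 - (1 - min(pv))^k

## Per-statistic level for a global level alpha (Sidak-style; assumption).
alpha <- 0.05
beta  <- (1 - alpha)^(1 / k)

c(fisher = fisher_p, tippett = tippett_p, per.stat.level = 1 - beta)
```

With this choice of beta, k independent tests each run at level 1 - beta have a joint level of alpha, since beta^k = 1 - alpha.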

References

Deheuvels, P. (1979) La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance, Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274–292.

Deheuvels, P. (1981) A non parametric test for independence, Publ. Inst. Statist. Univ. Paris. 26, 29–50.

Genest, C. and Rémillard, B. (2004) Tests of independence and randomness based on the empirical copula process. Test 13, 335–369.

Genest, C., Quessy, J.-F., and Rémillard, B. (2006) Local efficiency of a Cramér-von Mises test of independence, Journal of Multivariate Analysis 97, 274–294.

Genest, C., Quessy, J.-F., and Rémillard, B. (2007) Asymptotic local efficiency of Cramér-von Mises tests for multivariate independence. The Annals of Statistics 35, 166–191.

Examples

## Consider the following example taken from
## Genest and Remillard (2004), p 352:

set.seed(2004)
x <- matrix(rnorm(500),100,5)
x[,1] <- abs(x[,1]) * sign(x[,2] * x[,3])
x[,5] <- x[,4]/2 + sqrt(3) * x[,5]/2

## In order to test for independence "within" x, the first step consists
## in simulating the distribution of the test statistics under
## independence for the same sample size and dimension,
## i.e. n=100 and p=5. As we are going to consider all the subsets of
## {1,...,5} whose cardinality is between 2 and 5, we set p=m=5.

## For a realistic N = 1000 (default), this takes a few seconds:
N. <- if(copula:::doExtras()) 1000 else 120
N.
system.time(d <- indepTestSim(100, 5, N = N.))
## For N=1000,  2 seconds (lynne 2015)
## You could save 'd' for future use, via  saveRDS()

## The next step consists of performing the test itself (and printing its results):
(iTst <- indepTest(x,d))

## Display the dependogram with the details:
dependogram(iTst, print=TRUE)

## We could have tested for a weaker form of independence, for instance,
## by only computing statistics for subsets whose cardinality is between 2
## and 3. Consider for instance the following data:
y <- matrix(runif(500),100,5)
## and perform the test:
system.time( d <- indepTestSim(100,5,3, N=N.) )
iTy <- indepTest(y,d)
iTy
dependogram(iTy, print=TRUE)
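
The comment in the example about saving 'd' via saveRDS() can be sketched as a small compute-once helper. The helper `cached` and the file name are hypothetical; a stand-in list replaces the real simulation so that the sketch runs without the package attached.

```r
## Hypothetical helper: compute once, then reload from disk afterwards.
cached <- function(path, compute) {
  if (file.exists(path)) return(readRDS(path))
  value <- compute()
  saveRDS(value, path)
  value
}

path <- file.path(tempdir(), "indepTestDist-n100-p5.rds")
unlink(path)   # start from a clean state for this demonstration

## With copula attached, 'compute' would be e.g.
##   function() indepTestSim(100, 5, N = 1000)
## A stand-in list is used here instead (assumption).
d1 <- cached(path, function() list(sample.size = 100, data.dimension = 5))
d2 <- cached(path, function() stop("not reached: value is read from disk"))
identical(d1, d2)
```

Because the simulated distribution depends on the sample size, the dimension and `m`, encoding those values in the file name is one way to avoid reusing a mismatched object.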

[Package copula version 1.1-3 Index]