dcov2d {energy} R Documentation

## Fast dCor and dCov for bivariate data only

### Description

For bivariate data only (x and y both real-valued), these are fast O(n log n) implementations of the distance correlation and distance covariance statistics. The U-statistic for dCov^2 is unbiased; the V-statistic is the original definition in SRB 2007 (Szekely, Rizzo, and Bakirov 2007). These algorithms do not store the distance matrices, so they are suitable for large samples.
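To make the two statistics concrete, here is a naive O(n^2) sketch in base R. It assumes the standard double-centering construction for the V-statistic and the U-centering construction of Szekely and Rizzo (2014) for the U-statistic. The helper names `dist1d`, `dcov2_V`, and `dcov2_U` are hypothetical and not part of energy. Unlike dcov2d, they build full n x n distance matrices, so they are only meant for checking small examples.

```r
# Naive O(n^2) reference versions of squared distance covariance for
# real-valued samples x and y. Hypothetical helpers, not part of energy;
# they store full n x n distance matrices.

dist1d <- function(x) abs(outer(x, x, "-"))   # a_ij = |x_i - x_j|

# V-statistic: double-center each distance matrix, then average the
# elementwise products (the SRB 2007 definition of dCov_n^2).
dcov2_V <- function(x, y) {
  dc <- function(d) d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  mean(dc(dist1d(x)) * dc(dist1d(y)))
}

# U-statistic: U-center each distance matrix (diagonal set to zero), then
# sum the off-diagonal products and divide by n(n - 3). Requires n > 3.
dcov2_U <- function(x, y) {
  n <- length(x)
  stopifnot(n > 3, length(y) == n)
  uc <- function(d) {
    u <- d - outer(rowSums(d), colSums(d), "+") / (n - 2) +
      sum(d) / ((n - 1) * (n - 2))
    diag(u) <- 0
    u
  }
  sum(uc(dist1d(x)) * uc(dist1d(y))) / (n * (n - 3))
}

# Usage: a nonlinear dependence
set.seed(123)
x <- rnorm(50)
y <- x^2 + rnorm(50, sd = 0.1)
dcov2_V(x, y)   # V-statistic, always >= 0
dcov2_U(x, y)   # U-statistic, unbiased for dCov^2(X, Y)
```

If energy is installed, `dcov2d(x, y)` and `dcov2d(x, y, type = "U")` should agree with these two values up to floating-point error.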

### Usage

dcor2d(x, y, type = c("V", "U"))
dcov2d(x, y, type = c("V", "U"), all.stats = FALSE)


### Arguments

- x: numeric vector
- y: numeric vector of the same length as x
- type: "V" or "U", for V- or U-statistics
- all.stats: logical; used internally when dcov2d is called from dcor2d

### Details

The unbiased (squared) dCov is documented in dcovU, for multivariate data in arbitrary, not necessarily equal, dimensions. dcov2d and dcor2d provide a faster O(n log n) algorithm for the bivariate case only, where x and y are samples of real-valued random variables X and Y. The O(n log n) algorithm was proposed by Huo and Szekely (2016). It is faster than the O(n^2) algorithm only above a certain sample size n (roughly n > 50, per the examples below). Because it does not store the distance matrices, the sample size can be very large.
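The speed-up rests on never forming the n x n distance matrices. One ingredient that is easy to show is the row sums a_i. = sum_j |x_i - x_j|, which need only a sort and a cumulative sum. The sketch below uses a hypothetical helper `row_sums_fast` (not the package code) to compute them in O(n log n) time. Given row sums for x and y, the terms S2 = a.. b.. / n^4 and S3 = sum_i a_i. b_i. / n^3 of the SRB 2007 formula V_n = S1 + S2 - 2 S3 follow directly. The cross term S1 = sum_ij a_ij b_ij / n^2 needs the additional O(n log n) step of Huo and Szekely (2016), which is not shown here.

```r
# Row sums a_i. = sum_j |x_i - x_j| in O(n log n) time and O(n) memory,
# using one sort and a cumulative sum instead of an n x n matrix.
# For the sorted sample x_(1) <= ... <= x_(n) with partial sums S_k:
#   sum_j |x_(k) - x_(j)| = (2k - n) * x_(k) + S_n - 2 * S_k
# Ties are harmless because tied pairs contribute zero either way.
row_sums_fast <- function(x) {
  n <- length(x)
  o <- order(x)                  # O(n log n) sort
  xs <- x[o]
  S <- cumsum(xs)
  k <- seq_len(n)
  rs_sorted <- (2 * k - n) * xs + S[n] - 2 * S
  rs <- numeric(n)
  rs[o] <- rs_sorted             # return in the original order of x
  rs
}

# Usage: no 1e5 x 1e5 matrix is ever formed
set.seed(42)
x <- rexp(1e5)
y <- x + rnorm(1e5)
a_row <- row_sums_fast(x)
b_row <- row_sums_fast(y)
n <- length(x)
S2 <- sum(a_row) * sum(b_row) / n^4   # a.. b.. / n^4
S3 <- sum(a_row * b_row) / n^3        # sum_i a_i. b_i. / n^3
```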

### Value

By default, dcov2d returns the V-statistic V_n = dCov_n^2(x, y); if type = "U", it returns the U-statistic, which is unbiased for dCov^2(X, Y). The argument all.stats = TRUE is used internally when the function is called from dcor2d.

By default, dcor2d returns dCor_n^2(x, y); if type = "U", it returns a bias-corrected estimator of squared dCor, equivalent to bcdcor.

These functions do not store the distance matrices so they are helpful when sample size is large and the data is bivariate.
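The relationship between the dCov and dCor statistics can be sketched the same way: squared dCor is dCov^2(x, y) divided by sqrt(dCov^2(x, x) dCov^2(y, y)), using V-statistics by default and U-statistics for the bias-corrected version. The helpers `dcov2` and `dcor2` below are hypothetical naive O(n^2) versions, assuming the U-centered construction of Szekely and Rizzo (2014) for the bias-corrected case. They are for illustration only, not a substitute for dcor2d.

```r
# Hypothetical naive O(n^2) helpers, not part of energy.
dcov2 <- function(x, y, type = c("V", "U")) {
  type <- match.arg(type)
  n <- length(x)
  center <- function(d) {
    if (type == "V") {
      # double centering
      d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
    } else {
      # U-centering with zero diagonal
      u <- d - outer(rowSums(d), colSums(d), "+") / (n - 2) +
        sum(d) / ((n - 1) * (n - 2))
      diag(u) <- 0
      u
    }
  }
  A <- center(abs(outer(x, x, "-")))
  B <- center(abs(outer(y, y, "-")))
  if (type == "V") mean(A * B) else sum(A * B) / (n * (n - 3))
}

# Squared distance correlation: dCov^2(x, y) / sqrt(dCov^2(x, x) dCov^2(y, y)).
# With type = "U" this is the bias-corrected version; it is not square-rooted
# because it can be negative. Returns 0 when a sample is constant.
dcor2 <- function(x, y, type = c("V", "U")) {
  type <- match.arg(type)
  denom <- sqrt(dcov2(x, x, type) * dcov2(y, y, type))
  if (denom > 0) dcov2(x, y, type) / denom else 0
}

# Usage
set.seed(2016)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
dcor2(x, y)        # V-statistic version, in [0, 1]
dcor2(x, y, "U")   # bias-corrected version
```

If the assumptions above match the package's definitions, dcor2d(x, y) and dcor2d(x, y, "U") should return the same two values.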

### Note

The U-statistic U_n can be negative in the lower tail, so the square root is not applied to it. Similarly, dcor2d(x, y, "U") is bias-corrected and can be negative in the lower tail, so its square root is not taken either. The original definitions of dCov and dCor (SRB 2007, SR 2009) were based on V-statistics, which are non-negative, and were defined as square roots of those V-statistics.

It has been suggested that, instead of taking the square root of the U-statistic, one could take the square root of |U_n| and then restore the sign of U_n. This introduces more bias than the original dCor and should never be used.
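A small simulation illustrates the point above. For independent samples the U-statistic is unbiased for dCov^2 = 0, so it is negative in a sizable fraction of replications, while the V-statistic never is. The naive helpers `dcov2_V` and `dcov2_U` are hypothetical, O(n^2), and not part of energy.

```r
# Hypothetical naive O(n^2) helpers (not part of energy).
dcov2_V <- function(x, y) {
  dc <- function(d) d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  mean(dc(abs(outer(x, x, "-"))) * dc(abs(outer(y, y, "-"))))
}
dcov2_U <- function(x, y) {
  n <- length(x)
  uc <- function(d) {
    u <- d - outer(rowSums(d), colSums(d), "+") / (n - 2) +
      sum(d) / ((n - 1) * (n - 2))
    diag(u) <- 0
    u
  }
  sum(uc(abs(outer(x, x, "-"))) * uc(abs(outer(y, y, "-")))) / (n * (n - 3))
}

# 200 replications with x and y generated independently
set.seed(1)
stats <- t(replicate(200, {
  x <- rnorm(30)
  y <- rnorm(30)
  c(V = dcov2_V(x, y), U = dcov2_U(x, y))
}))

min(stats[, "V"])        # never below zero
mean(stats[, "U"] < 0)   # fraction of negative U-statistics
```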

### Author(s)

Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely

### References

Huo, X. and Szekely, G.J. (2016). Fast computing for distance covariance. Technometrics, 58(4), 435-447.

Szekely, G.J. and Rizzo, M.L. (2014). Partial Distance Correlation with Methods for Dissimilarities. Annals of Statistics, 42(6), 2382-2412.

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and Testing Dependence by Correlation of Distances. Annals of Statistics, 35(6), 2769-2794. doi: 10.1214/009053607000000505

### See Also

dcov, dcov.test, dcor, dcor.test (multivariate statistics and permutation tests)

### Examples


library(energy)

## these are equivalent, but the 2d versions are faster for n > 50
n <- 100
x <- rnorm(n)
y <- rnorm(n)
all.equal(dcov(x, y)^2, dcov2d(x, y), check.attributes = FALSE)
all.equal(bcdcor(x, y), dcor2d(x, y, "U"), check.attributes = FALSE)

x <- rlnorm(400)
y <- rexp(400)
dcov.test(x, y, R = 199)    # permutation test
dcor.test(x, y, R = 199)



Package energy version 1.7-10