dcov2d {energy} | R Documentation |

For bivariate data only, these are fast O(n log n) implementations of the distance correlation and distance covariance statistics. The U-statistic for dcov^2 is unbiased; the V-statistic is the original definition in Szekely, Rizzo, and Bakirov (2007), abbreviated SRB2007 below. These algorithms do not store the distance matrices, so they are suitable for large samples.

```
dcor2d(x, y, type = c("V", "U"))
dcov2d(x, y, type = c("V", "U"), all.stats = FALSE)
```

| Argument | Description |
|---|---|
| `x` | numeric vector |
| `y` | numeric vector |
| `type` | `"V"` or `"U"`, for V- or U-statistics |
| `all.stats` | logical |

The unbiased (squared) dcov is documented in `dcovU`, for multivariate data in arbitrary, not necessarily equal dimensions. `dcov2d` and `dcor2d` provide a faster O(n log n) algorithm for bivariate (x, y) only (X and Y are real-valued random variables). The O(n log n) algorithm was proposed by Huo and Szekely (2016). It is faster than the multivariate functions above a certain sample size n (the example on this page cites n > 50), and it does not store the distance matrix, so the sample size can be very large.
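As a rough illustration of why avoiding the distance matrix matters, the sketch below estimates the memory needed for a single full n x n matrix of double-precision distances. The helper name `dist_matrix_gb` is hypothetical and not part of *energy*; it assumes 8 bytes per entry and ignores any storage overhead.

```r
# Hypothetical helper (not part of the energy package): approximate memory,
# in gigabytes (10^9 bytes), of one full n x n matrix of doubles.
dist_matrix_gb <- function(n) {
  8 * n^2 / 1e9   # assumption: 8 bytes per double-precision entry
}

dist_matrix_gb(1e4)   # 0.8
dist_matrix_gb(1e5)   # 80
```

Under these assumptions, a method that keeps one such matrix for x and another for y would need on the order of 160 GB at n = 100,000, which is why a matrix-free algorithm is attractive for large bivariate samples.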

By default, `dcov2d` returns the V-statistic `V_n = dCov_n^2(x, y)`; if `type = "U"`, it returns the U-statistic, which is unbiased for `dCov^2(X, Y)`. The argument `all.stats = TRUE` is used internally when the function is called from `dcor2d`.
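For reference, both statistics can be written down directly from their O(n^2) matrix definitions: double centering for the V-statistic (SRB2007) and U-centering for the U-statistic (Szekely and Rizzo, 2014). The following is a minimal base R sketch of those definitions. The function names `double_center`, `u_center`, `dcov2_v`, and `dcov2_u` are hypothetical, and this is not the O(n log n) algorithm that `dcov2d` implements.

```r
# Double-centered distance matrix (used by the V-statistic).
double_center <- function(z) {
  d <- as.matrix(dist(z))                 # pairwise distances |z_i - z_j|
  m <- rowMeans(d)                        # row means equal column means (d is symmetric)
  d - outer(m, m, "+") + mean(d)
}

# U-centered distance matrix (used by the U-statistic); requires n > 2.
u_center <- function(z) {
  d <- as.matrix(dist(z))
  n <- nrow(d)
  s <- rowSums(d)
  a <- d - outer(s, s, "+") / (n - 2) + sum(d) / ((n - 1) * (n - 2))
  diag(a) <- 0                            # diagonal is defined to be zero
  a
}

# V-statistic: average of the elementwise product of double-centered matrices.
dcov2_v <- function(x, y) {
  stopifnot(length(x) == length(y))
  mean(double_center(x) * double_center(y))
}

# U-statistic: off-diagonal sum of products of U-centered matrices / (n (n - 3)).
dcov2_u <- function(x, y) {
  n <- length(x)
  stopifnot(n > 3, length(y) == n)
  sum(u_center(x) * u_center(y)) / (n * (n - 3))
}

dcov2_v(c(0, 1), c(0, 1))   # 0.25
```

If the *energy* package is installed, the results of this sketch would be expected to agree with `dcov2d(x, y, type = "V")` and `dcov2d(x, y, type = "U")` up to rounding, but the sketch stores full distance matrices and is only practical for small n.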

By default, `dcor2d` returns `dCor_n^2(x, y)`; if `type = "U"`, it returns a bias-corrected estimator of squared dcor equivalent to `bcdcor`.
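The bias-corrected estimator can be sketched as a ratio of U-statistics, with the inner product of the U-centered distance matrices in the numerator and the product of their norms in the denominator (Szekely and Rizzo, 2014). The base R code below is an illustrative O(n^2) version under that definition. The names `u_center` and `bcdcor_sketch` are hypothetical and are not functions of the *energy* package.

```r
# U-centered distance matrix for a numeric vector z; requires n > 2.
u_center <- function(z) {
  d <- as.matrix(dist(z))                 # pairwise distances |z_i - z_j|
  n <- nrow(d)
  s <- rowSums(d)
  a <- d - outer(s, s, "+") / (n - 2) + sum(d) / ((n - 1) * (n - 2))
  diag(a) <- 0                            # diagonal is defined to be zero
  a
}

# Bias-corrected squared distance correlation (illustrative sketch).
# The common factor 1 / (n (n - 3)) cancels between numerator and denominator.
bcdcor_sketch <- function(x, y) {
  n <- length(x)
  stopifnot(n > 3, length(y) == n)
  A <- u_center(x)
  B <- u_center(y)
  sum(A * B) / sqrt(sum(A * A) * sum(B * B))
}

x <- c(0, 1, 3, 7, 12)
bcdcor_sketch(x, 2 * x + 1)   # 1 (up to rounding): exact linear relation
```

Because the numerator is a U-statistic, the value returned by this sketch can be negative for some samples, which is consistent with the behavior described for `dcor2d(x, y, "U")`.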

These functions do not store the distance matrices so they are helpful when sample size is large and the data is bivariate.

The U-statistic `U_n` can be negative in the lower tail, so the square root of the U-statistic is not applied. Similarly, `dcor2d(x, y, "U")` is bias-corrected and can be negative in the lower tail, so we do not take the square root. The original definitions of dCov and dCor (SRB2007, SR2009) were based on V-statistics, which are non-negative, and were defined using the square root of V-statistics.

It has been suggested that instead of taking the square root of the U-statistic, one could take the root of `|U_n|` before applying the sign, but that introduces more bias than the original dCor, and should never be used.

Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely

Huo, X. and Szekely, G.J. (2016). Fast computing for distance covariance. *Technometrics*, 58(4), 435-447.

Szekely, G.J. and Rizzo, M.L. (2014). Partial Distance Correlation with Methods for Dissimilarities. *Annals of Statistics*, 42(6), 2382-2412.

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and Testing Dependence by Correlation of Distances. *Annals of Statistics*, 35(6), 2769-2794. doi: 10.1214/009053607000000505

`dcov`, `dcov.test`, `dcor`, `dcor.test` (multivariate statistics and permutation test)

```
library(energy)

## these are equivalent, but the 2d versions are faster for n > 50
n <- 100
x <- rnorm(n)
y <- rnorm(n)
all.equal(dcov(x, y)^2, dcov2d(x, y), check.attributes = FALSE)
all.equal(bcdcor(x, y), dcor2d(x, y, "U"), check.attributes = FALSE)

## permutation tests (R = 199 replicates) on a larger sample
n <- 400
x <- rlnorm(n)
y <- rexp(n)
dcov.test(x, y, R = 199)
dcor.test(x, y, R = 199)
```

[Package *energy* version 1.7-10 Index]