distance correlation {energy} | R Documentation

Computes distance covariance and distance correlation statistics, which are multivariate measures of dependence.

```
dcov(x, y, index = 1.0)
dcor(x, y, index = 1.0)
```

| Argument | Description |
|---|---|
| `x` | data or distances of first sample |
| `y` | data or distances of second sample |
| `index` | exponent on Euclidean distance, in (0,2] |

`dcov` and `dcor` compute distance covariance and distance correlation statistics.

The sample sizes (number of rows) of the two samples must agree, and samples must not contain missing values. Arguments `x`, `y` can optionally be `dist` objects; otherwise these arguments are treated as data.
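The requirements above can be sketched as a small pre-check in base R. This is only an illustration: `check_samples` is a hypothetical helper, not a function of the energy package, and the package's own validation may differ.

```r
# Hypothetical helper (not part of energy): verify that two samples are
# usable as inputs, following the requirements stated in the text.
check_samples <- function(x, y) {
  # Number of observations: a `dist` object stores it in the "Size" attribute;
  # anything else is treated as data with one observation per row.
  n_obs <- function(z) if (inherits(z, "dist")) attr(z, "Size") else NROW(z)
  # TRUE if any entry is missing (works for vectors, matrices, data frames, dist)
  has_na <- function(z) anyNA(unlist(z, use.names = FALSE))

  if (n_obs(x) != n_obs(y)) stop("sample sizes must agree")
  if (has_na(x) || has_na(y)) stop("samples must not contain missing values")
  invisible(n_obs(x))
}

# Data and `dist` inputs can be mixed, since both describe 50 observations.
check_samples(iris[1:50, 1:4], dist(iris[51:100, 1:4]))
```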

Distance correlation is a new measure of dependence between random vectors introduced by Szekely, Rizzo, and Bakirov (2007). For all distributions with finite first moments, distance correlation `\mathcal R` generalizes the idea of correlation in two fundamental ways:
(1) `\mathcal R(X,Y)` is defined for `X` and `Y` in arbitrary dimension.
(2) `\mathcal R(X,Y)=0` characterizes independence of `X` and `Y`.

Distance correlation satisfies `0 \le \mathcal R \le 1`, and `\mathcal R = 0` only if `X` and `Y` are independent. Distance covariance `\mathcal V` provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients `\mathcal V` and `\mathcal R` are given in Szekely, Rizzo, and Bakirov (2007). The definitions of the empirical coefficients are as follows.

The empirical distance covariance `\mathcal{V}_n(\mathbf{X,Y})` with index 1 is the nonnegative number defined by

```
\mathcal{V}^2_n (\mathbf{X,Y}) = \frac{1}{n^2} \sum_{k,\,l=1}^n
A_{kl}B_{kl}
```

where `A_{kl}` and `B_{kl}` are

```
A_{kl} = a_{kl}-\bar a_{k.}- \bar a_{.l} + \bar a_{..}
```

```
B_{kl} = b_{kl}-\bar b_{k.}- \bar b_{.l} + \bar b_{..}.
```

Here

```
a_{kl} = \|X_k - X_l\|_p, \quad b_{kl} = \|Y_k - Y_l\|_q, \quad
k,l=1,\dots,n,
```

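`\|\cdot\|_p` and `\|\cdot\|_q` denote the Euclidean norms in the spaces of `X` and `Y`, of dimensions `p` and `q`, and the subscript `.` denotes that the mean is computed for the index that it replaces. Similarly, `\mathcal{V}_n(\mathbf{X})` is the nonnegative number defined by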

```
\mathcal{V}^2_n (\mathbf{X}) = \mathcal{V}^2_n (\mathbf{X,X}) =
\frac{1}{n^2} \sum_{k,\,l=1}^n
A_{kl}^2.
```

The empirical distance correlation `\mathcal{R}_n(\mathbf{X,Y})` is the square root of

```
\mathcal{R}^2_n(\mathbf{X,Y})=
\frac {\mathcal{V}^2_n(\mathbf{X,Y})}
{\sqrt{ \mathcal{V}^2_n (\mathbf{X}) \mathcal{V}^2_n(\mathbf{Y})}}.
```
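The empirical definitions above translate almost line by line into base R. The following is a minimal sketch for illustration, not the package's C implementation. The names `double_center`, `pairwise`, and `dcov_sketch` are hypothetical, and the sketch assumes that `index` is applied as an exponent to every pairwise Euclidean distance, including distances supplied as a `dist` object.

```r
# Double centering: A_kl = a_kl - mean(row k) - mean(column l) + grand mean
double_center <- function(d) {
  sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
}

# n x n matrix of pairwise Euclidean distances raised to the power `index`.
# A `dist` object is used as given; anything else is treated as data.
pairwise <- function(x, index = 1) {
  d <- if (inherits(x, "dist")) x else dist(x)
  as.matrix(d)^index
}

# Hypothetical helper (not part of energy): returns V_n(X,Y), V_n(X),
# V_n(Y) and R_n(X,Y), computing each centered matrix only once.
dcov_sketch <- function(x, y, index = 1) {
  A <- double_center(pairwise(x, index))
  B <- double_center(pairwise(y, index))
  v2xy <- mean(A * B)   # V_n^2(X,Y) = (1/n^2) * sum_kl A_kl B_kl
  v2x  <- mean(A * A)   # V_n^2(X)
  v2y  <- mean(B * B)   # V_n^2(Y)
  denom <- sqrt(v2x * v2y)
  r2 <- if (denom > 0) v2xy / denom else 0   # R_n^2, set to 0 if a variance is 0
  c(dCov  = sqrt(max(v2xy, 0)),   # max() guards against tiny negative rounding error
    dVarX = sqrt(v2x),
    dVarY = sqrt(v2y),
    dCor  = sqrt(max(r2, 0)))
}

# A nonlinear relationship: the Pearson correlation is essentially zero,
# while the distance correlation is clearly positive.
x <- seq(-1, 1, length.out = 21)
y <- x^2
cor(x, y)
dcov_sketch(x, y)
```

Because the data are symmetric about zero, `cor(x, y)` vanishes up to rounding error even though `y` is a deterministic function of `x`, which illustrates why a zero distance correlation is a stronger statement than a zero Pearson correlation.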

See `dcov.test` for a test of multivariate independence based on the distance covariance statistic.

`dcov` returns the sample distance covariance and `dcor` returns the sample distance correlation.

Two methods of computing the statistics are provided. `dcov` and `dcor` provide R interfaces to the C implementation, which is usually faster. `dcov` and `dcor` call an internal function `.dcov`.

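Note that it is inefficient to compute dCor as `dcov(x,y)/sqrt(dcov(x,x)*dcov(y,y))`, because the individual calls to `dcov` involve unnecessary repetition of calculations. Since `dcov` returns `\mathcal{V}_n` rather than `\mathcal{V}^2_n`, no further square root is needed in this expression.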
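As a hedged check of the note above, assume that `dcov` returns the nonnegative root `\mathcal{V}_n` rather than `\mathcal{V}^2_n`, as the description of the return value indicates. The definition of `\mathcal{R}^2_n` can then be rewritten in terms of the unsquared quantities:

```
\mathcal{R}^2_n(\mathbf{X,Y})
= \frac{\mathcal{V}^2_n(\mathbf{X,Y})}
{\sqrt{\mathcal{V}^2_n(\mathbf{X})\,\mathcal{V}^2_n(\mathbf{Y})}}
= \frac{\mathcal{V}^2_n(\mathbf{X,Y})}
{\mathcal{V}_n(\mathbf{X})\,\mathcal{V}_n(\mathbf{Y})}
\quad\Longrightarrow\quad
\mathcal{R}_n(\mathbf{X,Y})
= \frac{\mathcal{V}_n(\mathbf{X,Y})}
{\sqrt{\mathcal{V}_n(\mathbf{X})\,\mathcal{V}_n(\mathbf{Y})}}.
```

All three quantities on the right are built from the same centered matrices `A_{kl}` and `B_{kl}`. Three separate `dcov` calls therefore recompute the pairwise distances and the centering several times, whereas a single call to `dcor` needs them only once.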

Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007), Measuring and Testing Dependence by Correlation of Distances, *Annals of Statistics*, Vol. 35, No. 6, pp. 2769-2794.

doi: 10.1214/009053607000000505

Szekely, G.J. and Rizzo, M.L. (2009), Brownian Distance Covariance, *Annals of Applied Statistics*, Vol. 3, No. 4, pp. 1236-1265.

doi: 10.1214/09-AOAS312

Szekely, G.J. and Rizzo, M.L. (2009), Rejoinder: Brownian Distance Covariance, *Annals of Applied Statistics*, Vol. 3, No. 4, pp. 1303-1308.

`bcdcor`

`dcovU`

`pdcor`

`dcov.test`

`dcor.ttest`

`pdcor.test`

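```
library(energy)          # provides dcov() and dcor()

x <- iris[1:50, 1:4]     # setosa measurements
y <- iris[51:100, 1:4]   # versicolor measurements

dcov(x, y)               # distance covariance computed from data
dcov(dist(x), dist(y))   # same result from precomputed distances

## exponent 1.5 on Euclidean distance
dcov(x, y, 1.5)
dcor(x, y, 1.5)
```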

[Package *energy* version 1.7-10 Index]