distance correlation {energy}    R Documentation
Distance Correlation and Covariance Statistics
Description
Computes distance covariance and distance correlation statistics, which are multivariate measures of dependence.
Usage
dcov(x, y, index = 1.0)
dcor(x, y, index = 1.0)
Arguments
x        data or distances of first sample
y        data or distances of second sample
index    exponent on Euclidean distance, in (0,2]
Details
dcov and dcor compute distance covariance and distance correlation statistics.
The sample sizes (number of rows) of the two samples must agree, and samples must not contain missing values.
The index is an optional exponent on Euclidean distance. Valid exponents for energy are in the open interval (0, 2); the endpoint 2 is excluded.
Supported argument types are a numeric data matrix, data.frame, or tibble with observations in rows; a numeric vector; and ordered or unordered factors. For unordered factors, a 0-1 distance matrix is computed.
Pre-computed distances can optionally be supplied as class "dist" objects or as distance matrices. When data rather than distances are supplied, the distance matrices are computed internally.
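The 0-1 distance for unordered factors can be pictured with a few lines of base R. This is only an illustration of the idea, not the code that energy runs internally; the helper name factor_dist01 is hypothetical.

```r
# Hypothetical helper (not part of energy): 0-1 distance matrix for an
# unordered factor. Entry (k, l) is 0 when observations k and l share a
# level and 1 otherwise.
factor_dist01 <- function(f) {
  f <- as.character(f)
  outer(f, f, FUN = "!=") * 1   # logical matrix -> numeric 0/1 matrix
}

f <- factor(c("a", "b", "a", "c"))
D <- factor_dist01(f)
D

# The same information as a class "dist" object
as.dist(D)
```

A matrix such as D, or its "dist" form, is the kind of pre-computed distance input described above.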
Distance correlation is a new measure of dependence between random vectors introduced by Szekely, Rizzo, and Bakirov (2007). For all distributions with finite first moments, distance correlation $\mathcal{R}$ generalizes the idea of correlation in two fundamental ways:
(1) $\mathcal{R}(X,Y)$ is defined for $X$ and $Y$ in arbitrary dimension.
(2) $\mathcal{R}(X,Y) = 0$ characterizes independence of $X$ and $Y$.
Distance correlation satisfies $0 \le \mathcal{R} \le 1$, and $\mathcal{R} = 0$ only if $X$ and $Y$ are independent. Distance covariance $\mathcal{V}$ provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients $\mathcal{V}$ and $\mathcal{R}$ are given in (SRB 2007). The definitions of the empirical coefficients are as follows.
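The empirical distance covariance $\mathcal{V}_n(\mathbf{X},\mathbf{Y})$ with index 1 is the nonnegative number defined by
$$\mathcal{V}_n^2(\mathbf{X},\mathbf{Y}) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl} B_{kl},$$
where $A_{kl}$ and $B_{kl}$ are
$$A_{kl} = a_{kl} - \bar{a}_{k.} - \bar{a}_{.l} + \bar{a}_{..},$$
$$B_{kl} = b_{kl} - \bar{b}_{k.} - \bar{b}_{.l} + \bar{b}_{..}.$$
Here
$$a_{kl} = \|X_k - X_l\|, \quad b_{kl} = \|Y_k - Y_l\|, \quad k, l = 1, \dots, n,$$
are the pairwise Euclidean distances, and the subscript . denotes that the mean is computed for the index that it replaces. Similarly, $\mathcal{V}_n(\mathbf{X})$ is the nonnegative number defined by
$$\mathcal{V}_n^2(\mathbf{X}) = \mathcal{V}_n^2(\mathbf{X},\mathbf{X}) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl}^2.$$
The empirical distance correlation $\mathcal{R}_n(\mathbf{X},\mathbf{Y})$ is the square root of
$$\mathcal{R}_n^2(\mathbf{X},\mathbf{Y}) = \frac{\mathcal{V}_n^2(\mathbf{X},\mathbf{Y})}{\sqrt{\mathcal{V}_n^2(\mathbf{X})\, \mathcal{V}_n^2(\mathbf{Y})}}.$$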
See dcov.test for a test of multivariate independence based on the distance covariance statistic.
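The empirical distance covariance can be sketched in base R by double-centering the two distance matrices and averaging their elementwise product. This is a minimal sketch and not the implementation used by energy: the names double_center and dcov_sketch are hypothetical, and applying index as a power on the Euclidean distances is an assumption based on the description of that argument.

```r
# Double-center a distance matrix:
# A_kl = a_kl - (row mean k) - (column mean l) + (grand mean)
double_center <- function(d) {
  d <- as.matrix(d)
  sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
}

# Hypothetical sketch of the sample distance covariance V_n(x, y).
# Assumption: 'index' is applied as an exponent on the Euclidean distances.
dcov_sketch <- function(x, y, index = 1) {
  stopifnot(index > 0, index < 2)                # range stated in Details
  stopifnot(!any(is.na(x)), !any(is.na(y)))      # no missing values
  a <- as.matrix(dist(x))^index
  b <- as.matrix(dist(y))^index
  stopifnot(nrow(a) == nrow(b))                  # sample sizes must agree
  A <- double_center(a)
  B <- double_center(b)
  # V_n^2 is the mean of A_kl * B_kl over all n^2 pairs; guard against
  # tiny negative values caused by rounding before taking the square root
  sqrt(max(mean(A * B), 0))
}

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
dcov_sketch(x, y)   # V_n(x, y)
dcov_sketch(x, x)   # V_n(x), the case y = x
```

Under these assumptions the sketch is symmetric in its two arguments, as the definition requires.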
Value
dcov returns the sample distance covariance and dcor returns the sample distance correlation.
Note
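Note that it is inefficient to compute dCor by:
dcov(x,y)/sqrt(dcov(x,x)*dcov(y,y))
because the individual calls to dcov involve unnecessary repetition of calculations. Each call recomputes the distance matrices, and since dcov already returns the square root of the squared statistic, no further square root is needed in this expression.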
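The saving comes from building and centering each distance matrix only once and reusing it for all three squared statistics. The base R sketch below illustrates this under the definitions in Details, where each reported statistic is the square root of its squared version; dcor_onepass and center are hypothetical names, and this is not the code used by dcor.

```r
# Double-center a matrix of pairwise distances.
center <- function(d) {
  d <- as.matrix(d)
  sweep(sweep(d, 1, rowMeans(d)), 2, colMeans(d)) + mean(d)
}

# Hypothetical one-pass helper: each distance matrix is computed and
# centered once, then reused for the covariance and both variances.
dcor_onepass <- function(x, y) {
  A <- center(dist(x))
  B <- center(dist(y))
  v2xy <- max(mean(A * B), 0)   # squared distance covariance
  v2x  <- mean(A * A)           # squared distance variance of x
  v2y  <- mean(B * B)           # squared distance variance of y
  denom <- sqrt(v2x * v2y)
  c(dCov  = sqrt(v2xy),
    dCor  = if (denom > 0) sqrt(v2xy / denom) else 0,
    dVarX = sqrt(v2x),
    dVarY = sqrt(v2y))
}

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
s <- dcor_onepass(x, y)
s

# Agrees with the ratio of the three separate statistics
unname(s["dCov"] / sqrt(s["dVarX"] * s["dVarY"]))
```

The ratio form needs three separate passes over the data when written with individual dcov calls, whereas the helper above needs one pass per sample.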
Author(s)
Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely
References
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007), Measuring and Testing Dependence by Correlation of Distances, Annals of Statistics, Vol. 35, No. 6, pp. 2769-2794.
doi:10.1214/009053607000000505
Szekely, G.J. and Rizzo, M.L. (2009), Brownian Distance Covariance, Annals of Applied Statistics, Vol. 3, No. 4, pp. 1236-1265.
doi:10.1214/09-AOAS312
Szekely, G.J. and Rizzo, M.L. (2009), Rejoinder: Brownian Distance Covariance, Annals of Applied Statistics, Vol. 3, No. 4, pp. 1303-1308.
See Also
dcov2d
dcor2d
bcdcor
dcovU
pdcor
dcov.test
dcor.test
pdcor.test
Examples
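library(energy)
# Two samples of 50 observations each, using the four numeric columns of iris
x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
dcov(x, y)
dcov(dist(x), dist(y))  # same thing, using pre-computed distances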