dcov2d {energy}    R Documentation
Fast dCor and dCov for bivariate data only
Description
For bivariate data only (paired real-valued samples x and y), these are fast O(n log n) implementations of the distance correlation and distance covariance statistics. The U-statistic for dCov^2 is unbiased; the V-statistic is the original definition in SRB 2007. These algorithms do not store the distance matrices, so they are suitable for large samples.
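For reference, here is a minimal base-R sketch of what the V-statistic is, computed directly from its definition in SRB 2007 by double-centering the two distance matrices. It needs O(n^2) time and memory, which is exactly the cost that dcov2d avoids. The name dcov2_V_naive is hypothetical and is not part of energy.

```r
# Hypothetical helper (not part of energy): squared distance covariance,
# V-statistic, computed from the definition with two full n x n matrices.
dcov2_V_naive <- function(x, y) {
  stopifnot(length(x) == length(y))
  a <- abs(outer(x, x, "-"))  # a_ij = |x_i - x_j|
  b <- abs(outer(y, y, "-"))  # b_ij = |y_i - y_j|
  # Double centering: subtract row and column means, add back the grand mean
  A <- a - outer(rowMeans(a), colMeans(a), "+") + mean(a)
  B <- b - outer(rowMeans(b), colMeans(b), "+") + mean(b)
  mean(A * B)                 # (1 / n^2) * sum over i, j of A_ij * B_ij
}

x <- c(1, 2, 3)
dcov2_V_naive(x, x)   # 40/81, about 0.494
```

With energy attached, dcov2d(x, y) should agree with dcov2_V_naive(x, y) up to rounding error.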
Usage
dcor2d(x, y, type = c("V", "U"))
dcov2d(x, y, type = c("V", "U"), all.stats = FALSE)
Arguments
x          numeric vector
y          numeric vector of the same length as x
type       "V" or "U", for V- or U-statistics
all.stats  logical; all.stats = TRUE is used internally when dcov2d is called from dcor2d
Details
The unbiased (squared) dCov is documented in dcovU, for multivariate data in arbitrary, not necessarily equal dimensions. dcov2d and dcor2d provide a faster O(n log n) algorithm for bivariate data (x, y) only, where X and Y are real-valued random variables. The O(n log n) algorithm was proposed by Huo and Szekely (2016). It is faster than the O(n^2) algorithms only above a certain sample size n (the Examples suggest roughly n > 50). Because it does not store the distance matrices, the sample size can be very large.
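As a rough illustration of why O(n log n) is achievable for real-valued x and y, here is a base-R sketch; it is not the code behind dcov2d. It assumes the expanded forms V_n = S/n^2 - 2T/n^3 + a..b../n^4 and U_n = S/(n(n-3)) - 2T/(n(n-2)(n-3)) + a..b../(n(n-1)(n-2)(n-3)), where a_ij = |x_i - x_j|, b_ij = |y_i - y_j|, S = sum_ij a_ij b_ij, T = sum_i a_i. b_i., and a dot denotes summation over that index; these follow from expanding the double-centered and U-centered inner products. The row sums a_i. need only one sort. For S, the sketch sorts on x and accumulates over the ranks of y with Fenwick (binary indexed) trees; that data structure is a substitution chosen for illustration, and Huo and Szekely (2016) may organize this step differently. The names dist_row_sums, cross_sum, and dcov2_fast are hypothetical.

```r
# Hypothetical sketch (not the energy implementation) of an O(n log n)
# computation of dCov^2 for real-valued x and y.

# Row sums a_i. = sum_j |x_i - x_j| for all i, using a single sort.
dist_row_sums <- function(x) {
  n <- length(x)
  o <- order(x)
  xs <- as.numeric(x[o])
  cs <- cumsum(xs)
  k <- seq_len(n)
  # For the k-th smallest value, the k - 1 earlier sorted values contribute
  # x_k - x_j and the n - k later ones contribute x_j - x_k
  rs <- (2 * k - n) * xs + cs[n] - 2 * cs
  out <- numeric(n)
  out[o] <- rs                # map back to the original order
  out
}

# S = sum_{i,j} |x_i - x_j| |y_i - y_j|: sort by x, then for each point use
# Fenwick trees over y-ranks to sum over earlier points with y_j <= y_i.
cross_sum <- function(x, y) {
  n <- length(x)
  o <- order(x)
  xs <- as.numeric(x[o]); ys <- as.numeric(y[o])
  r <- rank(ys, ties.method = "first")  # distinct ranks 1..n
  # Fenwick trees indexed by y-rank: count, sum of y, sum of x, sum of x * y
  f1 <- numeric(n); fy <- numeric(n); fx <- numeric(n); fxy <- numeric(n)
  # Running totals over all earlier points in x order
  c1 <- 0; cy <- 0; cx <- 0; cxy <- 0
  s <- 0
  for (i in seq_len(n)) {
    xi <- xs[i]; yi <- ys[i]
    # Prefix query over y-ranks below r[i]
    q1 <- 0; qy <- 0; qx <- 0; qxy <- 0
    k <- r[i] - 1
    while (k > 0) {
      q1 <- q1 + f1[k]; qy <- qy + fy[k]; qx <- qx + fx[k]; qxy <- qxy + fxy[k]
      k <- k - bitwAnd(k, -k)
    }
    # sum_j (x_i - x_j)(y_i - y_j) over a set of earlier j, from its totals
    below <- q1 * xi * yi - xi * qy - yi * qx + qxy
    every <- c1 * xi * yi - xi * cy - yi * cx + cxy
    # |y_i - y_j| flips the sign for earlier points with y_j > y_i
    s <- s + 2 * below - every
    # Insert point i into the trees
    k <- r[i]
    while (k <= n) {
      f1[k] <- f1[k] + 1; fy[k] <- fy[k] + yi
      fx[k] <- fx[k] + xi; fxy[k] <- fxy[k] + xi * yi
      k <- k + bitwAnd(k, -k)
    }
    c1 <- c1 + 1; cy <- cy + yi; cx <- cx + xi; cxy <- cxy + xi * yi
  }
  2 * s  # each unordered pair was counted once
}

dcov2_fast <- function(x, y, type = c("V", "U")) {
  type <- match.arg(type)
  n <- length(x)
  stopifnot(length(y) == n, type == "V" || n > 3)
  ra <- dist_row_sums(x); rb <- dist_row_sums(y)
  s_cross <- cross_sum(x, y)          # S
  s_rows <- sum(ra * rb)              # T
  s_tot <- sum(ra) * sum(rb)          # a.. * b..
  if (type == "V") {
    s_cross / n^2 - 2 * s_rows / n^3 + s_tot / n^4
  } else {
    s_cross / (n * (n - 3)) - 2 * s_rows / (n * (n - 2) * (n - 3)) +
      s_tot / (n * (n - 1) * (n - 2) * (n - 3))
  }
}

set.seed(1)
x <- rnorm(1000); y <- x^2 + rnorm(1000)
dcov2_fast(x, y)        # V-statistic
dcov2_fast(x, y, "U")   # U-statistic
```

Because the loop runs in interpreted R, this sketch is slow in absolute terms; it only demonstrates the O(n log n) operation count and that no n x n matrix is formed.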
Value
By default, dcov2d returns the V-statistic V_n = dCov_n^2(x, y), and if type = "U", it returns the U-statistic, unbiased for dCov^2(X, Y). The argument all.stats = TRUE is used internally when the function is called from dcor2d.
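The U-statistic replaces double centering by U-centering. A minimal base-R sketch, assuming the U-centering of Szekely and Rizzo (2014): row and column sums are divided by n - 2, the grand total by (n - 1)(n - 2), the diagonal is set to zero, and the inner product is divided by n(n - 3) instead of n^2; with these divisors the estimator is unbiased for dCov^2(X, Y). The name dcov2_U_naive is hypothetical, and the code stores full n x n matrices, unlike dcov2d.

```r
# Hypothetical helper (not part of energy): unbiased estimator of dCov^2(X, Y)
# via U-centered distance matrices (Szekely and Rizzo 2014). Requires n > 3.
dcov2_U_naive <- function(x, y) {
  n <- length(x)
  stopifnot(length(y) == n, n > 3)
  u_center <- function(d) {
    # Row and column sums are divided by (n - 2), the total by (n - 1)(n - 2)
    u <- d - outer(rowSums(d), colSums(d), "+") / (n - 2) +
      sum(d) / ((n - 1) * (n - 2))
    diag(u) <- 0  # U-centered matrices have a zero diagonal by definition
    u
  }
  A <- u_center(abs(outer(x, x, "-")))
  B <- u_center(abs(outer(y, y, "-")))
  sum(A * B) / (n * (n - 3))
}

x <- c(1, 2, 3, 4)
dcov2_U_naive(x, x)   # 2/3
```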
By default, dcor2d returns dCor_n^2(x, y), and if type = "U", it returns a bias-corrected estimator of squared dCor, equivalent to bcdcor.
These functions do not store the distance matrices, so they are helpful when the sample size is large and the data are bivariate.
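Both dCor statistics are ratios of the corresponding dCov statistics. A minimal sketch, assuming dCor_n^2(x, y) = V_n(x, y) / sqrt(V_n(x, x) V_n(y, y)) as in SRB 2007, and that the bias-corrected version replaces each V-statistic by its U-centered counterpart (Szekely and Rizzo 2014); either ratio is set to 0 when the denominator is 0. The names center_dist and dcor2_sketch are hypothetical, and the code stores full n x n matrices, unlike dcor2d.

```r
# Hypothetical helpers (not part of energy): squared dCor from double-centered
# matrices (type "V") or bias-corrected dCor from U-centered ones (type "U").
center_dist <- function(v, type) {
  n <- length(v)
  d <- abs(outer(v, v, "-"))
  if (type == "V") {
    return(d - outer(rowMeans(d), colMeans(d), "+") + mean(d))
  }
  stopifnot(n > 3)
  u <- d - outer(rowSums(d), colSums(d), "+") / (n - 2) +
    sum(d) / ((n - 1) * (n - 2))
  diag(u) <- 0
  u
}

dcor2_sketch <- function(x, y, type = c("V", "U")) {
  type <- match.arg(type)
  A <- center_dist(x, type)
  B <- center_dist(y, type)
  # The 1/n^2 or 1/(n(n - 3)) scaling cancels in the ratio, so plain sums suffice
  den2 <- sum(A * A) * sum(B * B)
  if (den2 > 0) sum(A * B) / sqrt(den2) else 0
}

x <- c(0.3, 1.7, -2.2, 0.9, 4.1)
dcor2_sketch(x, 2 * x + 1)        # 1 (up to rounding): exact linear relation
dcor2_sketch(x, x^2, type = "U")  # bias-corrected counterpart
```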
Note
The U-statistic U_n can be negative in the lower tail, so the square root of the U-statistic is not applied. Similarly, dcor2d(x, y, "U") is bias-corrected and can be negative in the lower tail, so we do not take the square root. The original definitions of dCov and dCor (SRB2007, SR2009) were based on V-statistics, which are non-negative, and were defined as square roots of V-statistics.
It has been suggested that, instead of taking the square root of the U-statistic, one could take the square root of |U_n| and then restore the sign, that is, sign(U_n) * sqrt(|U_n|), but that introduces more bias than the original dCor and should never be used.
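The sign behavior described above can be seen in a small simulation. A hedged sketch, assuming independent standard normal samples of size n = 20 and the expanded forms of the V- and U-statistics obtained by expanding the centered sums into sum_ij a_ij b_ij, sum_i a_i. b_i., and a.. b..; dcov2_VU is a hypothetical helper. Because U_n is unbiased for dCov^2(X, Y) = 0 under independence, it falls below zero in a sizeable share of replicates, whereas every V-statistic is non-negative.

```r
# Hypothetical helper (not part of energy): V- and U-statistics for dCov^2
# from the expanded forms, using O(n^2) distance matrices.
dcov2_VU <- function(x, y) {
  n <- length(x)
  a <- abs(outer(x, x, "-")); b <- abs(outer(y, y, "-"))
  s_ab <- sum(a * b)                      # sum_ij a_ij b_ij
  s_rr <- sum(rowSums(a) * rowSums(b))    # sum_i a_i. b_i.
  s_prod <- sum(a) * sum(b)               # a.. * b..
  c(V = s_ab / n^2 - 2 * s_rr / n^3 + s_prod / n^4,
    U = s_ab / (n * (n - 3)) - 2 * s_rr / (n * (n - 2) * (n - 3)) +
      s_prod / (n * (n - 1) * (n - 2) * (n - 3)))
}

# Independent samples: the population dCov^2 is 0
set.seed(1)
stats <- t(replicate(500, dcov2_VU(rnorm(20), rnorm(20))))
mean(stats[, "U"] < 0)   # share of negative U-statistics
min(stats[, "V"])        # V-statistics stay non-negative
```

A negative U_n here is not an error: it is the price of unbiasedness, which is why neither dcov2d nor dcor2d applies a square root to the U-statistic.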
Author(s)
Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely
References
Huo, X. and Szekely, G.J. (2016). Fast computing for distance covariance. Technometrics, 58(4), 435-447.
Szekely, G.J. and Rizzo, M.L. (2014). Partial Distance Correlation with Methods for Dissimilarities. Annals of Statistics, 42(6), 2382-2412.
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and Testing Dependence by Correlation of Distances. Annals of Statistics, 35(6), 2769-2794. doi:10.1214/009053607000000505
See Also
dcov, dcov.test, dcor, dcor.test (multivariate statistics and permutation tests)
Examples
library(energy)
## these are equivalent, but dcov2d and dcor2d are faster for n > 50
n <- 100
x <- rnorm(n)
y <- rnorm(n)
all.equal(dcov(x, y)^2, dcov2d(x, y), check.attributes = FALSE)
all.equal(bcdcor(x, y), dcor2d(x, y, "U"), check.attributes = FALSE)
x <- rlnorm(400)
y <- rexp(400)
dcov.test(x, y, R = 199)  # permutation test
dcor.test(x, y, R = 199)