dCor {GiniDistance} | R Documentation |
Distance Covariance and Correlation Statistics
Description
Computes distance covariance and distance correlation statistics for quantitative X and categorical Y, and returns these measures of dependence.
Usage
dCor(x, y, alpha)
Arguments
x | data |
y | label of data or univariate response variable |
alpha | exponent on Euclidean distance, in (0,2] |
Details
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels, respectively.
dCor calls the dcor function from the energy package, which computes the distance correlation between X and Y when both are numerical variables. If Y is categorical, the set difference metric on the support of Y is used. That is, d(y, y^\prime) = |y-y^\prime| := I(y \neq y^\prime), where I(\cdot) is the indicator function. The sample distance correlation between data and labels is then computed as follows.
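The set difference metric can be illustrated with a few lines of base R. This is a minimal sketch, not the package's internal code: the helper name set_diff_dist is hypothetical and is introduced here only for illustration.

```r
# Hypothetical helper (not part of GiniDistance): pairwise set difference
# distances, d(y_i, y_j) = 1 if the two labels differ and 0 otherwise.
set_diff_dist <- function(y) {
  y <- as.vector(y)            # factors become character labels
  1 * outer(y, y, FUN = "!=")  # logical comparison matrix converted to 0/1
}

labels <- c("a", "b", "a", "c")
D <- set_diff_dist(labels)
D
```

Because the distance takes only the values 0 and 1, raising it to the power alpha leaves it unchanged, so the exponent affects only the distances between the rows of x.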
Let A = (A_{ij}) be the symmetric, n \times n, centered distance matrix of the sample \mathbf x_1,\cdots, \mathbf x_n. The (i,j)-th entry of A is
A_{ij} = a_{ij}-\frac{1}{n-2}a_{i\cdot}-\frac{1}{n-2}a_{\cdot j} + \frac{1}{(n-1)(n-2)}a_{\cdot \cdot}
if i \neq j and 0 if i=j, where a_{ij} = \|\mathbf x_i-\mathbf x_j\|^{\alpha}, a_{i\cdot} = \sum_{j=1}^n a_{ij}, a_{\cdot j} = \sum_{i=1}^n a_{ij}, and a_{\cdot \cdot}=\sum_{i,j=1}^n a_{ij}. Similarly, using the set difference metric, a symmetric, n \times n, centered distance matrix is calculated for the samples y_1,\cdots, y_n and denoted by B = (B_{ij}). Unbiased estimators of \mbox{dCov}(\mathbf X,Y;\alpha), \mbox{dCov}(\mathbf X, \mathbf X;\alpha) and \mbox{dCov}(Y, Y;\alpha) are given respectively by \frac{1}{n(n-3)}\sum_{i\ne j}A_{ij}B_{ij}, \frac{1}{n(n-3)}\sum_{i\ne j}A_{ij}^2 and \frac{1}{n(n-3)}\sum_{i\ne j}B_{ij}^2. Then the distance correlation is
\mbox{dCor}(\mathbf{X}, Y; \alpha) = \frac{\mbox{dCov}(\mathbf{X}, Y; \alpha)}{\sqrt{\mbox{dCov}(\mathbf{X},\mathbf{X};\alpha)} \sqrt{\mbox{dCov}(Y,Y;\alpha)}}.
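The centering step and the three estimators can be traced in base R. The sketch below is written directly from the formulas in this section and is not the implementation used by dCor or by the energy package, so its output may differ from theirs. The names u_center and dcor_sketch, and the names of the returned elements, are hypothetical.

```r
# Hypothetical helper: centers a symmetric distance matrix d entry by entry,
# A_ij = a_ij - a_i./(n-2) - a_.j/(n-2) + a../((n-1)(n-2)) for i != j, else 0.
u_center <- function(d) {
  n <- nrow(d)
  row_sums <- rowSums(d)   # a_i. (equal to a_.i because d is symmetric)
  total <- sum(d)          # a_..
  centered <- d -
    outer(row_sums, rep(1, n)) / (n - 2) -
    outer(rep(1, n), row_sums) / (n - 2) +
    total / ((n - 1) * (n - 2))
  diag(centered) <- 0
  centered
}

# Hypothetical sketch of the estimators and their ratio.
dcor_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  y <- as.vector(y)
  n <- nrow(x)
  # Input requirements stated in Details, plus n > 3 so that n(n-3) is nonzero.
  stopifnot(n > 3, length(y) == n, !anyNA(x), !anyNA(y),
            alpha > 0, alpha <= 2)
  a <- as.matrix(dist(x))^alpha     # a_ij = ||x_i - x_j||^alpha
  b <- 1 * outer(y, y, FUN = "!=")  # set difference metric on the labels
  A <- u_center(a)
  B <- u_center(b)
  scale <- 1 / (n * (n - 3))
  # Diagonals are zero, so summing all entries equals summing over i != j.
  dcov_xy <- scale * sum(A * B)
  dcov_xx <- scale * sum(A^2)
  dcov_yy <- scale * sum(B^2)
  c(dCovXY = dcov_xy, dVarX = dcov_xx, dVarY = dcov_yy,
    dCor = dcov_xy / sqrt(dcov_xx * dcov_yy))
}

res <- dcor_sketch(iris[, 1:4], iris[, 5], alpha = 1)
res
```

A quick sanity check on the centering is that every row of the centered matrix sums to zero (up to floating-point error), which follows from the entry-wise formula.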
Value
dCor returns the sample distance variance of x, the distance variance of y, the distance covariance of x and y, and the distance correlation of x and y.
References
Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41 (5), 3284-3305.
Szekely, G. J., Rizzo, M. L. and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35 (6), 2769-2794.
Rizzo, M.L. and Szekely, G.J., (2017). Energy: E-Statistics: Multivariate Inference via the Energy of Data (R Package), Version 1.7-0.
Examples
library(GiniDistance)
x <- iris[, 1:4]          # four quantitative measurements
y <- unclass(iris[, 5])   # species labels as integer codes
dCor(x, y, alpha = 1)