dCor {GiniDistance}    R Documentation

Distance Covariance and Correlation Statistics

Description

Computes distance covariance and correlation statistics for quantitative X and categorical Y, and returns these measures of dependence.

Usage

  dCor(x, y, alpha)

Arguments

x

data, with one sample per row

y

label of data or univariate response variable

alpha

exponent on Euclidean distance, in (0,2]

Details

The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x, y are treated as data and labels.
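The input requirements above can be checked before calling dCor. The following is a minimal sketch; check_inputs is a hypothetical helper written for illustration and is not part of the GiniDistance package.

```r
# Hypothetical helper (not part of GiniDistance): verifies the two
# requirements stated above before the data are passed to dCor.
check_inputs <- function(x, y) {
  x <- as.matrix(x)
  if (nrow(x) != length(y)) {
    stop("number of rows of x must equal the length of y")
  }
  if (anyNA(x) || anyNA(y)) {
    stop("x and y must not contain missing values")
  }
  invisible(TRUE)
}

check_inputs(iris[, 1:4], iris[, 5])   # passes silently
```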

dCor calls the dcor function from the energy package, which computes the distance correlation between X and Y when both are numerical variables. If Y is categorical, the set difference metric on the support of Y is used, that is, d(y, y^\prime) = |y-y^\prime| := I(y \neq y^\prime), where I(\cdot) is the indicator function. The sample distance correlation between data and labels is then computed as follows.
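As a brief aside before the formulas, the set difference metric on labels amounts to a 0/1 matrix that marks pairs of samples with different labels. A small sketch in base R, using a made-up label vector chosen only for illustration:

```r
# Illustrative labels (made up for this example)
y <- c("a", "b", "a", "c")

# b[i, j] = I(y_i != y_j): 1 when the labels differ, 0 when they agree
b <- outer(y, y, "!=") * 1
b
```

The resulting matrix is symmetric with a zero diagonal, as expected of a distance matrix.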

Let A=(A_{ij}) be the symmetric, n \times n, centered distance matrix of the sample \mathbf x_1,\cdots, \mathbf x_n. The (i,j)-th entry of A is A_{ij} = a_{ij}-\frac{1}{n-2}a_{i\cdot}-\frac{1}{n-2}a_{\cdot j} + \frac{1}{(n-1)(n-2)}a_{\cdot \cdot} if i \neq j and A_{ij} = 0 if i=j, where a_{ij} = \|\mathbf x_i-\mathbf x_j\|^{\alpha}, a_{i\cdot} = \sum_{j=1}^n a_{ij}, a_{\cdot j} = \sum_{i=1}^n a_{ij}, and a_{\cdot \cdot}=\sum_{i,j=1}^n a_{ij}. Similarly, using the set difference metric, a symmetric, n \times n, centered distance matrix is calculated for the samples y_1,\cdots, y_n and denoted by B = (B_{ij}). Unbiased estimators of \mbox{dCov}(\mathbf X,Y;\alpha), \mbox{dCov}(\mathbf X, \mathbf X;\alpha) and \mbox{dCov}(Y, Y;\alpha) are given respectively by \frac{1}{n(n-3)}\sum_{i\ne j}A_{ij}B_{ij}, \frac{1}{n(n-3)}\sum_{i\ne j}A_{ij}^2 and \frac{1}{n(n-3)}\sum_{i\ne j}B_{ij}^2. Then the distance correlation is

\mbox{dCor}(\mathbf{X}, Y; \alpha) = \frac{\mbox{dCov}(\mathbf{X}, Y; \alpha)}{\sqrt{\mbox{dCov}(\mathbf{X},\mathbf{X};\alpha)} \sqrt{\mbox{dCov}(Y,Y;\alpha)}}.
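The formulas in this section can be traced step by step in base R. The sketch below follows the text as written and is not taken from the package source; the names u_center and dcor_sketch are hypothetical, and the result is not guaranteed to match the output of dCor exactly.

```r
# Hypothetical helper: centers a raw n x n distance matrix d
# according to the (i,j)-th entry formula given in this section.
u_center <- function(d) {
  n <- nrow(d)
  A <- d -
    outer(rowSums(d), rep(1, n)) / (n - 2) -   # a_{i.} / (n - 2)
    outer(rep(1, n), colSums(d)) / (n - 2) +   # a_{.j} / (n - 2)
    sum(d) / ((n - 1) * (n - 2))               # a_{..} / ((n-1)(n-2))
  diag(A) <- 0                                 # entries with i = j are 0
  A
}

# Hypothetical sketch of the estimators for quantitative x, categorical y
dcor_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  y <- as.integer(as.factor(y))
  n <- nrow(x)
  stopifnot(n > 3, length(y) == n, alpha > 0, alpha <= 2)

  a <- as.matrix(dist(x))^alpha     # a_ij = ||x_i - x_j||^alpha
  b <- outer(y, y, "!=") * 1        # b_ij = I(y_i != y_j)
  A <- u_center(a)
  B <- u_center(b)

  # Diagonals are zero, so summing all entries equals summing over i != j
  denom   <- n * (n - 3)
  dcov_xy <- sum(A * B) / denom
  dvar_x  <- sum(A * A) / denom
  dvar_y  <- sum(B * B) / denom

  c(dVarX = dvar_x, dVarY = dvar_y, dCov = dcov_xy,
    dCor = dcov_xy / sqrt(dvar_x * dvar_y))
}

dcor_sketch(iris[, 1:4], iris[, 5], alpha = 1)
```

Setting the diagonal to zero after centering lets the three sums run over the whole matrix, which keeps the code close to the notation above.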

Value

dCor returns the sample distance variance of x, the distance variance of y, the distance covariance of x and y, and the distance correlation of x and y.

References

Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41 (5), 3284-3305.

Szekely, G. J., Rizzo, M. L. and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35 (6), 2769-2794.

Rizzo, M. L. and Szekely, G. J. (2017). energy: E-Statistics: Multivariate Inference via the Energy of Data (R package), Version 1.7-0.

See Also

dCov, KdCov, KdCor

Examples

  library(GiniDistance)
  x <- iris[,1:4]            # four quantitative measurements
  y <- unclass(iris[,5])     # species labels as integer codes
  dCor(x, y, alpha = 1)

[Package GiniDistance version 0.1.1 Index]