dCor {GiniDistance}    R Documentation

Distance Covariance and Correlation Statistics

Description

Computes distance covariance and distance correlation statistics between quantitative variables X and a categorical variable Y, and returns these measures of dependence.

Usage

  dCor(x, y, alpha)

Arguments

x

numeric data matrix or data frame, one observation per row

y

class labels of the data, or a univariate response variable

alpha

exponent on Euclidean distance, in (0,2]

Details

The sample size (number of rows) of the data must equal the length of the label vector, and the samples must not contain missing values. Argument x is treated as the data and y as the labels.

dCor calls the dcor function from the energy package, which computes the distance correlation between X and Y when both are numeric. If Y is categorical, the set difference metric on the support of $Y$ is used instead, that is, $d(y, y') = |y - y'| := I(y \neq y')$, where $I(\cdot)$ is the indicator function. The sample distance correlation between data and labels is then computed as follows.
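To make the set difference metric concrete, the following base-R sketch builds the $n \times n$ label distance matrix with entries $I(y_i \neq y_j)$. The helper name `set_diff_dist` is hypothetical and is not part of GiniDistance or energy. Because each entry is 0 or 1, raising it to any power $\alpha > 0$ leaves it unchanged.

```r
# Hypothetical helper: set difference metric d(y, y') = I(y != y').
# `y` may be a factor, character vector, or integer class codes.
set_diff_dist <- function(y) {
  y <- as.character(y)              # compare labels, not numeric codes
  1 * outer(y, y, FUN = "!=")       # logical n x n matrix coerced to 0/1
}

y <- c("a", "b", "a", "c")
D <- set_diff_dist(y)               # symmetric, zero diagonal
```

Comparing labels as characters matters when classes are stored as integer codes, as in `unclass(iris[,5])`: the absolute difference would put classes 1 and 3 at distance 2, whereas the set difference metric treats every pair of distinct classes as equally far apart.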

Let $A = (A_{ij})$ be the symmetric $n \times n$ centered distance matrix of the sample $\mathbf x_1, \ldots, \mathbf x_n$. Its $(i,j)$-th entry is

$$A_{ij} = a_{ij} - \frac{1}{n-2} a_{i\cdot} - \frac{1}{n-2} a_{\cdot j} + \frac{1}{(n-1)(n-2)} a_{\cdot\cdot}$$

if $i \neq j$, and $A_{ii} = 0$, where $a_{ij} = \|\mathbf x_i - \mathbf x_j\|^{\alpha}$, $a_{i\cdot} = \sum_{j=1}^n a_{ij}$, $a_{\cdot j} = \sum_{i=1}^n a_{ij}$, and $a_{\cdot\cdot} = \sum_{i,j=1}^n a_{ij}$. Similarly, using the set difference metric, a symmetric $n \times n$ centered distance matrix $B = (B_{ij})$ is computed for the labels $y_1, \ldots, y_n$. Unbiased estimators of $\mathrm{dCov}(\mathbf X, Y; \alpha)$, $\mathrm{dCov}(\mathbf X, \mathbf X; \alpha)$ and $\mathrm{dCov}(Y, Y; \alpha)$ are, respectively,

$$\frac{1}{n(n-3)} \sum_{i \neq j} A_{ij} B_{ij}, \qquad \frac{1}{n(n-3)} \sum_{i \neq j} A_{ij}^2, \qquad \frac{1}{n(n-3)} \sum_{i \neq j} B_{ij}^2.$$

Then the distance correlation is

$$\mathrm{dCor}(\mathbf X, Y; \alpha) = \frac{\mathrm{dCov}(\mathbf X, Y; \alpha)}{\sqrt{\mathrm{dCov}(\mathbf X, \mathbf X; \alpha)}\,\sqrt{\mathrm{dCov}(Y, Y; \alpha)}}.$$
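The estimator above can be sketched in base R as follows, assuming Euclidean distances for x and the set difference metric for y. The function names `u_center`, `dcov_u` and `dcor_sketch`, and the names of the returned elements, are hypothetical; this is not the package's implementation, and its output has not been checked against dCor.

```r
# Hypothetical sketch of the centered-matrix estimators described above.
u_center <- function(a) {
  n <- nrow(a)
  # A_ij = a_ij - a_i./(n-2) - a_.j/(n-2) + a../((n-1)(n-2)) for i != j
  A <- a - outer(rowSums(a), colSums(a), "+") / (n - 2) +
    sum(a) / ((n - 1) * (n - 2))
  diag(A) <- 0                      # A_ii = 0
  A
}

dcov_u <- function(A, B) {
  n <- nrow(A)
  # Diagonals are zero, so the full sum equals the sum over i != j.
  sum(A * B) / (n * (n - 3))
}

dcor_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(n == length(y), n > 3, alpha > 0, alpha <= 2,
            !anyNA(x), !anyNA(y))
  a <- as.matrix(dist(x))^alpha     # a_ij = ||x_i - x_j||^alpha
  yc <- as.character(y)
  b <- 1 * outer(yc, yc, "!=")      # set difference metric on the labels
  A <- u_center(a)
  B <- u_center(b)
  dcov_xy <- dcov_u(A, B)
  dcov_xx <- dcov_u(A, A)
  dcov_yy <- dcov_u(B, B)
  c(dVarX = dcov_xx, dVarY = dcov_yy, dCov = dcov_xy,
    dCor = dcov_xy / sqrt(dcov_xx * dcov_yy))
}

dcor_sketch(iris[, 1:4], iris[, 5], alpha = 1)
```

Each centered matrix has zero row sums, and by the Cauchy-Schwarz inequality the resulting dCor lies in $[-1, 1]$. Unlike the uncentered version it can be slightly negative under independence, because the estimators are unbiased rather than nonnegative.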

Value

dCor returns four statistics: the sample distance variance of x, the sample distance variance of y, the sample distance covariance of x and y, and the sample distance correlation of x and y.

References

Lyons, R. (2013). Distance covariance in metric spaces. The Annals of Probability, 41(5), 3284-3305.

Szekely, G. J., Rizzo, M. L. and Bakirov, N. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769-2794.

Rizzo, M. L. and Szekely, G. J. (2017). Energy: E-Statistics: Multivariate Inference via the Energy of Data (R Package), Version 1.7-0.

See Also

dCov KdCov KdCor

Examples

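  library(GiniDistance)
  x <- iris[, 1:4]              # four numeric measurements per flower
  y <- unclass(iris[, 5])       # species as integer class codes 1, 2, 3
  dCor(x, y, alpha = 1)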

[Package GiniDistance version 0.1.1 Index]