KdCov {GiniDistance}    R Documentation

Kernel Distance Covariance Statistics

Description

Computes the kernel distance covariance statistic, a measure of dependence between quantitative data x and a categorical response y, where sigma is the kernel standard deviation.

Usage

  KdCov(x, y, sigma)

Arguments

x

quantitative data, with one row per sample

y

labels of the data, or a univariate response variable

sigma

kernel standard deviation

Details

KdCov computes kernel distance covariance statistics. The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. Arguments x and y are treated as data and labels, respectively.
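The input requirements described here can be checked before calling KdCov. The helper below is only a sketch: the name check_kdcov_inputs and its error messages are hypothetical and are not part of the GiniDistance package.

```r
# Hypothetical helper (not part of GiniDistance): checks the documented
# requirements on x and y before they are passed to KdCov.
check_kdcov_inputs <- function(x, y) {
  x <- as.matrix(x)
  # One label per sample: the rows of x must match the length of y
  if (nrow(x) != length(y)) {
    stop("x has ", nrow(x), " rows but y has length ", length(y))
  }
  # Missing values are not allowed in either argument
  if (anyNA(x) || anyNA(y)) {
    stop("x and y must not contain missing values")
  }
  invisible(TRUE)
}

# Passes silently for the iris data used in the Examples section
check_kdcov_inputs(iris[, 1:4], iris[, 5])
```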

Distance covariance was introduced in (Szekely07) as a dependence measure between random variables $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$. If $X$ and $Y$ are embedded into RKHS's induced by $\kappa_X$ and $\kappa_Y$, respectively, the generalized distance covariance of $X$ and $Y$ is (Sejdinovic13):

$$
\begin{aligned}
\mathrm{dCov}_{\kappa_X,\kappa_Y}(X,Y) = {} & \mathbb{E}\, d_{\kappa_X}(X,X^{\prime})\, d_{\kappa_Y}(Y,Y^{\prime}) + \mathbb{E}\, d_{\kappa_X}(X,X^{\prime})\, \mathbb{E}\, d_{\kappa_Y}(Y,Y^{\prime}) \\
& - 2\, \mathbb{E}\left[ \mathbb{E}_{X^{\prime}} d_{\kappa_X}(X,X^{\prime})\, \mathbb{E}_{Y^{\prime}} d_{\kappa_Y}(Y,Y^{\prime}) \right].
\end{aligned}
$$

In the case of $Y$ being categorical, one may embed it using a set difference kernel $\kappa_Y$,

$$
\kappa_Y(y,y^{\prime}) = \left\{ \begin{array}{cl} \frac{1}{2} & \text{if } y = y^{\prime},\\ 0 & \text{otherwise.} \end{array} \right.
$$

This is equivalent to embedding $Y$ as a simplex with edges of unit length (Lyons13), i.e., $L_k$ is represented by a $K$ dimensional vector of all zeros except its $k$-th dimension, which has the value $\frac{\sqrt{2}}{2}$. The distance induced by $\kappa_Y$ is called the set distance, i.e., $d_{\kappa_Y}(y,y^{\prime})=0$ if $y=y^{\prime}$ and $1$ otherwise. Using the set distance, we have the following results on the generalized distance covariance between a numerical and a categorical random variable.

$$
\mathrm{dCov}_{\kappa_X,\kappa_Y}(X,Y) := \mathrm{dCov}_{\kappa_X}(X,Y) = \sum_{k=1}^{K} p_k^2 \left[ 2\, \mathbb{E}\, d_{\kappa_X}(X_k,X) - \mathbb{E}\, d_{\kappa_X}(X_k,{X_k}^{\prime}) - \mathbb{E}\, d_{\kappa_X}(X,X^{\prime}) \right].
$$
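The two expressions for the generalized distance covariance can be compared numerically with plug-in (sample mean) estimates. The sketch below is not the implementation used by KdCov. It assumes a Gaussian kernel of the form exp(-||x - x'||^2 / (2 sigma^2)) and a kernel-induced distance of the form k(a, a) + k(b, b) - 2 k(a, b); the package may use a different kernel scaling or distance, so the values need not match KdCov output. All function names are hypothetical.

```r
# Distance induced by a kernel matrix k: d(a, b) = k(a, a) + k(b, b) - 2 k(a, b).
# ASSUMPTION: this form of the induced distance; KdCov may define it differently.
induced_dist <- function(k) outer(diag(k), diag(k), "+") - 2 * k

# ASSUMPTION: Gaussian kernel exp(-||x - x'||^2 / (2 sigma^2)).
gauss_kernel <- function(x, sigma) {
  sq <- as.matrix(dist(as.matrix(x)))^2   # squared Euclidean distances
  exp(-sq / (2 * sigma^2))
}

# Set difference kernel: 1/2 when two labels agree, 0 otherwise.
set_kernel <- function(y) outer(y, y, "==") * 0.5

# Plug-in estimate of the three-term generalized distance covariance.
kdcov_general <- function(x, y, sigma) {
  dx <- induced_dist(gauss_kernel(x, sigma))
  dy <- induced_dist(set_kernel(as.vector(y)))   # set distance: 0 or 1
  mean(dx * dy) + mean(dx) * mean(dy) -
    2 * mean(rowMeans(dx) * rowMeans(dy))
}

# Plug-in estimate of the class-wise sum over k of p_k^2 [ ... ].
kdcov_classwise <- function(x, y, sigma) {
  dx <- induced_dist(gauss_kernel(x, sigma))
  y <- as.vector(y)
  total <- 0
  for (k in unique(y)) {
    idx <- which(y == k)
    p_k <- length(idx) / length(y)               # class proportion
    total <- total + p_k^2 * (2 * mean(dx[idx, , drop = FALSE]) -
                                mean(dx[idx, idx, drop = FALSE]) -
                                mean(dx))
  }
  total
}

x <- iris[, 1:4]
y <- as.integer(iris[, 5])
all.equal(kdcov_general(x, y, sigma = 1), kdcov_classwise(x, y, sigma = 1))
```

Under these assumptions the two estimates should agree up to floating-point error, because the set distance reduces the three-term expression to the class-wise sum. Both estimates are zero when every sample carries the same label.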

Value

KdCov returns the sample kernel distance covariance.

References

Sejdinovic, D., Sriperumbudur, B., Gretton, A. and Fukumizu, K. (2013). Equivalence of Distance-based and RKHS-based Statistics in Hypothesis Testing, The Annals of Statistics, 41 (5), 2263-2291.

Zhang, S., Dang, X., Nguyen, D. and Chen, Y. (2019). Estimating feature-label dependence using Gini distance statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted).

See Also

KgCov, KgCor, dCov

Examples

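  # Four numeric measurements as data, species as integer class labels
  x <- iris[, 1:4]
  y <- unclass(iris[, 5])
  # Kernel distance covariance with kernel standard deviation 1
  KdCov(x, y, sigma = 1)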

[Package GiniDistance version 0.1.1 Index]