KdCov {GiniDistance} | R Documentation
Kernel Distance Covariance Statistics
Description
Computes the kernel distance covariance statistic between quantitative data x and a categorical label y, using a kernel with standard deviation sigma, and returns the resulting measure of dependence.
Usage
KdCov(x, y, sigma)
Arguments
x | data
y | label of data or univariate response variable
sigma | kernel standard deviation
Details
KdCov computes the kernel distance covariance statistic.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. The arguments x and y are treated as data and labels, respectively.
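The two input requirements above can be checked before calling KdCov. The following is a minimal sketch; `check_kdcov_input` is a hypothetical helper written for illustration and is not part of the GiniDistance package.

```r
# Hypothetical pre-check for the inputs of KdCov (illustration only).
check_kdcov_input <- function(x, y) {
  x <- as.matrix(x)
  # The number of rows of the data must match the length of the label vector.
  if (nrow(x) != length(y)) {
    stop("number of rows of x must equal the length of y")
  }
  # Neither the data nor the labels may contain missing values.
  if (anyNA(x) || anyNA(y)) {
    stop("x and y must not contain missing values")
  }
  invisible(TRUE)
}

# The iris data satisfy both requirements.
check_kdcov_input(iris[, 1:4], iris[, 5])
```

Failing early with a clear message is a design choice for this sketch; how KdCov itself reacts to invalid input is not stated here.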
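Distance covariance was introduced in (Szekely07) as a dependence measure between random variables $X$ and $Y$. If $X$ and $Y$ are embedded into RKHS's induced by $\kappa_X$ and $\kappa_Y$, respectively, the generalized distance covariance of $X$ and $Y$ is (Sejdinovic13):

$$\mathrm{dCov}_{\kappa_X,\kappa_Y}(X;Y) = E\, d_{\kappa_X}(X,X')\, d_{\kappa_Y}(Y,Y') + E\, d_{\kappa_X}(X,X')\, E\, d_{\kappa_Y}(Y,Y') - 2E\left[ E_{X'} d_{\kappa_X}(X,X')\, E_{Y'} d_{\kappa_Y}(Y,Y') \right]$$

In the case of $Y$ being categorical, one may embed it using a set difference kernel $\kappa_Y$,

$$\kappa_Y(y,y') = \begin{cases} \frac{1}{2} & \text{if } y = y', \\ 0 & \text{otherwise.} \end{cases}$$

This is equivalent to embedding $Y$ as a simplex with edges of unit length (Lyons13), i.e., the $k$-th class is represented by a $K$ dimensional vector of all zeros except its $k$-th dimension, which has the value $\frac{\sqrt{2}}{2}$.
The distance induced by $\kappa_Y$ is called the set distance, i.e., $d_{\kappa_Y}(y,y') = 0$ if $y = y'$ and $1$ otherwise. Using the set distance, we have the following result on the generalized distance covariance between a numerical and a categorical random variable:

$$\mathrm{dCov}_{\kappa_X}(X;Y) = \sum_{k=1}^{K} p_k^2 \left[ 2E\, d_{\kappa_X}(X_k,X) - E\, d_{\kappa_X}(X_k,X_k') - E\, d_{\kappa_X}(X,X') \right]$$

where $K$ is the number of classes, $p_k$ is the probability of the $k$-th class, $X_k$ and $X_k'$ are independent draws from the conditional distribution of $X$ given the $k$-th class, and $X'$ is an independent copy of $X$.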
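As a rough numerical illustration of the set distance and the generalized distance covariance, the sketch below computes a plug-in (V-statistic) estimate in two ways: from the general three-term definition, and from the class-wise form obtained by substituting the set distance, that is, the sum over classes of p_k^2 [2 E d(X_k, X) - E d(X_k, X_k') - E d(X, X')]. All function names are hypothetical, and the Gaussian kernel is an assumption made for this sketch. The source only states that sigma is a kernel standard deviation, so the numbers need not match the output of KdCov.

```r
# Hypothetical helper (not from GiniDistance): distance induced by a kernel k,
# d(x, x') = sqrt(k(x, x) + k(x', x') - 2 * k(x, x')).
# Assumption: Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 * sigma^2)).
kernel_dist <- function(x, sigma) {
  sq <- as.matrix(dist(x))^2            # squared Euclidean distances
  sqrt(2 - 2 * exp(-sq / (2 * sigma^2)))
}

# Plug-in estimate of the general definition from two n x n distance matrices:
# E[dX dY] + E[dX] E[dY] - 2 E[ E_{X'} dX * E_{Y'} dY ].
dcov_general <- function(A, B) {
  mean(A * B) + mean(A) * mean(B) - 2 * mean(rowMeans(A) * rowMeans(B))
}

# Plug-in estimate of the class-wise form under the set distance.
dcov_classwise <- function(A, y) {
  y <- as.factor(y)
  total <- mean(A)                       # estimate of E d(X, X')
  sum(vapply(levels(y), function(k) {
    idx <- which(y == k)
    p_k <- length(idx) / length(y)       # class proportion
    p_k^2 * (2 * mean(A[idx, , drop = FALSE]) -   # E d(X_k, X)
               mean(A[idx, idx, drop = FALSE]) -  # E d(X_k, X_k')
               total)
  }, numeric(1)))
}

x <- iris[, 1:4]
y <- iris[, 5]
yi <- as.integer(as.factor(y))

A <- kernel_dist(x, sigma = 1)
B <- 1 * outer(yi, yi, "!=")             # set distance: 0 if equal, 1 otherwise

# Simplex embedding: class k -> (sqrt(2) / 2) * e_k has unit pairwise distances,
# so its Euclidean distance matrix should reproduce the set distance.
K <- length(unique(yi))
E <- (sqrt(2) / 2) * diag(K)[yi, , drop = FALSE]
simplex_ok <- isTRUE(all.equal(as.matrix(dist(E)), B, check.attributes = FALSE))

v_general <- dcov_general(A, B)
v_classwise <- dcov_classwise(A, y)
all.equal(v_general, v_classwise)
```

Both estimates use the empirical distribution, including the zero diagonal terms, so they agree up to floating-point error. An unbiased U-statistic version would handle the diagonal differently.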
Value
KdCov returns the sample kernel distance covariance.
References
Sejdinovic, D., Sriperumbudur, B., Gretton, A. and Fukumizu, K. (2013). Equivalence of Distance-based and RKHS-based Statistics in Hypothesis Testing, The Annals of Statistics, 41 (5), 2263-2291.
Zhang, S., Dang, X., Nguyen, D. and Chen, Y. (2019). Estimating feature-label dependence using Gini distance statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted).
See Also
Examples
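# Four numeric measurements as data, species as class labels
x <- iris[, 1:4]
y <- unclass(iris[, 5])
# Kernel distance covariance with kernel standard deviation 1
KdCov(x, y, sigma = 1)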