gCov {GiniDistance}    R Documentation

Gini Distance Covariance Statistics

Description

Computes the Gini distance covariance statistic between quantitative data X and a categorical variable Y, where alpha is an exponent on the Euclidean distance, and returns this measure of dependence.

Usage

  gCov(x, y, alpha)

Arguments

x

quantitative data, with one row per sample

y

labels of the data, or a univariate categorical response variable

alpha

exponent on Euclidean distance, in (0,2]

Details

gCov computes the Gini distance covariance statistic. It is a self-contained R function that returns a measure of dependence.

The sample size (number of rows) of the data must agree with the length of the label vector, and the samples must not contain missing values. The arguments x and y are treated as data and labels, respectively. alpha is the exponent on the Euclidean distance; if it is missing, it defaults to 1.
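The input requirements above can be checked before calling gCov. The following is a minimal sketch in base R; the helper name check_gcov_inputs is hypothetical and not part of the GiniDistance package, and the range check on alpha assumes the interval (0,2] stated under Arguments.

```r
# Hypothetical pre-flight check; NOT part of the GiniDistance package.
check_gcov_inputs <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  # The number of rows of the data must agree with the number of labels.
  if (nrow(x) != length(y)) {
    stop("nrow(x) must equal length(y)")
  }
  # Samples must not contain missing values.
  if (anyNA(x) || anyNA(y)) {
    stop("x and y must not contain missing values")
  }
  # Assumption: alpha is a single number in (0, 2], as listed under Arguments.
  if (!is.numeric(alpha) || length(alpha) != 1 || alpha <= 0 || alpha > 2) {
    stop("alpha must be a single number in (0, 2]")
  }
  invisible(TRUE)
}

check_gcov_inputs(iris[, 1:4], iris[, 5], alpha = 1)
```

A call with mismatched sizes, such as check_gcov_inputs(iris[1:10, 1:4], iris[, 5]), stops with an error instead of reaching the computation.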

Gini distance covariance is a new measure of dependence between a random vector and its labels. For all distributions with finite first moments, Gini distance covariance gCov has the following fundamental properties:

(1) gCov(X,Y) is defined for a quantitative variable X of arbitrary dimension and a univariate categorical variable Y.

(2) gCov(X,Y)=0 characterizes independence of X and Y.

Gini distance covariance satisfies gCov(X,Y) \ge 0, and gCov(X,Y) = 0 if and only if X and Y are independent. Gini distance covariance gCov provides a new approach to the problem of testing the joint independence of random vectors. The formal definition of the population coefficient gCov is given in (DNCZ 2018). The empirical Gini distance covariance gCov_n(X,Y; alpha) is the nonnegative number computed as follows.

Suppose a sample {\mathcal{D}} =\{(\mathbf{x}_i,y_i)\}, i = 1,...,n, is available, where each label y_i takes one of K categories L_1,...,L_K. Let {\mathcal{I}}_k be the index set of sample points with y_i = L_k. The category probability p_k = P(Y = L_k) is estimated by the sample proportion of that category, that is, \hat{p}_k= \frac{n_k}{n}, where n_k is the number of elements in {\mathcal{I}}_k. With a given \alpha \in (0,2), a point estimator of the Gini distance covariance is given as follows.

\hat{\Delta}_k(\alpha)= {n_k \choose 2}^{-1} \sum_{i<j \in {\mathcal{I}}_k} \|\mathbf{x}_i -\mathbf{x}_j\| ^{\alpha},

\hat{\Delta}(\alpha)={n \choose 2}^{-1} \sum_{1 \le i<j \le n} \|\mathbf{x}_i -\mathbf{x}_j\| ^{\alpha},

{gCov}= \hat{\Delta}(\alpha)-\sum_{k=1}^K \hat p_k \hat{\Delta}_k(\alpha).
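The three formulas above can be followed step by step in base R. This is only an illustrative sketch, not the package's implementation: the name gcov_sketch is hypothetical, and the choice to let a category with fewer than two points contribute zero is an assumption made here so that the code runs on any input.

```r
# Hypothetical illustration of the estimator; NOT the GiniDistance implementation.
gcov_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  y <- as.factor(y)
  n <- nrow(x)

  # Average of ||x_i - x_j||^alpha over all pairs i < j among the given rows.
  mean_pair_dist <- function(rows) {
    if (length(rows) < 2) return(0)  # assumption: no pairs contributes 0
    d <- as.numeric(dist(x[rows, , drop = FALSE]))  # Euclidean distances, i < j
    mean(d^alpha)
  }

  idx <- split(seq_len(n), y)                      # index sets I_k
  p_hat <- lengths(idx) / n                        # \hat{p}_k = n_k / n
  delta_k <- vapply(idx, mean_pair_dist, numeric(1))  # \hat{\Delta}_k(alpha)
  delta_all <- mean_pair_dist(seq_len(n))          # \hat{\Delta}(alpha)

  delta_all - sum(p_hat * delta_k)
}

# Small hand-checkable case: two well-separated groups on the real line.
x_toy <- matrix(c(0, 1, 10, 11), ncol = 1)
y_toy <- c("a", "a", "b", "b")
gcov_sketch(x_toy, y_toy, alpha = 1)  # 7 - (0.5 * 1 + 0.5 * 1) = 6
```

In the toy case the six pairwise distances are 1, 10, 11, 9, 10 and 1, so the overall mean is 7, each within-group mean is 1, and the statistic is 6.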

Value

gCov returns the sample Gini distance covariance

References

Dang, X., Nguyen, D., Chen, Y. and Zhang, J. (2019). A new Gini correlation between quantitative and qualitative variables. Journal of the American Statistical Association (submitted). https://arxiv.org/pdf/1809.09793.pdf

See Also

gCor gmd KgCov KgCor

Examples

  x <- iris[,1:4]
  y <- unclass(iris[,5])
  gCov(x, y, alpha = 1) 

[Package GiniDistance version 0.1.1 Index]