| gCor {GiniDistance} | R Documentation |
Gini Distance Covariance and Correlation Statistics
Description
Computes the Gini distance covariance and correlation statistics between a quantitative variable X and a categorical variable Y, where alpha is the exponent on the Euclidean distance, and returns the resulting measure of dependence.
Usage
gCor(x, y, alpha)
Arguments
x | quantitative data, one row per sample |
y | label of data or univariate response variable, one entry per row of x |
alpha | exponent on the Euclidean distance, in (0,2); defaults to 1 if missing |
Details
gCor computes the Gini distance correlation statistic.
It is a self-contained R function that returns a measure of dependence.
The sample size (number of rows) of the data must agree with the length of the label vector, and samples must not contain missing values. The arguments x and y are treated as data and labels, respectively. If alpha is missing, it defaults to 1; otherwise it is used as the exponent on the Euclidean distance.
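The input requirements above (matching lengths, no missing values) can be checked before calling gCor. The following is a minimal sketch in base R; the helper name prepare_gini_inputs is hypothetical and not part of the GiniDistance package, and dropping incomplete samples is only one possible way to handle missing values.

```r
# Hypothetical helper (not part of GiniDistance): verify that data and
# labels line up, then drop every sample with a missing value.
prepare_gini_inputs <- function(x, y) {
  x <- as.matrix(x)
  # The number of rows of x must agree with the length of y.
  stopifnot(nrow(x) == length(y))
  # Keep only samples that are complete in both the data and the label.
  keep <- complete.cases(x) & !is.na(y)
  list(x = x[keep, , drop = FALSE], y = y[keep])
}

clean <- prepare_gini_inputs(iris[, 1:4], iris[, 5])
dim(clean$x)
```

The cleaned clean$x and clean$y could then be passed on as the x and y arguments.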
Suppose a sample {\mathcal{D}} =\{(\mathbf{x}_i,y_i)\}, i = 1,...,n, is available, where each y_i takes one of the K categories L_1,...,L_K. The sample counterparts of the population quantities can then be computed easily. Let {\mathcal{I}}_k be the index set of sample points with y_i =L_k; then p_k is estimated by the sample proportion of that category, that is, \hat{p}_k= \frac{n_k}{n}, where n_k is the number of elements in {\mathcal{I}}_k. For a given \alpha \in (0,2), a point estimator of \rho_g(\alpha) is given as follows.
\hat{\Delta}_k(\alpha)= {n_k \choose 2}^{-1} \sum_{i<j \in {\mathcal{I}}_k} \|\mathbf{x}_i -\mathbf{x}_j\| ^{\alpha},
\hat{\Delta}(\alpha)={n \choose 2}^{-1} \sum_{1 \le i<j \le n} \|\mathbf{x}_i -\mathbf{x}_j\| ^{\alpha},
gCor=\hat{\rho}_g (\alpha)= 1-\frac{\sum_{k=1}^K \hat p_k \hat{\Delta}_k(\alpha)}{\hat{\Delta}(\alpha)}.
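The three formulas above can be sketched directly in base R. This is an illustration of the estimator as written here, not the package's own implementation: the function name gini_cor_sketch is hypothetical, pairwise Euclidean distances come from stats::dist, and treating a category with fewer than two points as contributing zero is an assumption made only so the sketch does not divide by zero.

```r
# Hypothetical sketch of the point estimator; not the GiniDistance code.
gini_cor_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))
  stopifnot(nrow(x) == length(y), !anyNA(x), !anyNA(y),
            alpha > 0, alpha < 2)

  # Average of ||x_i - x_j||^alpha over all pairs i < j among the rows of m.
  mean_pair_dist <- function(m) {
    if (nrow(m) < 2) return(0)  # assumption: no pairs means no contribution
    mean(as.vector(dist(m))^alpha)
  }

  # p_hat_k = n_k / n, in the order of levels(y).
  p_hat <- as.vector(table(y)) / length(y)
  # Delta_hat_k(alpha): within-category average for each level.
  delta_k <- vapply(levels(y), function(l) {
    mean_pair_dist(x[y == l, , drop = FALSE])
  }, numeric(1))
  # Delta_hat(alpha): average over all pairs in the sample.
  delta <- mean_pair_dist(x)

  1 - sum(p_hat * delta_k) / delta
}

gini_cor_sketch(iris[, 1:4], iris[, 5], alpha = 1)
```

When the categories are perfectly separated and every within-category distance is zero, the numerator vanishes and the sketch returns 1.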
Value
gCor returns the sample Gini distance covariance and correlation between x and y.
References
Dang, X., Nguyen, D., Chen, Y. and Zhang, J. (2019). A new Gini correlation between quantitative and qualitative variables. Submitted to Journal of the American Statistical Association.
Examples
library(GiniDistance)
x <- iris[,1:4]            # four quantitative measurements
y <- unclass(iris[,5])     # species labels as integer codes
gCor(x, y, alpha = 1)