gCor {GiniDistance} | R Documentation |
Gini Distance Covariance and Correlation Statistics
Description
Computes the Gini distance covariance and correlation statistics between quantitative variables x and a categorical variable y, where alpha is the exponent on the Euclidean distance. The function returns these measures of dependence.
Usage
gCor(x, y, alpha)
Arguments
x |
quantitative data, one observation per row |
y |
category labels of the data (univariate categorical response) |
alpha |
exponent on the Euclidean distance, in (0,2); defaults to 1 if missing |
Details
gCor computes the Gini distance correlation statistic. It is a self-contained R function that returns a measure of dependence between the data and the labels.
The sample size (number of rows) of the data must agree with the length of the label vector, and the samples must not contain missing values. The arguments x and y are treated as data and labels, respectively. If alpha is missing, it defaults to 1; otherwise it is the exponent on the Euclidean distance.
Suppose a sample {\mathcal{D}} = \{(\mathbf{x}_i, y_i)\} for i = 1,...,n is available, where each y_i takes one of K category values L_1,...,L_K. Let {\mathcal{I}}_k be the index set of sample points with y_i = L_k. The probability p_k of category L_k is then estimated by the sample proportion of that category, that is, \hat{p}_k = \frac{n_k}{n}, where n_k is the number of elements in {\mathcal{I}}_k. For a given \alpha \in (0,2), a point estimator of \rho_g(\alpha) is given as follows.
\hat{\Delta}_k(\alpha) = {n_k \choose 2}^{-1} \sum_{i<j \in {\mathcal{I}}_k} \|\mathbf{x}_i - \mathbf{x}_j\|^{\alpha},
\hat{\Delta}(\alpha) = {n \choose 2}^{-1} \sum_{1 \le i<j \le n} \|\mathbf{x}_i - \mathbf{x}_j\|^{\alpha},
gCor = \hat{\rho}_g(\alpha) = 1 - \frac{\sum_{k=1}^K \hat{p}_k \hat{\Delta}_k(\alpha)}{\hat{\Delta}(\alpha)}.
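The estimator above can be transcribed almost literally into base R. The sketch below is illustrative only: gini_cor_sketch is a hypothetical helper, not part of the GiniDistance package, and the packaged gCor may be implemented differently. It assumes every category contains at least two observations, so that each within-category average is defined.

```r
# Hypothetical helper (not part of GiniDistance): a direct transcription of
# the formulas for Delta_k-hat, Delta-hat and rho_g-hat, using base R only.
gini_cor_sketch <- function(x, y, alpha = 1) {
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))
  stopifnot(nrow(x) == length(y), !anyNA(x), !anyNA(y),
            alpha > 0, alpha < 2)

  n <- nrow(x)
  idx_by_class <- split(seq_len(n), y)   # index sets I_k
  n_k <- lengths(idx_by_class)
  stopifnot(n_k >= 2)                    # assumption: no singleton category

  # Pairwise Euclidean distances raised to the power alpha
  d <- as.matrix(dist(x))^alpha

  # Average of d over all pairs i < j within an index set
  mean_pairs <- function(idx) {
    sub <- d[idx, idx, drop = FALSE]
    mean(sub[upper.tri(sub)])
  }

  delta_all <- mean_pairs(seq_len(n))                       # Delta-hat
  delta_k <- vapply(idx_by_class, mean_pairs, numeric(1))   # Delta_k-hat
  p_hat <- n_k / n                                          # p_k-hat

  1 - sum(p_hat * delta_k) / delta_all
}
```

For a small hand-checkable case, gini_cor_sketch(c(0, 1, 10, 11), c("a", "a", "b", "b")) gives 1 - 1/7, about 0.857: both within-category averages equal 1, and the overall average pairwise distance is 42/6 = 7.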
Value
gCor returns the sample Gini distance covariance and correlation between x and y.
References
Dang, X., Nguyen, D., Chen, Y. and Zhang, J. (2019). A new Gini correlation between quantitative and qualitative variables. Submitted to Journal of the American Statistical Association.
See Also
Examples
library(GiniDistance)
x <- iris[, 1:4]          # four quantitative measurements
y <- unclass(iris[, 5])   # species labels as integer codes
gCor(x, y, alpha = 1)