sdcov {semidist} | R Documentation |
Semi-distance covariance and correlation statistics
Description
Compute the statistics (or sample estimates) of semi-distance covariance and correlation. The semi-distance correlation is a standardized version of semi-distance covariance, and it can measure the dependence between a multivariate continuous variable and a categorical variable. See Details for the definition of semi-distance covariance and semi-distance correlation.
Usage
sdcov(X, y, type = "V", return_mat = FALSE)
sdcor(X, y)
Arguments
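X | Data of multivariate continuous variables, which should be an n-by-p matrix (n observations in rows, p variables in columns).
y | Data of categorical variables, which should be a factor of length n.
type | Type of statistic: "V" (the default) or "U". See Details.
return_mat | A boolean. If FALSE (the default), only the statistic is returned. If TRUE, the distance matrix of X and the divergence matrix of y are returned as well. See Value.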
Details
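For $\mathbf{X} \in \mathbb{R}^{p}$ and $Y \in \{1, 2, \ldots, R\}$, the (population-level) semi-distance covariance is defined as
$$\mathrm{SDcov}(\mathbf{X}, Y) = E\left[ \|\mathbf{X} - \widetilde{\mathbf{X}}\| \left( 1 - \sum_{r=1}^{R} \frac{I(Y = r, \widetilde{Y} = r)}{p_r} \right) \right],$$
where $p_r = P(Y = r)$ and $(\widetilde{\mathbf{X}}, \widetilde{Y})$ is an iid copy of $(\mathbf{X}, Y)$.
The (population-level) semi-distance correlation is defined as
$$\mathrm{SDcor}(\mathbf{X}, Y) = \frac{\mathrm{SDcov}(\mathbf{X}, Y)}{\mathrm{dvar}(\mathbf{X}) \sqrt{R - 1}},$$
where $\mathrm{dvar}(\mathbf{X})$ is the distance variance (Szekely, Rizzo, and Bakirov 2007) of $\mathbf{X}$.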
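As a quick sanity check on the population definition, the following short derivation sketches why the semi-distance covariance vanishes under independence. It assumes the form $\mathrm{SDcov}(\mathbf{X}, Y) = E[\|\mathbf{X} - \widetilde{\mathbf{X}}\| (1 - \sum_{r} I(Y = r, \widetilde{Y} = r)/p_r)]$ with $p_r = P(Y = r)$ and a finite first moment of $\mathbf{X}$; it is an illustration, not a statement taken from the package documentation.

```latex
% Assumption: X and Y are independent, so the expectation factorizes.
\begin{aligned}
\mathrm{SDcov}(\mathbf{X}, Y)
  &= E\|\mathbf{X} - \widetilde{\mathbf{X}}\| \cdot
     E\left[ 1 - \sum_{r=1}^{R} \frac{I(Y = r)\, I(\widetilde{Y} = r)}{p_r} \right] \\
  &= E\|\mathbf{X} - \widetilde{\mathbf{X}}\| \cdot
     \left( 1 - \sum_{r=1}^{R} \frac{p_r^{2}}{p_r} \right) \\
  &= E\|\mathbf{X} - \widetilde{\mathbf{X}}\| \cdot
     \left( 1 - \sum_{r=1}^{R} p_r \right) = 0 .
\end{aligned}
```

The second line uses $E[I(Y = r) I(\widetilde{Y} = r)] = p_r^2$, which holds because $\widetilde{Y}$ is an independent copy of $Y$.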
With $n$ observations $\{(\mathbf{X}_i, Y_i)\}_{i=1}^{n}$, sdcov() and sdcor() compute the sample estimates of the semi-distance covariance and correlation.
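If type = "V", the semi-distance covariance statistic is computed as a V-statistic, which takes a form very similar to the energy-based statistic with double centering, and is always non-negative. Specifically,
$$\mathrm{SDcov}_n(\mathbf{X}, y) = \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} A_{kl} B_{kl},$$
where
$$A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot}$$
is the double centering (Szekely, Rizzo, and Bakirov 2007) of $a_{kl} = \|\mathbf{X}_k - \mathbf{X}_l\|$, and
$$B_{kl} = 1 - \sum_{r=1}^{R} \frac{I(Y_k = r)\, I(Y_l = r)}{\hat{p}_r}$$
with $\hat{p}_r = n_r / n = n^{-1} \sum_{i=1}^{n} I(Y_i = r)$.
The semi-distance correlation statistic is
$$\mathrm{SDcor}_n(\mathbf{X}, y) = \frac{\mathrm{SDcov}_n(\mathbf{X}, y)}{\mathrm{dvar}_n(\mathbf{X}) \sqrt{R - 1}},$$
where $\mathrm{dvar}_n(\mathbf{X})$ is the V-statistic of distance variance of $\mathbf{X}$.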
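To make the V-statistic concrete, here is a minimal base-R sketch that evaluates it directly. It assumes the form $n^{-2} \sum_{k,l} A_{kl} B_{kl}$, where $A$ is the double-centered Euclidean distance matrix of X and $B_{kl} = 1 - I(Y_k = Y_l)/\hat{p}_{Y_k}$, and it takes the distance variance to be $\sqrt{n^{-2} \sum_{k,l} A_{kl}^2}$. The helper names `double_center`, `sdcov_v_manual`, and `sdcor_v_manual` are hypothetical and not part of semidist; the package's own implementation may differ.

```r
# Hypothetical helpers (not part of semidist); base R only.
double_center <- function(a) {
  # A_kl = a_kl - (row mean k) - (column mean l) + (grand mean)
  sweep(sweep(a, 1, rowMeans(a)), 2, colMeans(a)) + mean(a)
}

sdcov_v_manual <- function(X, y) {
  X <- as.matrix(X)
  y <- as.factor(y)
  n <- nrow(X)
  A <- double_center(as.matrix(dist(X)))            # Euclidean distances
  p_hat <- as.numeric(table(y))[as.integer(y)] / n  # class proportion of each obs.
  same <- outer(as.integer(y), as.integer(y), "==") # TRUE where Y_k == Y_l
  B <- 1 - same * matrix(1 / p_hat, n, n)           # entry [k, l] uses p_hat[k]
  sum(A * B) / n^2
}

sdcor_v_manual <- function(X, y) {
  y <- as.factor(y)
  A <- double_center(as.matrix(dist(as.matrix(X))))
  dvar_n <- sqrt(mean(A^2))                         # assumed V-statistic of dvar
  sdcov_v_manual(X, y) / (dvar_n * sqrt(nlevels(y) - 1))
}

# Functionally dependent design: three groups, one indicator column each.
X_dep <- matrix(0, 30, 3)
X_dep[1:10, 1] <- 1; X_dep[11:20, 2] <- 1; X_dep[21:30, 3] <- 1
y_dep <- factor(rep(1:3, each = 10))
print(sdcov_v_manual(X_dep, y_dep))
print(sdcor_v_manual(X_dep, y_dep))  # 1 up to floating-point error for this design
```

Under these assumptions the correlation reaches its upper end for the perfectly separated design, which is a useful check when comparing against `sdcor(X, y)`.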
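If type = "U", the semi-distance covariance statistic is computed as an “estimated U-statistic”, which is used in the independence test statistic and is not necessarily non-negative. Specifically,
$$\widetilde{\mathrm{SDcov}}_n(\mathbf{X}, y) = \frac{1}{n(n-1)} \sum_{i \ne j} \|\mathbf{X}_i - \mathbf{X}_j\| \left( 1 - \sum_{r=1}^{R} \frac{I(Y_i = r)\, I(Y_j = r)}{\tilde{p}_r} \right),$$
where $\tilde{p}_r = (n_r - 1)/(n - 1) = (n - 1)^{-1} \left( \sum_{i=1}^{n} I(Y_i = r) - 1 \right)$. Note that the test statistic of the semi-distance independence test is
$$T_n = n \cdot \widetilde{\mathrm{SDcov}}_n(\mathbf{X}, y).$$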
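The "estimated U-statistic" can be sketched in base R in the same spirit. This assumes the form $\frac{1}{n(n-1)} \sum_{i \ne j} \|X_i - X_j\| (1 - I(Y_i = Y_j)/\tilde{p}_{Y_i})$ with $\tilde{p}_r = (n_r - 1)/(n - 1)$, and a test statistic equal to $n$ times that value. The helper name `sdcov_u_manual` is hypothetical and not part of semidist.

```r
# Hypothetical helper (not part of semidist); base R only.
sdcov_u_manual <- function(X, y) {
  X <- as.matrix(X)
  y <- as.factor(y)
  n <- nrow(X)
  a <- as.matrix(dist(X))                            # a_ij = ||X_i - X_j||
  n_r <- as.numeric(table(y))[as.integer(y)]         # class size of each obs.
  inv_p <- matrix((n - 1) / (n_r - 1), n, n)         # 1 / p_tilde, entry [i, j] uses i
  same <- outer(as.integer(y), as.integer(y), "==")  # TRUE where Y_i == Y_j
  w <- ifelse(same, 1 - inv_p, 1)                    # weight 1 for different classes
  diag(w) <- 0                                       # the sum runs over i != j only
  sum(a * w) / (n * (n - 1))
}

# Functionally dependent design: three groups, one indicator column each.
X_dep <- matrix(0, 30, 3)
X_dep[1:10, 1] <- 1; X_dep[11:20, 2] <- 1; X_dep[21:30, 3] <- 1
y_dep <- factor(rep(1:3, each = 10))

u_stat <- sdcov_u_manual(X_dep, y_dep)
t_stat <- nrow(X_dep) * u_stat                       # assumed test statistic n * U
print(c(sdcov_u = u_stat, T_n = t_stat))
```

With these weights, the off-diagonal entries of `w` sum to zero, so the statistic is centered around zero when X and y are unrelated. That is why it can be negative, unlike the V-statistic.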
Value
The value of the corresponding sample statistic.
If the argument return_mat of sdcov() is set to TRUE, a list with the following elements is returned:
- sdcov: the semi-distance covariance statistic;
- mat_x, mat_y: the matrices of the distances of X and the divergences of y, respectively.
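A short usage sketch for the list return value follows. It assumes the semidist package is installed and that the list elements are named `sdcov`, `mat_x`, and `mat_y` as described above; the call is skipped if the package is not available.

```r
res <- NULL
if (requireNamespace("semidist", quietly = TRUE)) {
  X <- mtcars[, c("mpg", "disp", "drat", "wt")]
  y <- factor(mtcars[, "am"])
  # Request the statistic together with the two matrices
  res <- semidist::sdcov(X, y, return_mat = TRUE)
  str(res, max.level = 1)   # inspect the element names and sizes
  print(res$sdcov)          # the statistic itself
}
```

Keeping the two matrices avoids recomputing the pairwise distances when the statistic has to be evaluated repeatedly on the same X.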
See Also
- sd_test() for implementing the independence test via semi-distance covariance;
- sd_sis() for implementing groupwise feature screening via semi-distance correlation.
Examples
X <- mtcars[, c("mpg", "disp", "drat", "wt")]
y <- factor(mtcars[, "am"])
print(sdcov(X, y))
print(sdcor(X, y))
# Man-made independent data -------------------------------------------------
n <- 30; R <- 5; p <- 3; prob <- rep(1/R, R)
X <- matrix(rnorm(n*p), n, p)
y <- factor(sample(1:R, size = n, replace = TRUE, prob = prob), levels = 1:R)
print(sdcov(X, y))
print(sdcor(X, y))
# Man-made functionally dependent data --------------------------------------
n <- 30; R <- 3; p <- 3
X <- matrix(0, n, p)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
print(sdcov(X, y))
print(sdcor(X, y))