sd_test {semidist}    R Documentation

Semi-distance independence test

Description

Implement the semi-distance independence test between continuous variables X and a categorical variable y, either via a permutation test or, when the dimension p of the continuous variables is high, via an asymptotic approximation.

Usage

sd_test(X, y, test_type = "perm", num_perm = 10000)

Arguments

X

Data of multivariate continuous variables, which should be an n-by-p matrix, or a vector of length n (for a univariate variable).

y

Data of a categorical variable, which should be a factor of length n.

test_type

Type of the test:

  • "perm" (the default): Implement the test via a permutation test;

  • "asym": Implement the test via the asymptotic approximation, intended for the case where the dimension p of the continuous variables is high.

See the Reference for details.

num_perm

The number of permutation replications K, used when test_type = "perm". Defaults to 10000. See Details and Reference.

Details

The semi-distance independence test statistic is

T_n = n \cdot \widetilde{\text{SDcov}}_n(X, y),

where \widetilde{\text{SDcov}}_n(X, y) is the sample semi-distance covariance, which can be computed by sdcov(X, y, type = "U").

For the permutation test (test_type = "perm"), K permutation replications are conducted, where K is given by the argument num_perm. The p-value of the permutation test is computed as

\text{p-value} = (\sum_{k=1}^K I(T^{\ast (k)}_{n} \ge T_{n}) + 1) / (K + 1),

where T_{n} is the semi-distance test statistic, T^{\ast (k)}_{n} is the test statistic computed on the k-th permuted sample, and I(\cdot) is the indicator function.
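To make the permutation procedure concrete, here is a minimal base-R sketch of the formula above. It is not the semidist implementation: perm_pvalue(), perm_test_sketch() and toy_stat() are hypothetical helpers, and toy_stat() is a simple between-group statistic used only as a stand-in for n \cdot \widetilde{\text{SDcov}}_n(X, y), which the package computes via sdcov().

```r
# Hypothetical helper (not part of semidist): permutation p-value
# (sum_k I(T*_k >= T_n) + 1) / (K + 1), as in Details.
perm_pvalue <- function(stat_obs, stat_perm) {
  K <- length(stat_perm)
  (sum(stat_perm >= stat_obs) + 1) / (K + 1)
}

# Hypothetical generic permutation test: permute the labels y K times,
# recompute the statistic each time, then apply perm_pvalue().
# `stat_fun` is any function(X, y) returning a single number.
perm_test_sketch <- function(X, y, stat_fun, num_perm = 1000) {
  stat_obs <- stat_fun(X, y)
  stat_perm <- vapply(seq_len(num_perm),
                      function(k) stat_fun(X, sample(y)),
                      numeric(1))
  list(statistic = stat_obs, p.value = perm_pvalue(stat_obs, stat_perm))
}

# Stand-in statistic (NOT the semi-distance covariance): n times the sum of
# squared distances between the group means and the overall mean of X.
toy_stat <- function(X, y) {
  X <- as.matrix(X)
  y <- droplevels(as.factor(y))
  means <- rowsum(X, y) / as.vector(table(y))  # one row per level of y
  nrow(X) * sum(sweep(means, 2, colMeans(X))^2)
}

# Usage on the functionally dependent data from Examples
set.seed(1)
X <- matrix(0, 30, 3)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
res <- perm_test_sketch(X, y, toy_stat, num_perm = 999)
res$p.value  # small, because X is a deterministic function of y
```

Because of the "+1" in numerator and denominator, the p-value is never zero: the smallest attainable value is 1/(K + 1), e.g. 1/10001 for the default num_perm = 10000.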

When the dimension p of the continuous variables is high, the asymptotic approximation (test_type = "asym") can be used instead; it is computationally faster because no permutations are needed.

Value

A list with class "indtest" containing the following components

See Also

sdcov() for computing the statistic of semi-distance covariance.

Examples

X <- mtcars[, c("mpg", "disp", "drat", "wt")]
y <- factor(mtcars[, "am"])
test <- sd_test(X, y)
print(test)

# Man-made independent data -------------------------------------------------
n <- 30; R <- 5; p <- 3; prob <- rep(1/R, R)
X <- matrix(rnorm(n*p), n, p)
y <- factor(sample(1:R, size = n, replace = TRUE, prob = prob), levels = 1:R)
test <- sd_test(X, y)
print(test)

# Man-made functionally dependent data --------------------------------------
n <- 30; R <- 3; p <- 3
X <- matrix(0, n, p)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

# Man-made high-dimensionally independent data ------------------------------
n <- 30; R <- 3; p <- 100
X <- matrix(rnorm(n*p), n, p)
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

test <- sd_test(X, y, test_type = "asym")
print(test)

# Man-made high-dimensionally dependent data --------------------------------
n <- 30; R <- 3; p <- 100
X <- matrix(0, n, p)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

test <- sd_test(X, y, test_type = "asym")
print(test)


[Package semidist version 0.1.0 Index]