R: Semi-distance independence test

sd_test {semidist}

R Documentation

Semi-distance independence test

Description

Performs the semi-distance independence test, either by a permutation test or by an asymptotic approximation intended for the case where the dimension p of the continuous variables is high.

Usage

sd_test(X, y, test_type = "perm", num_perm = 10000)

Arguments

`X`	Data of the multivariate continuous variables: an `n`-by-`p` matrix, or a vector of length `n` in the univariate case.
`y`	Data of the categorical variable: a factor of length `n`.
`test_type`	Type of the test. `"perm"` (the default) runs a permutation test; `"asym"` uses the asymptotic approximation, intended for the case where the dimension `p` of the continuous variables is high. See the Reference for details.
`num_perm`	Number of replications in the permutation test. Defaults to 10000. See Details and Reference.

Details

The semi-distance independence test statistic is

T_n = n \cdot \widetilde{\text{SDcov}}_n(X, y),

where \widetilde{\text{SDcov}}_n(X, y) can be computed by sdcov(X, y, type = "U").
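As a minimal sketch of the formula above, the statistic can be assembled by hand from sdcov(). The mtcars data and the as.matrix() conversion are choices made for illustration, and the block only runs when the semidist package is installed.

```r
# Sketch: assemble T_n = n * SDcov_n(X, y) by hand.
# Assumes the semidist package is installed; otherwise the block is skipped.
if (requireNamespace("semidist", quietly = TRUE)) {
  X <- as.matrix(mtcars[, c("mpg", "disp", "drat", "wt")])  # n-by-p matrix
  y <- factor(mtcars[, "am"])                               # factor of length n
  n <- nrow(X)

  # type = "U" selects the estimator used in the test statistic (see above).
  T_n <- n * semidist::sdcov(X, y, type = "U")
  print(T_n)
}
```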

For the permutation test (test_type = "perm"), K permutation replications are performed, where K is set by the argument num_perm. The permutation p-value is computed as

\text{p-value} = (\sum_{k=1}^K I(T^{\ast (k)}_{n} \ge T_{n}) + 1) / (K + 1),

where T_{n} is the observed semi-distance test statistic and T^{\ast (k)}_{n} is the test statistic computed on the k-th permuted sample.
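The p-value formula above can be illustrated in base R. This is a sketch of the general permutation recipe, not the package's internal code. The helper names perm_pvalue and between_ss are hypothetical, and between_ss (a between-group sum of squares) is only a stand-in for the semi-distance statistic so that the block runs without extra packages.

```r
# Generic permutation p-value: (#{T*_k >= T_n} + 1) / (K + 1).
# `stat_fn` is a hypothetical placeholder for any statistic of (X, y).
perm_pvalue <- function(X, y, stat_fn, K = 199) {
  X <- as.matrix(X)
  t_obs <- stat_fn(X, y)
  # Shuffling y breaks any association with X but keeps both marginals.
  t_perm <- replicate(K, stat_fn(X, sample(y)))
  (sum(t_perm >= t_obs) + 1) / (K + 1)
}

# Stand-in statistic (NOT the semi-distance covariance):
# between-group sum of squares of the column means.
between_ss <- function(X, y) {
  grand <- colMeans(X)
  sizes <- table(y)
  total <- 0
  for (lev in names(sizes)) {
    if (sizes[[lev]] > 0) {
      centre <- colMeans(X[y == lev, , drop = FALSE])
      total <- total + sizes[[lev]] * sum((centre - grand)^2)
    }
  }
  total
}

set.seed(1)
X <- as.matrix(mtcars[, c("mpg", "disp", "drat", "wt")])
y <- factor(mtcars[, "am"])
p_val <- perm_pvalue(X, y, between_ss, K = 199)
print(p_val)
```

Because of the added 1 in the numerator and denominator, the smallest attainable p-value is 1 / (K + 1), so K bounds the resolution of the test.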

When the dimension of the continuous variables is high, the asymptotic approximation approach can be applied (test_type = "asym"), which is computationally faster since no permutation is needed.

Value

A list with class "indtest" containing the following components:

method: name of the test;
name_data: names of the X and y;
n: sample size of the data;
test_type: type of the test;
num_perm: number of replications in permutation test, if test_type = "perm";
stat: test statistic;
pvalue: computed p-value.
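A short sketch of reading the components listed above from the returned list. The level alpha = 0.05 and the reduced num_perm = 999 are illustrative choices, and the block only runs when the semidist package is installed.

```r
# Sketch: inspect the components of the returned "indtest" list.
# Assumes the semidist package is installed; otherwise the block is skipped.
if (requireNamespace("semidist", quietly = TRUE)) {
  X <- as.matrix(mtcars[, c("mpg", "disp", "drat", "wt")])
  y <- factor(mtcars[, "am"])
  res <- semidist::sd_test(X, y, test_type = "perm", num_perm = 999)

  cat("sample size:", res$n, "\n")
  cat("statistic:  ", res$stat, "\n")
  cat("p-value:    ", res$pvalue, "\n")

  alpha <- 0.05  # illustrative significance level, not a package default
  if (res$pvalue < alpha) {
    message("Independence is rejected at level ", alpha)
  } else {
    message("Independence is not rejected at level ", alpha)
  }
}
```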

Examples

X <- mtcars[, c("mpg", "disp", "drat", "wt")]
y <- factor(mtcars[, "am"])
test <- sd_test(X, y)
print(test)

# Man-made independent data -------------------------------------------------
n <- 30; R <- 5; p <- 3; prob <- rep(1/R, R)
X <- matrix(rnorm(n*p), n, p)
y <- factor(sample(1:R, size = n, replace = TRUE, prob = prob), levels = 1:R)
test <- sd_test(X, y)
print(test)

# Man-made functionally dependent data --------------------------------------
n <- 30; R <- 3; p <- 3
X <- matrix(0, n, p)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

# Man-made high-dimensionally independent data ------------------------------
n <- 30; R <- 3; p <- 100
X <- matrix(rnorm(n*p), n, p)
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

test <- sd_test(X, y, test_type = "asym")
print(test)

# Man-made high-dimensionally dependent data --------------------------------
n <- 30; R <- 3; p <- 100
X <- matrix(0, n, p)
X[1:10, 1] <- 1; X[11:20, 2] <- 1; X[21:30, 3] <- 1
y <- factor(rep(1:3, each = 10))
test <- sd_test(X, y)
print(test)

test <- sd_test(X, y, test_type = "asym")
print(test)

[Package semidist version 0.1.0 Index]