dhsic.test {dHSIC} | R Documentation |
Independence test based on dHSIC
Description
Hypothesis test for finding statistically significant evidence of dependence between several variables. Uses the d-variable Hilbert-Schmidt independence criterion (dHSIC) as a measure of dependence. Several types of hypothesis tests are included. The null hypothesis (H_0) is that all variables are jointly independent.
Usage
dhsic.test(X, Y, K, alpha = 0.05, method = "permutation",
kernel = "gaussian", B = 1000, pairwise = FALSE,
bandwidth = 1, matrix.input = FALSE)
Arguments
X |
either a list of at least two numeric matrices or a single numeric
matrix. The rows of a matrix correspond to the observations of a
variable. It is always required that there are an equal number of
observations for all variables (i.e. all matrices have to have the
same number of rows). If |
Y |
a numeric matrix if |
K |
a list of the gram matrices corresponding to each variable. If
|
alpha |
a numeric value in (0,1) specifying the significance level of the hypothesis test. |
method |
a character string specifying the type of hypothesis test used. The available options are: "gamma" (gamma approximation based test), "permutation" (permutation test (slow)), "bootstrap" (bootstrap test (slow)) and "eigenvalue" (eigenvalue based test). |
kernel |
a vector of character strings specifying the kernels for each
variable. There exist two pre-defined kernels: "gaussian" (Gaussian kernel
with median heuristic as bandwidth) and "discrete" (discrete
kernel). User defined kernels can also be used by passing the
function name as a string, which will then be matched using
|
B |
an integer value specifying the number of Monte-Carlo iterations
made in the permutation and bootstrap test. Only relevant if
|
pairwise |
a logical value indicating whether one should use HSIC with pairwise comparisons instead of dHSIC. Can only be true if there are more than two variables. |
bandwidth |
a numeric value specifying the size of the bandwidth used for the Gaussian kernel. Only used if kernel="gaussian.fixed". |
matrix.input |
a boolean. If |
Details
The d-variable Hilbert-Schmidt independence criterion is a direct extension of the standard Hilbert-Schmidt independence criterion (HSIC) from two variables to an arbitrary number of variables. For characteristic kernels such as the Gaussian kernel, the population dHSIC is 0 if and only if the variables are jointly independent.
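To make the criterion concrete, the following base-R sketch computes an empirical dHSIC value from Gram matrices. It assumes the V-statistic estimator described in Pfister et al. (2017) and a Gaussian kernel with a fixed bandwidth of 1. The helper names gauss_gram and dhsic_vstat are hypothetical, and the package's own implementation, bandwidth choice and scaling of the statistic may differ.

```r
# Gaussian Gram matrix with a fixed bandwidth (an assumption of this sketch;
# the "gaussian" kernel of dhsic.test uses the median heuristic instead).
gauss_gram <- function(x, bandwidth = 1) {
  x <- as.matrix(x)
  d2 <- as.matrix(dist(x))^2          # squared Euclidean distances between rows
  exp(-d2 / (2 * bandwidth^2))
}

# V-statistic estimate of dHSIC from a list of n x n Gram matrices
# (hypothetical helper, not part of the dHSIC package).
dhsic_vstat <- function(K) {
  term1 <- mean(Reduce(`*`, K))              # mean of the elementwise product
  term2 <- prod(sapply(K, mean))             # product of the grand means
  row_means <- sapply(K, rowMeans)           # n x d matrix of row means
  term3 <- mean(apply(row_means, 1, prod))   # mean over i of the product over variables
  term1 + term2 - 2 * term3
}

# Data in the spirit of the Examples section: z is a noisy XOR of x and y.
set.seed(0)
n <- 100
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.5)
z <- as.numeric((x + y) == 1) + rnorm(n)

K <- lapply(list(x, y, z), gauss_gram)
dhsic_vstat(K)
```

If one of the variables is constant, its Gram matrix consists of ones and the three terms cancel, so the sketch returns 0 up to floating-point error. This matches the intuition that a constant carries no dependence.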
Four different statistical hypothesis tests are implemented, all with null hypothesis (H_0: X[[1]],...,X[[d]] are jointly independent) and alternative hypothesis (H_A: X[[1]],...,X[[d]] are not jointly independent):
1. Permutation test for dHSIC: exact level, slow
2. Bootstrap test for dHSIC: pointwise asymptotic level and pointwise
consistent, slow
3. Gamma approximation based test for dHSIC: only approximate, fast
4. Eigenvalue based test for dHSIC: pointwise asymptotic level and pointwise
consistent, medium
The null hypothesis is rejected if statistic is strictly greater than crit.value.
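As a rough illustration of the four tests and the rejection rule, the sketch below runs each method on the same data and compares statistic with crit.value. It assumes the dHSIC package is installed. The simulated data and the reduced B = 200 are choices made here for speed, not recommendations.

```r
library(dHSIC)  # assumes the package is installed

set.seed(1)
n <- 100
x <- matrix(rnorm(n), ncol = 1)
y <- matrix(x^2 + 0.1 * rnorm(n), ncol = 1)  # depends on x through a nonlinear relation

methods <- c("permutation", "bootstrap", "gamma", "eigenvalue")
results <- lapply(methods, function(m) {
  res <- dhsic.test(list(x, y), alpha = 0.05, method = m, B = 200)
  data.frame(method     = m,
             statistic  = res$statistic,
             crit.value = res$crit.value,
             p.value    = res$p.value,
             reject     = res$statistic > res$crit.value)  # rejection rule
})
decisions <- do.call(rbind, results)
print(decisions)
```

B only affects the permutation and bootstrap tests. The other two methods ignore it.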
If X is a list with d matrices, the function tests for joint independence of the corresponding d random vectors. If X is a matrix and matrix.input is TRUE, the function tests the independence between the columns of X. If X is a matrix and matrix.input is FALSE, then Y needs to be a matrix too; in this case, the function tests the (pairwise) independence between the corresponding two random vectors.
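The three input modes can be sketched as follows, assuming the dHSIC package is installed. The simulated data and the choice of method = "gamma" (picked only because it is fast) are illustrative.

```r
library(dHSIC)  # assumes the package is installed

set.seed(2)
n <- 100
x <- matrix(rnorm(n), ncol = 1)
y <- matrix(rnorm(n), ncol = 1)
z <- matrix(rnorm(n * 2), ncol = 2)   # a two-dimensional variable

# 1. List of d matrices: joint independence of the d random vectors
#    (here d = 3, with z treated as one two-dimensional vector).
p_list <- dhsic.test(list(x, y, z), method = "gamma")$p.value

# 2. Single matrix with matrix.input = TRUE: each column is its own variable
#    (here four univariate variables).
p_cols <- dhsic.test(cbind(x, y, z), matrix.input = TRUE,
                     method = "gamma")$p.value

# 3. Two matrices X and Y with matrix.input = FALSE (the default):
#    independence between the two random vectors x and z.
p_xy <- dhsic.test(x, z, method = "gamma")$p.value

c(list = p_list, columns = p_cols, two_matrices = p_xy)
```

Modes 1 and 2 answer different questions on the same data: the list form keeps the two columns of z together as one vector, while matrix.input = TRUE splits them into separate variables.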
For more details see the references.
Value
A list containing the following components:
statistic |
the value of the test statistic |
crit.value |
critical value of the hypothesis test. The null
hypothesis (H_0: joint independence) is rejected if statistic is strictly greater than crit.value. |
p.value |
p-value of the hypothesis test, i.e. the probability that
a random version of the test statistic is greater than
|
time |
numeric vector containing computation times. |
bandwidth |
bandwidth used during the computation. Only relevant if Gaussian kernel was used. |
Author(s)
Niklas Pfister and Jonas Peters
References
Gretton, A., K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf and A. J. Smola (2007). A kernel statistical test of independence. In Advances in Neural Information Processing Systems (pp. 585-592).
Pfister, N., P. Bühlmann, B. Schölkopf and J. Peters (2017). Kernel-based Tests for Joint Independence. To appear in the Journal of the Royal Statistical Society, Series B.
See Also
To compute only the test statistic without p-values, use the function dhsic.
Examples
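### pairwise independent but not jointly independent (pairwise HSIC vs dHSIC)
library(dHSIC)
set.seed(0)
# x and y are independent fair coin flips
x <- matrix(rbinom(100, 1, 0.5), ncol = 1)
y <- matrix(rbinom(100, 1, 0.5), ncol = 1)
# z is the XOR of x and y plus Gaussian noise: it is independent of x alone
# and of y alone, but depends on the pair (x, y)
z <- matrix(as.numeric((x + y) == 1) + rnorm(100), ncol = 1)
X <- list(x, y, z)
# HSIC with pairwise comparisons
dhsic.test(X, method = "permutation",
           kernel = c("discrete", "discrete", "gaussian"),
           pairwise = TRUE, B = 1000)$p.value
# dHSIC test for joint independence
dhsic.test(X, method = "permutation",
           kernel = c("discrete", "discrete", "gaussian"),
           pairwise = FALSE, B = 1000)$p.value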