dhsic.test {dHSIC}  R Documentation 
Independence test based on dHSIC
Description
Hypothesis test for finding statistically significant evidence of dependence between several variables. Uses the d-variable Hilbert-Schmidt independence criterion (dHSIC) as a measure of dependence. Several types of hypothesis tests are included. The null hypothesis (H_0) is that all variables are jointly independent.
Usage
dhsic.test(X, Y, K, alpha = 0.05, method = "permutation",
kernel = "gaussian", B = 1000, pairwise = FALSE,
bandwidth = 1, matrix.input = FALSE)
Arguments
X 
either a list of at least two numeric matrices or a single numeric
matrix. The rows of a matrix correspond to the observations of a
variable. All variables must have the same number of observations
(i.e. all matrices must have the same number of rows). If X is a
single matrix, either supply a second matrix as Y or set matrix.input
to TRUE so that the columns of X are treated as the variables.
Y 
a numeric matrix; needed when X is a single matrix and matrix.input is FALSE.
K 
a list of the gram matrices corresponding to each variable. If

alpha 
a numeric value in (0,1) specifying the significance level of the hypothesis test. 
method 
a character string specifying the type of hypothesis test used. The available options are: "gamma" (gamma approximation based test), "permutation" (permutation test (slow)), "bootstrap" (bootstrap test (slow)) and "eigenvalue" (eigenvalue based test). 
kernel 
a vector of character strings specifying the kernels for each
variable. There exist two predefined kernels: "gaussian" (Gaussian kernel
with median heuristic as bandwidth) and "discrete" (discrete
kernel). User defined kernels can also be used by passing the
function name as a string, which will then be matched using

B 
an integer value specifying the number of Monte Carlo iterations
made in the permutation and bootstrap tests. Only relevant if
method is "permutation" or "bootstrap".
pairwise 
a logical value indicating whether one should use HSIC with pairwise comparisons instead of dHSIC. Can only be true if there are more than two variables. 
bandwidth 
a numeric value specifying the size of the bandwidth used for the Gaussian kernel. Only used if kernel="gaussian.fixed". 
matrix.input 
a logical value. If TRUE and X is a single matrix, the function tests the independence between the columns of X.
Details
The d-variable Hilbert-Schmidt independence criterion is a direct extension of the standard Hilbert-Schmidt independence criterion (HSIC) from two variables to an arbitrary number of variables. It is 0 if and only if the variables are jointly independent.
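As a rough illustration of the quantity being tested, the empirical dHSIC can be written in terms of the Gram matrices of the individual variables. The sketch below is a plain base-R version of that V-statistic. The helper names gaussian_gram and dhsic_sketch are hypothetical, the fixed bandwidth of 1 is an assumption, and the package's own implementation may differ in details such as bandwidth selection.

```r
# Gaussian Gram matrix for the rows of a numeric matrix x.
# Assumption: k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).
gaussian_gram <- function(x, bandwidth = 1) {
  x <- as.matrix(x)
  sq_dist <- as.matrix(dist(x))^2
  exp(-sq_dist / (2 * bandwidth^2))
}

# Empirical dHSIC (V-statistic form) from a list of n x n Gram matrices.
dhsic_sketch <- function(K) {
  n <- nrow(K[[1]])
  d <- length(K)
  # Mean over all pairs (i, j) of the elementwise product of the Gram matrices.
  term1 <- sum(Reduce(`*`, K)) / n^2
  # Product over variables of the mean Gram matrix entry.
  term2 <- prod(vapply(K, sum, numeric(1))) / n^(2 * d)
  # Cross term built from the row sums of each Gram matrix.
  term3 <- 2 * sum(Reduce(`*`, lapply(K, rowSums))) / n^(d + 1)
  term1 + term2 - term3
}

set.seed(1)
n <- 50
x <- matrix(rnorm(n), ncol = 1)
y <- matrix(rnorm(n), ncol = 1)
z <- x + y + matrix(rnorm(n, sd = 0.1), ncol = 1)  # z depends on x and y

K <- list(gaussian_gram(x), gaussian_gram(y), gaussian_gram(z))
dhsic_sketch(K)
```

For two variables this expression coincides with the usual biased HSIC estimate trace(K H L H) / n^2, where H is the centering matrix.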
Four different statistical hypothesis tests are implemented, all with null hypothesis (H_0: X[[1]],...,X[[d]] are jointly independent) and alternative hypothesis (H_A: X[[1]],...,X[[d]] are not jointly independent):
1. Permutation test for dHSIC: exact level, slow
2. Bootstrap test for dHSIC: pointwise asymptotic level and pointwise consistent, slow
3. Gamma approximation based test for dHSIC: only approximate, fast
4. Eigenvalue based test for dHSIC: pointwise asymptotic level and pointwise consistent, medium
The null hypothesis is rejected if statistic is strictly greater than crit.value.
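To make the permutation idea and the rejection rule concrete, here is a minimal base-R sketch for the two-variable case. It is not the package's implementation: hsic_stat is a hypothetical helper, the Gaussian kernel uses an assumed fixed bandwidth of 1, and taking the critical value as the empirical (1 - alpha) quantile of the permuted statistics is a simplifying assumption.

```r
# Biased HSIC estimate for two n x n Gram matrices (trace formula).
hsic_stat <- function(K, L) {
  n <- nrow(K)
  H <- diag(n) - matrix(1 / n, n, n)  # centering matrix
  sum(diag(K %*% H %*% L %*% H)) / n^2
}

set.seed(2)
n <- 60
B <- 200
alpha <- 0.05
x <- rnorm(n)
y <- x^2 + rnorm(n, sd = 0.2)  # y depends on x

# Gaussian Gram matrices with an assumed fixed bandwidth of 1.
K <- exp(-as.matrix(dist(x))^2 / 2)
L <- exp(-as.matrix(dist(y))^2 / 2)

statistic <- hsic_stat(K, L)

# Permuting the observations of y breaks any dependence on x,
# which mimics sampling under the null hypothesis.
perm_stats <- replicate(B, {
  idx <- sample(n)
  hsic_stat(K, L[idx, idx])
})

crit.value <- unname(quantile(perm_stats, 1 - alpha))  # assumed definition
p.value <- (1 + sum(perm_stats >= statistic)) / (B + 1)

# Strict inequality, as in the rule stated above.
reject <- statistic > crit.value
c(statistic = statistic, crit.value = crit.value, p.value = p.value)
```

The cost grows linearly with B because the statistic is recomputed for every permutation, which is why the permutation and bootstrap variants are listed as slow.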
If X is a list with d matrices, the function tests for joint
independence of the corresponding d random vectors. If X is a
matrix and matrix.input is TRUE, the function tests the
independence between the columns of X. If X is a matrix
and matrix.input is FALSE, then Y needs to be a matrix,
too; in this case, the function tests the (pairwise) independence
between the corresponding two random vectors.
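The three input modes described above might be exercised as follows. This is a sketch that assumes the dHSIC package is installed; the calls are skipped otherwise. The variable names and the choice of method = "gamma" (picked only because it is listed as fast) are illustrative.

```r
set.seed(3)
n <- 100
x <- matrix(rnorm(n), ncol = 1)      # one-dimensional variable
y <- matrix(rnorm(2 * n), ncol = 2)  # two-dimensional variable
z <- matrix(rnorm(n), ncol = 1)      # one-dimensional variable

if (requireNamespace("dHSIC", quietly = TRUE)) {
  # 1. List input: joint independence of three random vectors.
  res_list <- dHSIC::dhsic.test(list(x, y, z), method = "gamma")

  # 2. Two matrices: independence between the vectors behind X and Y.
  res_xy <- dHSIC::dhsic.test(x, y, method = "gamma")

  # 3. Single matrix with matrix.input = TRUE: each column is one variable,
  #    so this tests joint independence of four scalar variables.
  res_cols <- dHSIC::dhsic.test(cbind(x, y, z), matrix.input = TRUE,
                                method = "gamma")

  print(c(list = res_list$p.value,
          xy = res_xy$p.value,
          columns = res_cols$p.value))
}
```

Note that modes 1 and 3 answer different questions here: the list form treats y as a single two-dimensional vector, whereas the column form treats its two columns as separate variables.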
For more details see the references.
Value
A list containing the following components:
statistic 
the value of the test statistic 
crit.value 
critical value of the hypothesis test. The null
hypothesis (H_0: joint independence) is rejected if statistic is strictly greater than crit.value.
p.value 
p-value of the hypothesis test, i.e. the probability that
a random version of the test statistic is greater than
the observed statistic.
time 
numeric vector containing computation times. 
bandwidth 
bandwidth used during the computation. Only relevant if Gaussian kernel was used. 
Author(s)
Niklas Pfister and Jonas Peters
References
Gretton, A., K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf and A. J. Smola (2007). A kernel statistical test of independence. In Advances in Neural Information Processing Systems (pp. 585-592).
Pfister, N., P. Bühlmann, B. Schölkopf and J. Peters (2017). Kernel-based Tests for Joint Independence. To appear in the Journal of the Royal Statistical Society, Series B.
See Also
To compute only the test statistic without p-values, use the function dhsic.
Examples
### pairwise independent but not jointly independent (pairwise HSIC vs dHSIC)
set.seed(0)
x <- matrix(rbinom(100,1,0.5),ncol=1)
y <- matrix(rbinom(100,1,0.5),ncol=1)
z <- matrix(as.numeric((x+y)==1)+rnorm(100),ncol=1)
X <- list(x,y,z)
dhsic.test(X, method="permutation",
kernel=c("discrete", "discrete", "gaussian"),
pairwise=TRUE, B=1000)$p.value
dhsic.test(X, method="permutation",
kernel=c("discrete", "discrete", "gaussian"),
pairwise=FALSE, B=1000)$p.value