mmd {eummd} | R Documentation |
Naive computation for Maximum Mean Discrepancy
Description
Computes maximum mean discrepancy statistics with Laplacian or Gaussian kernel. Suitable for multivariate data. Naive approach, quadratic in number of observations.
Usage
mmd(
X,
Y,
beta = -0.1,
pval = TRUE,
kernel = c("Laplacian", "Gaussian"),
numperm = 200,
seednum = 0,
alternative = c("greater", "two.sided"),
allowzeropval = FALSE
)
Arguments
X |
Matrix (or vector) of observations in first sample. |
Y |
Matrix (or vector) of observations in second sample. |
beta |
kernel parameter. Must be positive; if not, computes
median heuristic in quadratic time. Default value
is |
pval |
Boolean for whether to compute p-value or not. |
kernel |
String, either |
numperm |
Number of permutations. Default is |
seednum |
Seed number for generating permutations. Default is |
alternative |
A character string specifying the alternative hypothesis,
which must be either |
allowzeropval |
A boolean, specifying whether we will allow zero
p-values or not. Default is |
Details
First checks number of columns (dimension) are equal.
Suppose matrix X
has n
rows and d
columns,
and matrix Y
has m
rows; checks that Y
has d
columns (if not, then throws error).
Then flattens matrices to vectors (or, if d=1
, they are
already vectors.
Then calls C++ method. If the first sample has n
d
-dimensional samples and the second sample has
m
d
-dimensional samples, then the algorithm
computes the statistic in O((n+m)^2)
time.
Median difference is as follows:
m = \textnormal{median} \{ || x_i - x_j ||_1; \,\, i>j, \,\,
i=1, 2,\dots, n+m,\,\,\textnormal{ and } j=1, 2,\dots, i-1 \},
where || x_i - x_j ||_1
is the 1-norm, and so if the data
are d
-dimensional then
|| x_i - x_j ||_1 = \sum_{k=1}^{d} |x_{i,k} - x_{j,k}|,
and finally median heuristic is beta = 1/m
.
This can be computed in O( (n+m)^2 )
time.
The Laplacian kernel k
is defined as
k(x,y) = \exp( -\beta || x_i - x_j ||_1 ).
Random seed is set for std::mt19937
and std::shuffle
in C++.
Value
A list with the following elements:
pval
The p-value of the test, if it is computed (
pval=TRUE
).stat
The statistic of the test, which is always computed.
beta
The kernel parameter used in the test. If
beta
was not initialised or negative, this will be the median heuristic value.
References
Gretton, A., Borgwardt, K. M., Rasch M. J., Schölkopf, B. and Smola, A. (2012) "A kernel two-sample test." The Journal of Machine Learning Research 13, no. 1, 723-773.
Examples
X <- matrix(c(1:12), ncol=2, byrow=TRUE)
Y <- matrix(c(13:20), ncol=2, byrow=TRUE)
mmdList <- mmd(X=X, Y=Y, beta=0.1, pval=FALSE)
#using median heuristic
mmdList <- mmd(X=X, Y=Y, pval=FALSE)
#using median heuristic and computing p-value
mmdList <- mmd(X=X, Y=Y)
#using median heuristic and computing p-value
#using 1000 permutations and seed 1 for reproducibility.
mmdList <- mmd(X=X, Y=Y, numperm=1000, seednum=1)