energydist {eummd} | R Documentation |
Naive computation for Energy Distance
Description
Computes energy distance, and possibly a p-value. Suitable for multivariate data. Naive approach, quadratic in number of observations.
Usage
energydist(
X,
Y,
pval = TRUE,
numperm = 200,
seednum = 0,
alternative = c("greater", "two.sided"),
allowzeropval = FALSE
)
Arguments
X |
Matrix (or vector) of observations in first sample. |
Y |
Matrix (or vector) of observations in second sample. |
pval |
Boolean for whether to compute p-value or not. |
numperm |
Number of permutations. Default is 200. |
seednum |
Seed number for generating permutations. Default is 0. |
alternative |
A character string specifying the alternative hypothesis, which must be either "greater" or "two.sided". |
allowzeropval |
A boolean specifying whether zero p-values are allowed. Default is FALSE. |
Details
First checks that the number of columns (the dimension) is the same in both samples. Suppose matrix X has n rows and d columns, and matrix Y has m rows; the function checks that Y also has d columns and throws an error if it does not. It then flattens the matrices to vectors (if d=1, they are already vectors) and calls the C++ method. If the first sample has n d-dimensional observations and the second sample has m d-dimensional observations, then the algorithm computes the statistic in O((n+m)^2) time.
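As a rough illustration of where the quadratic cost comes from, the sketch below computes an energy distance in plain R from the pooled pairwise Euclidean distances. It assumes the standard form 2 E|X-Y| - E|X-X'| - E|Y-Y'|; the helper name energy_stat_naive is hypothetical and not part of eummd, and the scaling used by the package's C++ code may differ, so values need not match energydist().

```r
# Naive energy distance between two samples (rows are observations).
# Hypothetical helper for illustration only; not part of eummd.
energy_stat_naive <- function(X, Y) {
  X <- as.matrix(X)  # a plain vector becomes a one-column matrix
  Y <- as.matrix(Y)
  if (ncol(X) != ncol(Y)) {
    stop("X and Y must have the same number of columns")
  }
  n <- nrow(X)
  m <- nrow(Y)

  # All pairwise Euclidean distances among the n + m pooled points.
  # This step is what makes the approach O((n + m)^2).
  D <- as.matrix(dist(rbind(X, Y)))
  ix <- seq_len(n)
  iy <- n + seq_len(m)

  mean_xy <- mean(D[ix, iy])  # average over the n * m cross-sample pairs
  mean_xx <- mean(D[ix, ix])  # average over n^2 pairs (zero diagonal included)
  mean_yy <- mean(D[iy, iy])  # average over m^2 pairs (zero diagonal included)

  2 * mean_xy - mean_xx - mean_yy
}

# Same inputs as in the Examples section
X <- matrix(c(1:12), ncol = 2, byrow = TRUE)
Y <- matrix(c(13:20), ncol = 2, byrow = TRUE)
energy_stat_naive(X, Y)
```

Storing the full distance matrix keeps the code short but also uses O((n+m)^2) memory; accumulating the three sums in a double loop would avoid that at the cost of speed in interpreted R.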
Random seed is set for std::mt19937 and std::shuffle in C++.
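A seeded permutation test of this kind can be sketched in plain R as follows. This is only an assumption about the general shape of the procedure: the helper energy_perm_test is hypothetical, it uses R's own generator through set.seed() rather than std::mt19937, and it uses the common (1 + count) / (1 + numperm) one-sided p-value, so its results should not be expected to match energydist().

```r
# Permutation p-value for an energy distance, reusing one pooled distance matrix.
# Hypothetical helper for illustration only; not part of eummd.
energy_perm_test <- function(X, Y, numperm = 200, seednum = 0) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  stopifnot(ncol(X) == ncol(Y))
  n <- nrow(X)
  N <- n + nrow(Y)

  # Distances are computed once; permuting labels only re-indexes this matrix.
  D <- as.matrix(dist(rbind(X, Y)))

  # Statistic when the pooled rows in `ix` are treated as the first sample
  stat_from <- function(ix) {
    iy <- setdiff(seq_len(N), ix)
    2 * mean(D[ix, iy]) - mean(D[ix, ix]) - mean(D[iy, iy])
  }

  observed <- stat_from(seq_len(n))

  # Seeding makes the permutations reproducible (this resets R's global RNG state).
  set.seed(seednum)
  permuted <- replicate(numperm, stat_from(sample.int(N, n)))

  # Assumption: one-sided ("greater") p-value with a +1 correction,
  # which can never be exactly zero.
  pval <- (1 + sum(permuted >= observed)) / (1 + numperm)
  list(pval = pval, stat = observed)
}

X <- matrix(c(1:12), ncol = 2, byrow = TRUE)
Y <- matrix(c(13:20), ncol = 2, byrow = TRUE)
res <- energy_perm_test(X, Y, numperm = 1000, seednum = 1)
res$stat
res$pval
```

Calling the helper twice with the same seednum gives identical results, which is the point of fixing the seed.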
Value
A list with the following elements:
pval
The p-value of the test, if it is computed (pval=TRUE).
stat
The statistic of the test, which is always computed.
References
Baringhaus L. and Franz C. (2004) "On a new multivariate two-sample test." Journal of Multivariate Analysis 88(1):190-206
Szekely G. J. and Rizzo M. L. (2004) "Testing for equal distributions in high dimension." InterStat 5(16.10):1249-1272
Examples
X <- matrix(c(1:12), ncol=2, byrow=TRUE)
Y <- matrix(c(13:20), ncol=2, byrow=TRUE)
# computing the statistic only (no p-value)
energydistList <- energydist(X=X, Y=Y, pval=FALSE)
# computing the p-value with the default number of permutations
energydistList <- energydist(X=X, Y=Y)
# computing the p-value
# using 1000 permutations and seed 1 for reproducibility
energydistList <- energydist(X=X, Y=Y, numperm=1000, seednum=1)