eqdist.etest {energy} | R Documentation |

Performs the nonparametric multisample E-statistic (energy) test for equality of multivariate distributions.

```
eqdist.etest(x, sizes, distance = FALSE,
method=c("original","discoB","discoF"), R)
eqdist.e(x, sizes, distance = FALSE,
method=c("original","discoB","discoF"))
ksample.e(x, sizes, distance = FALSE,
method=c("original","discoB","discoF"), ix = 1:sum(sizes))
```

`x` |
data matrix of pooled sample |

`sizes` |
vector of sample sizes |

`distance` |
logical: if TRUE, first argument is a distance matrix |

`method` |
use original (default) or distance components (discoB, discoF) |

`R` |
number of bootstrap replicates |

`ix` |
a permutation of the row indices of x |

The k-sample multivariate `\mathcal{E}`-test of equal distributions is performed. The statistic is computed from the original pooled samples, stacked in matrix `x` where each row is a multivariate observation, or the corresponding distance matrix. The first `sizes[1]` rows of `x` are the first sample, the next `sizes[2]` rows of `x` are the second sample, etc.
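As a minimal sketch of this layout, the pooled matrix and the `sizes` vector can be built in base R as shown here. The sample names `s1`, `s2`, `s3` and their dimensions are assumptions made for illustration; only the row-stacking convention comes from the description above.

```r
# Three hypothetical samples with the same number of columns (d = 2)
set.seed(1)
s1 <- matrix(rnorm(10 * 2), ncol = 2)            # 10 observations
s2 <- matrix(rnorm(15 * 2, mean = 1), ncol = 2)  # 15 observations
s3 <- matrix(rnorm(5 * 2), ncol = 2)             #  5 observations

# Stack the samples row-wise; the row order must follow the order of `sizes`
x <- rbind(s1, s2, s3)
sizes <- c(nrow(s1), nrow(s2), nrow(s3))

# Row index ranges of each sample inside the pooled matrix
ends   <- cumsum(sizes)
starts <- ends - sizes + 1
stopifnot(nrow(x) == sum(sizes))
stopifnot(identical(x[starts[2]:ends[2], ], s2))

# Alternative input: the distance matrix of the pooled sample,
# which would be passed together with distance = TRUE
d <- dist(x)
```

The same `sizes` vector applies to both inputs, because the rows of the distance matrix are in the same order as the rows of `x`.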

The test is implemented by nonparametric bootstrap, an approximate permutation test with `R` replicates.
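The resampling idea can be sketched in base R. This is a generic permutation test written for illustration, not the package's compiled implementation: `perm_pvalue`, `stat_fun` and `mean_gap` are hypothetical names, and the p-value convention `(1 + #{replicates >= observed}) / (1 + R)` is an assumption, chosen because it is a common one.

```r
# Generic permutation p-value for a k-sample statistic.
# stat_fun(x, sizes) must return a single number; large values are significant.
perm_pvalue <- function(x, sizes, stat_fun, R = 199) {
  x <- as.matrix(x)
  n <- sum(sizes)
  stopifnot(nrow(x) == n)
  observed <- stat_fun(x, sizes)
  replicates <- replicate(R, {
    ix <- sample.int(n)                      # random permutation of row indices
    stat_fun(x[ix, , drop = FALSE], sizes)   # statistic on the relabeled rows
  })
  # Count the observed value itself so the p-value is never exactly 0
  p <- (1 + sum(replicates >= observed)) / (1 + R)
  list(statistic = observed, p.value = p)
}

# Toy statistic for illustration only: distance between the two sample means
mean_gap <- function(x, sizes) {
  g <- rep(seq_along(sizes), sizes)
  m1 <- colMeans(x[g == 1, , drop = FALSE])
  m2 <- colMeans(x[g == 2, , drop = FALSE])
  sqrt(sum((m1 - m2)^2))
}

set.seed(42)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 3), ncol = 2))
res <- perm_pvalue(x, sizes = c(20, 20), stat_fun = mean_gap, R = 199)
res$p.value
```

Permuting the rows while keeping `sizes` fixed plays the same role as the `ix` argument described above: each replicate reassigns observations to samples at random.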

The function `eqdist.e` returns the test statistic only; it simply passes the arguments through to `eqdist.etest` with `R = 0`.

The k-sample multivariate `\mathcal{E}`-statistic for testing equal distributions is returned. The statistic is computed from the original pooled samples, stacked in matrix `x` where each row is a multivariate observation, or from the distance matrix `x` of the original data. The first `sizes[1]` rows of `x` are the first sample, the next `sizes[2]` rows of `x` are the second sample, etc.

The two-sample `\mathcal{E}`-statistic proposed by Szekely and Rizzo (2004) is the e-distance `e(S_i,S_j)`, defined for two samples `S_i, S_j` of size `n_i, n_j` by

```
e(S_i,S_j)=\frac{n_i n_j}{n_i+n_j}[2M_{ij}-M_{ii}-M_{jj}],
```

where

```
M_{ij}=\frac{1}{n_i n_j}\sum_{p=1}^{n_i} \sum_{q=1}^{n_j}
\|X_{ip}-X_{jq}\|,
```

`\|\cdot\|` denotes Euclidean norm, and `X_{ip}` denotes the p-th observation in the i-th sample.
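The definition above can be checked by hand with a small base R sketch. The helper name `edistance2` is hypothetical and not part of the package; the code simply evaluates the formula for `e(S_i,S_j)` from a Euclidean distance matrix.

```r
# Two-sample e-distance computed directly from the definition (base R only).
# `edistance2` is a hypothetical helper name used for illustration.
edistance2 <- function(a, b) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  ni <- nrow(a)
  nj <- nrow(b)
  # Full Euclidean distance matrix of the pooled sample
  D <- as.matrix(dist(rbind(a, b)))
  i <- seq_len(ni)
  j <- ni + seq_len(nj)
  Mij <- mean(D[i, j])   # average between-sample distance
  Mii <- mean(D[i, i])   # within-sample average; the zero diagonal is included
  Mjj <- mean(D[j, j])
  (ni * nj) / (ni + nj) * (2 * Mij - Mii - Mjj)
}

# Univariate samples {0, 1} and {2, 3}:
# M_ij = 2, M_ii = M_jj = 0.5, weight = 1, so e = 2 * 2 - 0.5 - 0.5 = 3
a <- matrix(c(0, 1), ncol = 1)
b <- matrix(c(2, 3), ncol = 1)
edistance2(a, b)   # 3
edistance2(a, a)   # 0 for identical samples
```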

The original (default method) k-sample `\mathcal{E}`-statistic is defined by summing the pairwise e-distances over all `k(k-1)/2` pairs of samples:

```
\mathcal{E}=\sum_{1 \leq i < j \leq k} e(S_i,S_j).
```

Large values of `\mathcal{E}` are significant.
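A minimal base R sketch of the k-sample sum, assuming Euclidean distances and the row layout described earlier, might look like this. `ksample_stat` is a hypothetical name; the code follows the formula for `\mathcal{E}` term by term and is not the package's implementation.

```r
# k-sample statistic as the sum of pairwise e-distances (base R sketch).
# `ksample_stat` is a hypothetical name used for illustration only.
ksample_stat <- function(x, sizes) {
  D <- as.matrix(dist(as.matrix(x)))
  stopifnot(nrow(D) == sum(sizes))
  g <- rep(seq_along(sizes), sizes)   # sample label of each row
  k <- length(sizes)
  total <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      ni <- sizes[i]
      nj <- sizes[j]
      Mij <- mean(D[g == i, g == j])
      Mii <- mean(D[g == i, g == i])
      Mjj <- mean(D[g == j, g == j])
      total <- total + ni * nj / (ni + nj) * (2 * Mij - Mii - Mjj)
    }
  }
  total
}

# Three univariate samples of size 2: {0, 1}, {2, 3}, {4, 5}
# Pairwise e-distances: e(S1,S2) = 3, e(S1,S3) = 7, e(S2,S3) = 3
x <- matrix(c(0, 1, 2, 3, 4, 5), ncol = 1)
ksample_stat(x, sizes = c(2, 2, 2))   # 13
```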

The `discoB` method computes the between-sample disco statistic. For a one-way analysis, it is related to the original statistic as follows. In the above equation, the weights `\frac{n_i n_j}{n_i+n_j}` are replaced with

```
\frac{n_i + n_j}{2N}\frac{n_i n_j}{n_i+n_j} =
\frac{n_i n_j}{2N}
```

where N is the total number of observations: `N=n_1+...+n_k`.
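To make the reweighting concrete, the two sets of pair weights can be compared numerically. The sample sizes below are hypothetical; the code only evaluates the two weight formulas stated above.

```r
# Compare the pair weights of the original statistic and of discoB.
sizes <- c(20, 30, 50)                 # hypothetical sample sizes
N <- sum(sizes)
pairs <- t(combn(length(sizes), 2))    # all k(k-1)/2 index pairs (i, j)
ni <- sizes[pairs[, 1]]
nj <- sizes[pairs[, 2]]

w_original <- ni * nj / (ni + nj)      # weights in the original statistic
w_discoB   <- ni * nj / (2 * N)        # weights in the discoB statistic

# The discoB weight is the original weight rescaled by (n_i + n_j) / (2N)
stopifnot(isTRUE(all.equal(w_discoB, (ni + nj) / (2 * N) * w_original)))
data.frame(i = pairs[, 1], j = pairs[, 2], w_original, w_discoB)
```

With equal sample sizes, the factor `(n_i + n_j) / (2N)` is the same for every pair and equals `1/k`, so under these formulas the between-sample statistic is the original statistic divided by `k`.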

The `discoF` method is based on the disco F ratio, while the `discoB` method is based on the between-sample component.

Also see the `disco` and `disco.between` functions.

A list with class `htest` containing

`method` |
description of test |

`statistic` |
observed value of the test statistic |

`p.value` |
approximate p-value of the test |

`data.name` |
description of data |

`eqdist.e` returns the test statistic only.

The pairwise e-distances between samples can be conveniently computed by the `edist` function, which returns a `dist` object.

Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely

Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal
Distributions in High Dimension, *InterStat*, November (5).

Rizzo, M. L. and Szekely, G. J. (2010) DISCO Analysis: A Nonparametric Extension of
Analysis of Variance, *Annals of Applied Statistics*,
Vol. 4, No. 2, 1034-1055. doi: 10.1214/09-AOAS245

Szekely, G. J. (2000) Technical Report 03-05: `\mathcal{E}`-statistics: Energy of
Statistical Samples, Department of Mathematics and Statistics, Bowling
Green State University.

`ksample.e`, `edist`, `disco`, `disco.between`, `energy.hclust`.

```
library(energy)
data(iris)

## test if the 3 varieties of iris data (d=4) have equal distributions
eqdist.etest(iris[,1:4], c(50,50,50), R = 199)

## example that uses a distance matrix as input (distance = TRUE)
x <- matrix(rnorm(100), nrow=20)
y <- matrix(rnorm(100), nrow=20)
X <- rbind(x, y)
d <- dist(X)

# statistic should match the edist default statistic computed below
set.seed(1234)
eqdist.etest(d, sizes=c(20, 20), distance=TRUE, R = 199)

# comparison with edist; sizes must sum to nrow(X) = 40
edist(d, sizes=c(20, 20), distance=TRUE)

# disco on the same data, for comparison
g <- as.factor(rep(1:2, c(20, 20)))
set.seed(1234)
disco(d, factors=g, distance=TRUE, R=199)

# between-sample component, the statistic used by method="discoB"
set.seed(1234)
disco.between(d, factors=g, distance=TRUE, R=199)
```

[Package *energy* version 1.7-10 Index]