edist {energy} | R Documentation

Returns the E-distances (energy statistics) between clusters.

```
edist(x, sizes, distance = FALSE, ix = 1:sum(sizes), alpha = 1,
      method = c("cluster","discoB"))
```

| Argument | Description |
|---|---|
| `x` | data matrix of the pooled sample, or Euclidean distances (a distance matrix or `dist` object) |
| `sizes` | vector of sample sizes |
| `distance` | logical: if `TRUE`, `x` is a distance matrix |
| `ix` | a permutation of the row indices of `x` |
| `alpha` | distance exponent in (0,2] |
| `method` | how to weight the statistics: `"cluster"` (default) or `"discoB"` |

A vector containing the pairwise two-sample multivariate `\mathcal{E}`-statistics for comparing clusters or samples is returned.
The e-distance between clusters is computed from the original pooled data, stacked in matrix `x`, where each row is a multivariate observation, or from the distance matrix `x` of the original data, or from a distance object returned by `dist`. The first `sizes[1]` rows of the original data matrix are the first sample, the next `sizes[2]` rows are the second sample, and so on. The permutation vector `ix` may be used to obtain e-distances corresponding to a clustering solution at a given level in the hierarchy.
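The row bookkeeping described above can be sketched in base R. This is a minimal illustration, not code from *energy*: the helper `blocks_from_sizes` is hypothetical, and the clustering is assumed to come from `hclust()` followed by `cutree()`. Ordering the rows by cluster label makes each cluster a consecutive block of `ix`, which is the layout `sizes` describes.

```r
## Hypothetical helper (not part of energy): split the permuted row
## indices `ix` into consecutive blocks of lengths `sizes`.
blocks_from_sizes <- function(sizes, ix = seq_len(sum(sizes))) {
  stopifnot(length(ix) == sum(sizes))
  # rep() labels each position 1..sum(sizes) with its block number
  split(ix, rep(seq_along(sizes), times = sizes))
}

## Default ix: rows 1-2 form sample 1, rows 3-5 form sample 2
blocks_from_sizes(c(2, 3))

## From a clustering solution at the 3-cluster level of the hierarchy
data(iris)
hc <- hclust(dist(iris[, 1:4]))
g <- cutree(hc, k = 3)          # cluster label of each row
ix <- order(g)                  # rows grouped by cluster label
sizes <- as.vector(table(g))    # cluster sizes, in label order
blk <- blocks_from_sizes(sizes, ix)
```

Under these assumptions, the resulting `sizes` and `ix` are the kind of values that would be passed as `edist(iris[, 1:4], sizes, ix = ix)`.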

The default method `cluster` summarizes the e-distances between clusters in a table. The e-distance between two clusters `C_i, C_j` of sizes `n_i, n_j`, proposed by Szekely and Rizzo (2005), is `e(C_i,C_j)`, defined by

```
e(C_i,C_j)=\frac{n_i n_j}{n_i+n_j}[2M_{ij}-M_{ii}-M_{jj}],
```

where

```
M_{ij}=\frac{1}{n_i n_j}\sum_{p=1}^{n_i} \sum_{q=1}^{n_j}
\|X_{ip}-X_{jq}\|^\alpha,
```

`\|\cdot\|` denotes the Euclidean norm, `\alpha` is the argument `alpha`, and `X_{ip}` denotes the p-th observation in the i-th cluster. The exponent `alpha` should be in the interval (0,2].
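The formula for `e(C_i,C_j)` can be checked by hand with a small base-R sketch. The function `e_dist_pair` is hypothetical and follows the definition above literally; it is not the *energy* implementation. Note that `M_{ii}` averages over all `n_i^2` pairs, including the zero distances of each observation to itself.

```r
## Hypothetical helper: e(C_i, C_j) computed directly from the formula.
## X, Y: matrices whose rows are the observations of C_i and C_j.
e_dist_pair <- function(X, Y, alpha = 1) {
  X <- as.matrix(X); Y <- as.matrix(Y)
  if (alpha <= 0 || alpha > 2) warning("alpha should be in (0,2]")
  ni <- nrow(X); nj <- nrow(Y)
  # pairwise Euclidean distances among the pooled rows, raised to alpha
  D <- as.matrix(dist(rbind(X, Y)))^alpha
  i <- seq_len(ni); j <- ni + seq_len(nj)
  Mij <- sum(D[i, j]) / (ni * nj)   # mean between-cluster distance
  Mii <- sum(D[i, i]) / (ni * ni)   # mean within C_i (diagonal zeros included)
  Mjj <- sum(D[j, j]) / (nj * nj)   # mean within C_j
  (ni * nj) / (ni + nj) * (2 * Mij - Mii - Mjj)
}

X <- matrix(c(0, 2), ncol = 1)   # C_i = {0, 2}
Y <- matrix(1, ncol = 1)         # C_j = {1}
e_dist_pair(X, Y)                # (2/3) * (2*1 - 1 - 0) = 2/3
```

Here `M_{ij} = 1`, `M_{ii} = (0 + 2 + 2 + 0)/4 = 1` and `M_{jj} = 0`, so the bracket is 1 and the coefficient `2 \cdot 1/(2 + 1) = 2/3` gives the result.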

The coefficient `\frac{n_i n_j}{n_i+n_j}` is one-half of the harmonic mean of the sample sizes. The `discoB` method is related, but summarizes the pairwise differences between samples differently. The `disco` methods apply the coefficient `\frac{n_i n_j}{2N}`, where `N` is the total number of observations. This weights each (i,j) statistic by sample size relative to `N`. See the `disco` topic for more details.
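The two coefficients can be compared directly. The sketch below is a hypothetical helper, `pair_weights`, that builds only the coefficient matrix from the two formulas stated above. It does not reproduce how `discoB` summarizes the differences themselves.

```r
## Hypothetical helper: coefficient for every pair (i, j) of samples.
pair_weights <- function(sizes, method = c("cluster", "discoB")) {
  method <- match.arg(method)
  N <- sum(sizes)
  nn <- outer(sizes, sizes)               # n_i * n_j
  w <- if (method == "cluster") {
    nn / outer(sizes, sizes, "+")         # n_i n_j / (n_i + n_j)
  } else {
    nn / (2 * N)                          # n_i n_j / (2N)
  }
  diag(w) <- 0                            # only pairs with i != j are used
  w
}

sizes <- c(50, 50, 50)
pair_weights(sizes, "cluster")[1, 2]      # 2500 / 100 = 25
pair_weights(sizes, "discoB")[1, 2]       # 2500 / 300 = 8.333...

## The cluster coefficient is half the harmonic mean of n_i and n_j
n <- c(20, 30)
half_hm <- 0.5 * 2 / (1 / n[1] + 1 / n[2])  # 0.5 * 24 = 12
pair_weights(n, "cluster")[1, 2]            # 20 * 30 / 50 = 12
```

If both methods multiply the same bracketed term (an assumption here), each `discoB` entry equals the corresponding `cluster` entry times `(n_i + n_j)/(2N)`. For three samples of 50, that factor is 100/300 = 1/3.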

An object of class `dist` containing the lower triangle of the e-distance matrix of cluster distances corresponding to the permutation of indices `ix` is returned. The `method` attribute of the distance object is assigned a value recording the method type and the index (the exponent `alpha`).
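Putting the pieces together, the following base-R sketch computes all pairwise e-distances for `k` samples and returns them as a `dist` object. `edist_sketch` is hypothetical and is built only from the formulas and coefficients described on this page; it is not the *energy* source, and the exact string stored in the `method` attribute is an assumption.

```r
## Hypothetical sketch (not the energy source): pairwise e-distances
## between k samples, returned as a "dist" object.
edist_sketch <- function(x, sizes, distance = FALSE, ix = seq_len(sum(sizes)),
                         alpha = 1, method = c("cluster", "discoB")) {
  method <- match.arg(method)
  # accept raw data, a distance matrix, or a "dist" object
  D <- if (distance || inherits(x, "dist")) as.matrix(x) else as.matrix(dist(x))
  D <- D^alpha
  N <- sum(sizes)
  k <- length(sizes)
  # rows of sample i are the i-th consecutive block of ix
  blocks <- split(ix, rep(seq_len(k), times = sizes))
  e <- matrix(0, k, k)
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      a <- blocks[[i]]; b <- blocks[[j]]
      ni <- length(a); nj <- length(b)
      Mij <- mean(D[a, b]); Mii <- mean(D[a, a]); Mjj <- mean(D[b, b])
      w <- if (method == "cluster") ni * nj / (ni + nj) else ni * nj / (2 * N)
      e[i, j] <- e[j, i] <- w * (2 * Mij - Mii - Mjj)
    }
  }
  out <- as.dist(e)   # keeps only the lower triangle
  # assumption: record method and exponent; energy's exact string may differ
  attr(out, "method") <- paste0(method, ": index = ", alpha)
  out
}

data(iris)
sizes <- c(50, 50, 50)
e1 <- edist_sketch(iris[, 1:4], sizes)                         # raw data
e2 <- edist_sketch(dist(iris[, 1:4]), sizes, distance = TRUE)  # dist object
```

Because raw data and its `dist` object yield the same distance matrix, `e1` and `e2` should agree, mirroring the two input forms described above.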

Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely

Szekely, G. J. and Rizzo, M. L. (2005). Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method. *Journal of Classification*, 22(2), 151-183. doi: 10.1007/s00357-005-0012-9

Rizzo, M. L. and Szekely, G. J. (2010). DISCO Analysis: A Nonparametric Extension of Analysis of Variance. *Annals of Applied Statistics*, 4(2), 1034-1055. doi: 10.1214/09-AOAS245

Szekely, G. J. and Rizzo, M. L. (2004). Testing for Equal Distributions in High Dimension. *InterStat*, November (5).

Szekely, G. J. (2000). `\mathcal{E}`-statistics: Energy of Statistical Samples. Technical Report 03-05, Department of Mathematics and Statistics, Bowling Green State University.

`energy.hclust`, `eqdist.etest`, `ksample.e`, `disco`

```
library(energy)
data(iris)

## compute cluster e-distances for the 3 species samples of iris
## (rows 1-50, 51-100, 101-150 are setosa, versicolor, virginica)
edist(iris[, 1:4], c(50, 50, 50))

## pairwise disco statistics
edist(iris[, 1:4], c(50, 50, 50), method = "discoB")

## compute e-distances from a distance object
edist(dist(iris[, 1:4]), c(50, 50, 50), distance = TRUE, alpha = 1)

## compute e-distances from a distance matrix
d <- as.matrix(dist(iris[, 1:4]))
edist(d, c(50, 50, 50), distance = TRUE, alpha = 1)
```

[Package *energy* version 1.7-10 Index]