edist {energy} | R Documentation |
E-distance
Description
Returns the E-distances (energy statistics) between clusters.
Usage
edist(x, sizes, distance = FALSE, ix = 1:sum(sizes), alpha = 1,
method = c("cluster","discoB"))
Arguments
x | data matrix of pooled sample or Euclidean distances
sizes | vector of sample sizes
distance | logical: if TRUE, x is a distance matrix
ix | a permutation of the row indices of x
alpha | distance exponent in (0,2]
method | how to weight the statistics
Details
A vector containing the pairwise two-sample multivariate
$\mathcal{E}$-statistics for comparing clusters or samples is returned.
The e-distance between clusters is computed from the original pooled data,
stacked in the matrix x where each row is a multivariate observation, or
from the distance matrix x of the original data, or from a distance object
returned by dist. The first sizes[1] rows of the original data matrix are
the first sample, the next sizes[2] rows are the second sample, etc. The
permutation vector ix may be used to obtain e-distances corresponding to a
clustering solution at a given level in the hierarchy.
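The row layout described above can be illustrated with a small base R sketch. The helper cluster_rows is hypothetical (it is not part of the energy package), and it assumes that the first sizes[1] entries of ix name the rows of the first sample, the next sizes[2] entries the second sample, and so on.

```r
# Hypothetical helper (not part of the energy package): list the row
# indices of x that belong to each sample, given sizes and a permutation ix.
# Assumption: ix is consumed in consecutive blocks of lengths sizes[1], sizes[2], ...
cluster_rows <- function(sizes, ix = seq_len(sum(sizes))) {
  stopifnot(length(ix) == sum(sizes))
  # Label positions 1..N by sample number, then split ix by those labels.
  labels <- rep(seq_along(sizes), times = sizes)
  split(ix, labels)
}

# Default ix: samples are consecutive blocks of rows.
blocks <- cluster_rows(c(2, 3))

# A permutation reassigns rows to samples without reordering x itself.
perm <- cluster_rows(c(2, 3), ix = c(5, 1, 2, 3, 4))
```

With the default ix, the first sample is rows 1 and 2 and the second is rows 3 to 5. With the permutation, the first sample becomes rows 5 and 1.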
The default method cluster summarizes the e-distances between
clusters in a table.
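The e-distance between two clusters $C_i, C_j$ of size $n_i, n_j$
proposed by Szekely and Rizzo (2005) is the e-distance $e(C_i, C_j)$,
defined by

$$e(C_i, C_j) = \frac{n_i n_j}{n_i + n_j} \left[ 2 M_{ij} - M_{ii} - M_{jj} \right],$$

where

$$M_{ij} = \frac{1}{n_i n_j} \sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \| X_{ip} - X_{jq} \|^{\alpha},$$

$\| \cdot \|$ denotes Euclidean norm, $\alpha =$ alpha, and $X_{ip}$
denotes the p-th observation in the i-th cluster. The exponent alpha
should be in the interval (0,2].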
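As an illustration of the definition, the following base R sketch computes the e-distance between two samples from the between-cluster and within-cluster mean distances. The function name edist_pair is hypothetical, and this is not the package's implementation; it assumes the usual form of the statistic, with coefficient n_i n_j / (n_i + n_j) applied to 2 M_ij - M_ii - M_jj.

```r
# Hypothetical helper, not the energy package's implementation.
# a, b: numeric matrices with one observation per row; alpha in (0, 2].
edist_pair <- function(a, b, alpha = 1) {
  stopifnot(alpha > 0, alpha <= 2, ncol(a) == ncol(b))
  n_a <- nrow(a)
  n_b <- nrow(b)
  # All pairwise Euclidean distances in the pooled sample, raised to alpha.
  d <- as.matrix(dist(rbind(a, b)))^alpha
  ia <- seq_len(n_a)
  ib <- n_a + seq_len(n_b)
  m_ab <- mean(d[ia, ib])  # mean between-cluster distance
  m_aa <- mean(d[ia, ia])  # within-cluster mean (zero diagonal included, as in the double sum)
  m_bb <- mean(d[ib, ib])
  (n_a * n_b) / (n_a + n_b) * (2 * m_ab - m_aa - m_bb)
}

x <- as.matrix(iris[, 1:4])
e_12 <- edist_pair(x[1:50, ], x[51:100, ])  # setosa vs. versicolor
```

Under these assumptions, two identical samples give an e-distance of zero, and the statistic is symmetric in its two arguments.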
The coefficient $\frac{n_i n_j}{n_i + n_j}$ is one-half of the harmonic
mean of the sample sizes. The discoB method is related but with
different ways of summarizing the pairwise differences between samples.
The disco methods apply the coefficient $\frac{n_i n_j}{2N}$ where N is
the total number of observations. This weights each (i,j) statistic by
sample size relative to N. See the disco topic for more details.
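A short numeric check may help compare the two weightings. This sketch assumes the cluster coefficient is n_i n_j / (n_i + n_j) and the disco coefficient is n_i n_j / (2N); the variable names are illustrative only.

```r
# Two samples of 50 out of N = 150 observations, as in the iris data.
n_i <- 50
n_j <- 50
N <- 150

harmonic_mean <- 2 * n_i * n_j / (n_i + n_j)
coef_cluster <- n_i * n_j / (n_i + n_j)  # one-half of the harmonic mean
coef_disco <- n_i * n_j / (2 * N)        # assumed disco weighting

# Under these assumptions the disco coefficient is the cluster
# coefficient rescaled by the pair's share of the data, (n_i + n_j) / (2N).
ratio <- coef_disco / coef_cluster
```

Here coef_cluster is 25 and ratio is 1/3, so in this setting each pairwise statistic under the assumed disco weighting is one-third of the corresponding cluster statistic.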
Value
An object of class dist containing the lower triangle of the
e-distance matrix of cluster distances corresponding to the permutation
of indices ix is returned. The method attribute of the
distance object is assigned a value of type, index.
Author(s)
Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely
References
Szekely, G. J. and Rizzo, M. L. (2005) Hierarchical Clustering
via Joint Between-Within Distances: Extending Ward's Minimum
Variance Method, Journal of Classification 22(2) 151-183.
doi:10.1007/s00357-005-0012-9
Rizzo, M. L. and Szekely, G. J. (2010) DISCO Analysis: A Nonparametric
Extension of Analysis of Variance, Annals of Applied Statistics 4(2)
1034-1055.
doi:10.1214/09-AOAS245
Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5).
Szekely, G. J. (2000) Technical Report 03-05,
$\mathcal{E}$-statistics: Energy of
Statistical Samples, Department of Mathematics and Statistics,
Bowling Green State University.
See Also
energy.hclust
eqdist.etest
ksample.e
disco
Examples
## compute cluster e-distances for 3 samples of iris data
data(iris)
edist(iris[,1:4], c(50,50,50))
## pairwise disco statistics
edist(iris[,1:4], c(50,50,50), method="discoB")
## compute e-distances from a distance object
data(iris)
edist(dist(iris[,1:4]), c(50, 50, 50), distance=TRUE, alpha = 1)
## compute e-distances from a distance matrix
data(iris)
d <- as.matrix(dist(iris[,1:4]))
edist(d, c(50, 50, 50), distance=TRUE, alpha = 1)