energy.hclust {energy}  R Documentation 
Performs hierarchical clustering by minimum (energy) E-distance method.
energy.hclust(dst, alpha = 1)
dst 
dist object 
alpha 
distance exponent 
Dissimilarities are d(x,y) = \|x-y\|^\alpha, where the exponent \alpha is in the interval (0,2].
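As a small illustration of the exponent, the following base R sketch raises Euclidean distances to a power. The toy data `x` and the variable names are made up for illustration; the sketch assumes only that arithmetic on a `dist` object is applied elementwise.

```r
# Hypothetical toy data: 4 points in the plane
x <- matrix(c(0, 0,
              3, 4,
              6, 8,
              0, 1), ncol = 2, byrow = TRUE)

d <- dist(x)          # Euclidean distances |x - y|
alpha <- 0.5          # any exponent in (0, 2]
d_alpha <- d^alpha    # dissimilarities |x - y|^alpha, still a dist object

as.matrix(d_alpha)[1, 2]   # sqrt(5), because |x1 - x2| = 5
```

With alpha = 1 the dissimilarities are the Euclidean distances themselves.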
This function performs agglomerative hierarchical clustering.
Initially, each of the n singletons is a cluster. At each of n-1 steps, the
procedure merges the pair of clusters with minimum e-distance.
The e-distance between two clusters C_i, C_j of sizes n_i, n_j is given by
e(C_i, C_j)=\frac{n_i n_j}{n_i+n_j}[2M_{ij}-M_{ii}-M_{jj}],
where
M_{ij}=\frac{1}{n_i n_j}\sum_{p=1}^{n_i} \sum_{q=1}^{n_j} \|X_{ip}-X_{jq}\|^\alpha,
\|\cdot\| denotes Euclidean norm, and X_{ip} denotes the p-th observation in the i-th cluster.
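The formula above can be evaluated directly in base R. The sketch below is only an illustration of the definition, not the package's implementation; the helper names `mean_pow_dist` and `e_distance` are hypothetical, and clusters are assumed to be numeric matrices with one observation per row.

```r
# M = average of |a - b|^alpha over all pairs (a in A, b in B).
# For A == B the zero distances on the diagonal are included, as in M_ii.
mean_pow_dist <- function(A, B, alpha = 1) {
  n_a <- nrow(A)
  D <- as.matrix(dist(rbind(A, B)))[seq_len(n_a), n_a + seq_len(nrow(B)),
                                    drop = FALSE]
  mean(D^alpha)
}

# e-distance between clusters A and B, term by term as in the formula
e_distance <- function(A, B, alpha = 1) {
  n_i <- nrow(A)
  n_j <- nrow(B)
  M_ij <- mean_pow_dist(A, B, alpha)
  M_ii <- mean_pow_dist(A, A, alpha)
  M_jj <- mean_pow_dist(B, B, alpha)
  n_i * n_j / (n_i + n_j) * (2 * M_ij - M_ii - M_jj)
}

# Two singletons: the e-distance reduces to |x - y|^alpha
A <- matrix(c(0, 0), nrow = 1)
B <- matrix(c(3, 4), nrow = 1)
e_distance(A, B)   # 5
```

For two singletons M_ii and M_jj are zero and the size factor is 1/2, so the result is the ordinary dissimilarity between the two points.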
The return value is an object of class hclust, so hclust methods such as print or plot methods, plclust, and cutree are available. See the documentation for hclust.
The e-distance measures both the heterogeneity between clusters and the
homogeneity within clusters. \mathcal E-clustering (\alpha=1) is particularly effective in
high dimension, and is more effective than some standard hierarchical
methods when clusters have equal means (see example below).
For other advantages see the references.
edist computes the energy distances for the result (or any partition)
and returns the cluster distances in a dist object. See the edist examples.
An object of class hclust which describes the tree produced by
the clustering process. The object is a list with components:
merge: 
an n-1 by 2 matrix, where row i of merge describes the merging of clusters at step i of the clustering. 
height: 
the clustering height: a vector of n-1 non-decreasing real numbers (the e-distance between merging clusters) 
order: 
a vector giving a permutation of the indices of
original observations suitable for plotting, in the sense that a
cluster plot using this ordering and matrix merge will not have crossings of the branches. 
labels: 
labels for each of the objects being clustered. 
call: 
the call which produced the result. 
method: 
the cluster method that has been used (e-distance). 
dist.method: 
the distance that has been used to create dst. 
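The components listed above can be inspected like those of any hclust object. The sketch below uses `hclust(dist(USArrests), method = "ward.D")` as a stand-in for a call to energy.hclust so that it runs with base R only; that substitution is an assumption made for illustration, and the reported method and heights of an actual energy.hclust result may differ.

```r
# Stand-in tree; any object of class hclust has the same layout
hc <- hclust(dist(USArrests), method = "ward.D")
n <- nrow(USArrests)

dim(hc$merge)               # n-1 rows, 2 columns: one row per merge step
length(hc$height)           # n-1 merge heights
is.unsorted(hc$height)      # FALSE: heights are non-decreasing
head(hc$labels[hc$order])   # labels in the order used for plotting
hc$dist.method              # how the input dist object was computed
```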
Currently stats::hclust implements Ward's method by method="ward.D2",
which squares the dissimilarities before updating clusters. The option formerly named "ward" is now "ward.D".
Because both hclust and energy use the same type of Lance-Williams recursive formula to update cluster distances, now with the additional option method="ward.D" in hclust, the
energy distance method is easily implemented by hclust. (Some "Ward" algorithms do not use Lance-Williams, however.) Energy clustering (with alpha=1) and "ward.D" now return the same result, except that the cluster heights of energy hierarchical clustering with alpha=1
are two times the heights from hclust. In order to ensure compatibility with hclust methods, energy.hclust
now passes arguments through to hclust
after possibly applying the optional exponent to distance.
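The pass-through described in this note could look roughly like the following. This is a minimal sketch written from the note alone, not the package source: the function name `energy_hclust_sketch` and the simulated data are hypothetical, and the heights it returns are on the hclust scale.

```r
# Hypothetical re-implementation of the behaviour described in the note
energy_hclust_sketch <- function(dst, alpha = 1) {
  stopifnot(inherits(dst, "dist"), alpha > 0, alpha <= 2)
  if (alpha != 1) dst <- dst^alpha   # optional exponent on the distances
  hclust(dst, method = "ward.D")     # Lance-Williams updates done by hclust
}

# Two simulated groups of 20 points each in the plane
set.seed(1)
z <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2))

fit <- energy_hclust_sketch(dist(z))
table(cutree(fit, k = 2))            # cluster sizes for a two-cluster cut
```

Because the result is an ordinary hclust object, print, plot, and cutree apply to it unchanged.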
Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely
Szekely, G. J. and Rizzo, M. L. (2005) Hierarchical Clustering
via Joint Between-Within Distances: Extending Ward's Minimum
Variance Method, Journal of Classification 22(2) 151-183.
doi: 10.1007/s00357-005-0012-9
Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, November (5).
Szekely, G. J. (2000) Technical Report 03-05:
\mathcal{E}-statistics: Energy of
Statistical Samples, Department of Mathematics and Statistics, Bowling
Green State University.
edist
ksample.e
eqdist.etest
hclust
## Not run:
library(cluster)
data(animals)
plot(energy.hclust(dist(animals)))
data(USArrests)
ecl <- energy.hclust(dist(USArrests))
print(ecl)
plot(ecl)
cutree(ecl, k=3)
cutree(ecl, h=150)
## compare performance of e-clustering, Ward's method, group average method
## when sampled populations have equal means: n=200, d=5, two groups
z <- rbind(matrix(rnorm(1000), nrow=200), matrix(rnorm(1000, 0, 5), nrow=200))
g <- c(rep(1, 200), rep(2, 200))
d <- dist(z)
e <- energy.hclust(d)
a <- hclust(d, method="average")
w <- hclust(d^2, method="ward.D2")
list("E" = table(cutree(e, k=2) == g), "Ward" = table(cutree(w, k=2) == g),
"Avg" = table(cutree(a, k=2) == g))
## End(Not run)