protoclust {protoclust}R Documentation

Hierarchical Clustering with Prototypes: Minimax Linkage.

Description

Performs minimax linkage hierarchical clustering given a set of dissimilarities. Returns an object that looks just like the output of hclust except that it has an additional element containing prototype indices.

Usage

protoclust(d, verb = FALSE)

Arguments

d

dissimilarities object. Can be of class dist or matrix

verb

see verbose output?

Details

This function provides an efficient implementation of minimax linkage hierarchical clustering. Consider two clusters G and H and their union U. The minimax linkage between G and H is defined to be the radius of the smallest ball that encloses all of U and that is centered at one of the points in U. If G and H are merged together, the prototype for the newly formed cluster U is that enclosing ball's center. By construction, the prototype for a cluster will always be one of the objects being clustered. For more on minimax linkage and how one can use prototypes to help interpret a dendrogram, see

Value

An object of class protoclust, which is just like hclust but has an additional element:

merge, height, order

identical to the values returned by hclust

protos

a vector of length n - 1. The i-th element is the index of the prototype corresponding to the cluster formed on the i-th merge.

Author(s)

Jacob Bien and Rob Tibshirani

References

Bien, J., and Tibshirani, R. (2011), "Hierarchical Clustering with Prototypes via Minimax Linkage," The Journal of the American Statistical Association, 106(495), 1075-1084.

This function has been designed to work like hclust in terms of inputs and outputs; however, unlike hclust, it outputs an additional element, namely a vector of length n - 1 containing the indices of prototypes. It follows hclust's convention for making the arbitrary choice of whether to put a subtree on the left or right side.

For cutting a minimax linkage hierarchical clustering, use protocut, which works like cutree except that it returns the set of prototypes in addition to the cluster assignments.

This function calls a C implementation of the algorithm detailed in Bien and Tibshirani (2011) that is based on an algorithm described in Murtagh (1983).

Bien, J., and Tibshirani, R. (2011), "Hierarchical Clustering with Prototypes via Minimax Linkage," The Journal of the American Statistical Association, 106(495), 1075-1084.

Murtagh, F. (1983), "A Survey of Recent Advances in Hierarchical Clustering Algorithms," The Computer Journal, 26, 354–359.

See Also

protocut, plotwithprototypes, hclust

Examples


# generate some data:
set.seed(1)
n <- 100
p <- 2
x <- matrix(rnorm(n * p), n, p)
rownames(x) <- paste("A", 1:n, sep="")
d <- dist(x)

# perform minimax linkage clustering:
hc <- protoclust(d)

# cut the tree to yield a 10-cluster clustering:
k <- 10 # number of clusters
cut <- protocut(hc, k=k)
h <- hc$height[n - k]

# plot dendrogram (and show cut):
plotwithprototypes(hc, imerge=cut$imerge, col=2)
abline(h=h, lty=2)

# get the prototype assigned to each point:
pr <- cut$protos[cut$cl]

# find point farthest from its prototype:
dmat <- as.matrix(d)
ifar <- which.max(dmat[cbind(1:n, pr[1:n])])

# note that this distance is exactly h:
stopifnot(dmat[ifar, pr[ifar]] == h)

# since this is a 2d example, make 2d display:
plot(x, type="n")
points(x, pch=20, col="lightblue")
lines(rbind(x[ifar, ], x[pr[ifar], ]), col=3)
points(x[cut$protos, ], pch=20, col="red")
text(x[cut$protos, ], labels=hc$labels[cut$protos], pch=19)
tt <- seq(0, 2 * pi, length=100)
for (i in cut$protos) {
  lines(x[i, 1] + h * cos(tt), x[i, 2] + h * sin(tt))
}


[Package protoclust version 1.6.4 Index]