aggExCluster {apcluster} | R Documentation |
Exemplar-based Agglomerative Clustering
Description
Runs exemplar-based agglomerative clustering
Usage
## S4 method for signature 'matrix,missing'
aggExCluster(s, x, includeSim=FALSE)
## S4 method for signature 'matrix,ExClust'
aggExCluster(s, x, includeSim=FALSE)
## S4 method for signature 'Matrix,missing'
aggExCluster(s, x, includeSim=FALSE)
## S4 method for signature 'Matrix,ExClust'
aggExCluster(s, x, includeSim=FALSE)
## S4 method for signature 'missing,ExClust'
aggExCluster(s, x, includeSim=TRUE)
## S4 method for signature 'function,ANY'
aggExCluster(s, x, includeSim=TRUE, ...)
## S4 method for signature 'character,ANY'
aggExCluster(s, x, includeSim=TRUE, ...)
Arguments
s |
an |
x |
either a prior clustering of class |
includeSim |
if |
... |
all other arguments are passed to the selected similarity function as they are. |
Details
aggExCluster performs agglomerative clustering. Unlike other methods, e.g., the ones implemented in hclust, aggExCluster computes an exemplar for each cluster, and its merging objective is geared towards the identification of meaningful exemplars, too.
For each pair of clusters, the merging objective is computed as follows:
An intermediate cluster is created as the union of the two clusters.
The potential exemplar is selected from the intermediate cluster as the sample that has the largest average similarity to all other samples in the intermediate cluster.
Then the average similarity of the exemplar with all samples in the first cluster and the average similarity with all samples in the second cluster are computed. These two values measure how well the joint exemplar describes the samples in the two clusters.
The merging objective is finally computed as the average of the two measures above. Hence, we can consider the merging objective as some kind of “balanced average similarity to the joint exemplar”.
In each step, all pairs of clusters are considered and the pair with the largest merging objective is actually merged. The joint exemplar is then chosen as the exemplar of the merged cluster.
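The computation of the merging objective for a single pair of clusters can be sketched in base R as follows. This is only an illustration of the steps described above, not the package's internal implementation: the helper merge_objective is hypothetical, it assumes that s is a square similarity matrix and that c1 and c2 are index vectors of two non-empty, disjoint clusters, and it assumes that the exemplar's self-similarity is excluded when selecting the exemplar but included when averaging over its own cluster.

```r
## hypothetical helper (not part of apcluster): merging objective of two clusters
## s      : square similarity matrix
## c1, c2 : index vectors of two disjoint, non-empty clusters
merge_objective <- function(s, c1, c2) {
    ## intermediate cluster = union of the two clusters
    joint <- c(c1, c2)
    sub <- s[joint, joint, drop = FALSE]

    ## average similarity of each sample to all *other* samples in the union
    avg_to_others <- (rowSums(sub) - diag(sub)) / (length(joint) - 1)

    ## potential joint exemplar = sample with the largest average similarity
    exemplar <- joint[which.max(avg_to_others)]

    ## how well the joint exemplar describes each of the two clusters
    fit1 <- mean(s[exemplar, c1])
    fit2 <- mean(s[exemplar, c2])

    ## balanced average similarity to the joint exemplar
    list(exemplar = exemplar, objective = (fit1 + fit2) / 2)
}

## toy data: four points on a line, negative squared distances as similarities
pts <- c(0, 1, 2, 10)
s <- -outer(pts, pts, function(a, b) (a - b)^2)

## merge candidate: cluster {1, 2} with cluster {3}
res <- merge_objective(s, c(1, 2), 3)
res$exemplar   # 2 (the middle point of 0, 1, 2)
res$objective  # -0.75
```

In a full agglomeration step, such a value would be computed for every pair of current clusters, and the pair with the largest value would be merged.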
aggExCluster can be used in two ways, either by performing agglomerative clustering of an entire data set or by performing agglomerative clustering of data previously clustered by affinity propagation or another clustering algorithm.
Agglomerative clustering of an entire data set can be accomplished either by calling aggExCluster on a square similarity matrix without further argument or by calling aggExCluster for a function or function name along with data to be clustered (as argument x). A full agglomeration run is performed that starts from l clusters (all samples in separate one-element clusters) and ends with one cluster (all samples in one single cluster).
Agglomerative clustering starting from a given clustering result can be accomplished by calling aggExCluster for an APResult or ExClust object passed as parameter x. The similarity matrix can either be passed as argument s or, if missing, aggExCluster checks whether the similarity matrix is included in the clustering object x. A cluster hierarchy with numbers of clusters ranging from the number of clusters in x down to 1 is created.
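The calling variants described above can be compared side by side in a short sketch. It assumes that the apcluster package is installed. The toy data and the variable names (agg_full, agg_fun, agg_prior) are made up for illustration.

```r
library(apcluster)

## toy data: two Gaussian clouds with 20 samples each
set.seed(1)
x <- rbind(cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)),
           cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)))

## square similarity matrix (negative squared Euclidean distances)
s <- negDistMat(x, r=2)

## variant 1: entire data set, similarity matrix only
agg_full <- aggExCluster(s)

## variant 2: entire data set, similarity function plus data
agg_fun <- aggExCluster(negDistMat(r=2), x)

## variant 3: start from a prior clustering and pass the similarity
## matrix explicitly as argument s
apres <- apcluster(s)
agg_prior <- aggExCluster(s, apres)

## the hierarchy starts at the number of samples (variants 1 and 2)
## or at the number of clusters of the prior clustering (variant 3)
agg_full@maxNoClusters
agg_prior@maxNoClusters
```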
The result is stored in an AggExResult object. The slot height is filled with the merging objective of each of the maxNoClusters-1 merges. The slot order contains a permutation of the samples/clusters for dendrogram plotting. The algorithm for computing this permutation is the same as the one used in hclust. If aggExCluster was called for an entire data set, the slot labels contains the names of the objects to be clustered (if available, otherwise the indices are used). If aggExCluster was called for a prior clustering, then labels are set to ‘Cluster 1’, ‘Cluster 2’, etc.
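A minimal sketch of how the result object might be inspected is given below. It assumes that the apcluster package is installed and uses made-up toy data. The slots are accessed directly with @ for illustration only, and cutree is used to extract one level of the hierarchy.

```r
library(apcluster)

## toy data: two Gaussian clouds with 20 samples each
set.seed(1)
x <- rbind(cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)),
           cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)))

aggres <- aggExCluster(negDistMat(r=2), x)

## one merging objective per merge, i.e. maxNoClusters-1 values
length(aggres@height)
aggres@maxNoClusters

## permutation of the samples used for dendrogram plotting
head(aggres@order)

## extract the level of the hierarchy with two clusters
cl <- cutree(aggres, k=2)
length(cl@clusters)
cl@exemplars
```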
Value
Upon successful completion, the function returns an AggExResult object.
Note
Similarity matrices can be supplied in dense or sparse format. However, sparse matrices are converted to full dense matrices before clustering, which may lead to memory and/or performance bottlenecks for larger data sets.
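To get a rough feeling for the memory footprint of the dense conversion, the following back-of-the-envelope sketch may help. It assumes that the dense matrix stores one 8-byte double per entry and ignores any additional working memory. The helper dense_bytes is hypothetical and not part of the package.

```r
## hypothetical helper: approximate size in bytes of a dense n x n
## matrix of doubles (8 bytes per entry)
dense_bytes <- function(n) as.numeric(n)^2 * 8

## approximate size in gigabytes (10^9 bytes) for a few data set sizes
dense_bytes(1000) / 1e9    # 0.008
dense_bytes(10000) / 1e9   # 0.8
dense_bytes(50000) / 1e9   # 20
```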
Author(s)
Ulrich Bodenhofer, Johannes Palme, and Nikola Kostic
References
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: 10.1093/bioinformatics/btr406.
See Also
AggExResult, apcluster-methods, plot-methods, heatmap-methods, cutree-methods
Examples
## load the package
library(apcluster)
## create two Gaussian clouds
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- rbind(cl1, cl2)
## compute agglomerative clustering from scratch
aggres1 <- aggExCluster(negDistMat(r=2), x)
## show results
show(aggres1)
## plot dendrogram
plot(aggres1)
## plot heatmap along with dendrogram
heatmap(aggres1)
## plot level with two clusters
plot(aggres1, x, k=2)
## run affinity propagation
apres <- apcluster(negDistMat(r=2), x, q=0.7)
## create hierarchy of clusters determined by affinity propagation
aggres2 <- aggExCluster(x=apres)
## show results
show(aggres2)
## plot dendrogram
plot(aggres2)
plot(aggres2, showSamples=TRUE)
## plot heatmap
heatmap(aggres2)
## plot level with two clusters
plot(aggres2, x, k=2)