R: Plot Clustering Results

plot {apcluster}

R Documentation

Plot Clustering Results

Description

Functions for Visualizing Clustering Results

Usage

## S4 method for signature 'APResult,missing'
plot(x, y, type=c("netsim", "dpsim", "expref"),
    xlab="# Iterations", ylab="Similarity", ...)
## S4 method for signature 'ExClust,matrix'
plot(x, y, connect=TRUE, xlab="", ylab="",
    labels=NA, limitNo=15, ...)
## S4 method for signature 'ExClust,data.frame'
plot(x, y, connect=TRUE, xlab="", ylab="",
    labels=NA, limitNo=15, ...)
## S4 method for signature 'AggExResult,missing'
plot(x, y, main="Cluster dendrogram",
    xlab="", ylab="", ticks=4, digits=2, base=0.05, showSamples=FALSE,
    horiz=FALSE, ...)
## S4 method for signature 'AggExResult,matrix'
plot(x, y, k=NA, h=NA, ...)
## S4 method for signature 'AggExResult,data.frame'
plot(x, y, k=NA, h=NA, ...)

Arguments

`x`	a clustering result object of class `APResult`, `ExClust`, or `AggExResult`
`y`	a matrix or data frame (see details below)
`type`	a character string or vector of strings indicating which performance measures to plot; valid values are `"netsim"`, `"dpsim"`, and `"expref"`, which can be used in any combination or order; all other strings are ignored (for their meaning, see `APResult`)
`xlab`, `ylab`	labels for axes of 2D plots; ignored if `y` has more than two columns
`labels`	names used for variables in scatter plot matrix (displayed if `y` has more than two columns). If `NA` (default), column names are used. If no column names are available, labels such as `x[, 2]` are displayed.
`limitNo`	maximum number of columns/features in `y` for which a scatter plot matrix is drawn. Scatter plot matrices with very many features can cause problems, so the `plot` method throws an error if the number of columns exceeds `limitNo` (15 by default). For special applications, users can increase the value. If `limitNo` is set to `NA` or any other non-numeric value, the limit is ignored entirely. Note that attempting to plot scatter plot matrices with too many features may corrupt the graphics device, so users raise or disable the limit at their own risk. If plotting many features is necessary, make sure that the graphics device is large enough to accommodate the plot (e.g. by using a sufficiently large graphics file device).
`connect`	used only if clustering is plotted on original data, ignored otherwise. If `connect` is `TRUE`, lines are drawn connecting exemplars with their cluster members.
`main`	title of plot
`ticks`	number of ticks used for the axis on the left side of the plot (applies to dendrogram plots only, see below)
`digits`	number of digits used for the axis tickmarks on the left side of the plot (applies to dendrogram plots only, see below)
`base`	fraction of height used for the very first join; defaults to 0.05, i.e. the first join appears at 5% of the total height of the dendrogram.
`showSamples`	if `TRUE`, the complete cluster hierarchy is shown; otherwise, if `x` is a hierarchy of clusters, the dendrogram of clusters is shown. For backward compatibility, the default is `FALSE`.
`horiz`	if `TRUE`, the dendrogram is plotted horizontally (analogous to `plot.dendrogram`). The default is `FALSE`.
`k`	level to be selected when plotting a single clustering level of cluster hierarchy (i.e. the number of clusters; see `cutree-methods`)
`h`	cut-off to be used when plotting a single clustering level of cluster hierarchy (see `cutree-methods`)
`...`	all other arguments are passed on to the plotting commands used internally, `plot` or `heatmap`.

Details

If plot is called for an APResult object without specifying the second argument y, a plot is created that displays graphs of performance measures over execution time of the affinity propagation run. This only works if apcluster was called with details=TRUE.
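As a minimal sketch of this variant, the following restricts the plot to two of the three performance measures via the type argument. The seed and the particular selection of measures are illustrative assumptions; the data and the apcluster call follow the Examples section.

```r
library(apcluster)

## two Gaussian clouds as in the Examples section
## (the seed is an assumption, used only for reproducibility)
set.seed(1)
x <- rbind(cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)),
           cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)))

## details=TRUE is required for plotting the performance measures
apres <- apcluster(negDistMat(r=2), x, q=0.7, details=TRUE)

## plot only the "netsim" and "expref" measures over the iterations
plot(apres, type=c("netsim", "expref"))
```

Omitting type plots all three measures, which corresponds to the default shown in the Usage section.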

If plot is called for an APResult object along with a matrix or data frame as argument y, then the dimensions of the matrix determine the behavior of plot:

If the matrix y has two columns, y is interpreted as the original data set. Then a plot of the clustering result superimposed on the original data set is created. Each cluster is displayed in a different color. The exemplar of each cluster is highlighted by a black square. If connect is TRUE, lines connecting the cluster members to their exemplars are drawn. This variant of plot does not return any value.
If y has more than two columns, the clustering result is superimposed on a scatter plot matrix of the data. The former variant, in which a square matrix y was interpreted as a similarity matrix, was removed in version 1.3.2; use heatmap instead.
If y has only one column, an error is displayed.
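The scatter plot matrix case can be sketched as follows, assuming the four numeric columns of the iris data set as input. The output file name, the PDF device size, and the abbreviated labels are hypothetical choices for illustration; pdf is just one possible graphics file device.

```r
library(apcluster)

## keep only the four numeric columns, as a matrix
data(iris)
irisMat <- as.matrix(iris[, 1:4])

apIris <- apcluster(negDistMat(r=2), irisMat, q=0)

## draw into a generously sized file device
## (file name and dimensions are hypothetical)
outFile <- file.path(tempdir(), "iris_clusters.pdf")
pdf(outFile, width=10, height=10)

## y has four columns, so a scatter plot matrix is drawn;
## labels replaces the column names in the panels
plot(apIris, irisMat, labels=c("Sep.L", "Sep.W", "Pet.L", "Pet.W"))

dev.off()
```

With four columns the default limitNo=15 is not reached, so the argument is left untouched here.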

If plot is called for an ExClust object along with a matrix or data frame as argument y, then plot behaves exactly the same as described in the previous paragraph.

If plot is called for an AggExResult object without specifying the second argument y, then a dendrogram plot is drawn. This variant returns an invisible dendrogram object. The showSamples argument determines whether a complete dendrogram or a dendrogram of clusters is plotted (see above). If the option horiz=TRUE is used, the dendrogram is rotated. Note that, in this case, the margin to the right of the plot may not be wide enough to accommodate long cluster/sample labels. In such a case, the figure margins have to be widened before plot is called.
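One way to widen the margins for a horizontal dendrogram is sketched below, using the base graphics function par. The margin values are assumptions that would need tuning to the actual label lengths; the seed is only there for reproducibility.

```r
library(apcluster)

## two Gaussian clouds as in the Examples section
set.seed(1)
x <- rbind(cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)),
           cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)))

aggres <- aggExCluster(negDistMat(r=2), x)

## widen the right margin (fourth entry of mar) before plotting;
## par() returns the previous settings so they can be restored
oldPar <- par(mar=c(5, 4, 4, 8) + 0.1)

## the dendrogram object is returned invisibly, so assign it to keep it
dend <- plot(aggres, horiz=TRUE)

par(oldPar)
```

The returned object dend can then be processed further with the usual dendrogram tools if needed.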

If plot is called for an AggExResult object along with a matrix or data frame y, y is again interpreted as the original data set. If one of the two arguments k or h is present, a clustering is cut out from the cluster hierarchy using cutree, and this clustering is displayed with the original data set as described above. This variant of plot returns an invisible ExClust object containing the extracted clustering.
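Because the extracted clustering is returned invisibly, it has to be assigned to be reused. The following sketch assumes the same two-cloud data as in the Examples section; the seed and the variable names are illustrative.

```r
library(apcluster)

## two Gaussian clouds as in the Examples section
set.seed(1)
x <- rbind(cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)),
           cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)))

aggres <- aggExCluster(negDistMat(r=2), x)

## draw the two-cluster level on the original data
## and keep the returned ExClust object
cl2 <- plot(aggres, x, k=2)

## inspect the extracted clustering
length(cl2@clusters)   # number of clusters at this level
cl2@exemplars          # indices of the exemplars
```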

Value

See details above.

Author(s)

Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme

References

https://github.com/UBod/apcluster

Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: 10.1093/bioinformatics/btr406.

Examples

## create two Gaussian clouds
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- rbind(cl1, cl2)

## run affinity propagation
apres <- apcluster(negDistMat(r=2), x, q=0.7, details=TRUE)

## plot information about clustering run
plot(apres)

## plot clustering result
plot(apres, x)

## perform agglomerative clustering of affinity propagation clusters
aggres1 <- aggExCluster(x=apres)

## show dendrograms
plot(aggres1)
plot(aggres1, showSamples=TRUE)

## show clustering result for 4 clusters
plot(aggres1, x, k=4)

## perform agglomerative clustering of whole data set
aggres2 <- aggExCluster(negDistMat(r=2), x)

## show dendrogram
plot(aggres2)

## show heatmap along with dendrogram
heatmap(aggres2)

## show clustering result for 2 clusters
plot(aggres2, x, k=2)

## cluster iris data set
data(iris)
apIris <- apcluster(negDistMat(r=2), iris, q=0)
plot(apIris, iris)

[Package apcluster version 1.4.13 Index]