apclusterK {apcluster}

R Documentation

Affinity Propagation for Pre-defined Number of Clusters

Description

Runs affinity propagation clustering for a given similarity matrix, iteratively adjusting the input preferences in order to achieve a desired number of clusters.

Usage

## S4 method for signature 'matrix,missing'
apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE,
     maxits=1000, convits=100, lam=0.9, includeSim=FALSE, details=FALSE,
     nonoise=FALSE, seed=NA, verbose=TRUE)
## S4 method for signature 'Matrix,missing'
apclusterK(s, x, K, ...)
## S4 method for signature 'dgTMatrix,missing'
apclusterK(s, x, K, prc=10, bimaxit=20,
     exact=FALSE, maxits=1000, convits=100, lam=0.9, includeSim=FALSE,
     details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE)
## S4 method for signature 'sparseMatrix,missing'
apclusterK(s, x, K, ...)
## S4 method for signature 'function,ANY'
apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE,
     maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE,
     nonoise=FALSE, seed=NA, verbose=TRUE, ...)
## S4 method for signature 'character,ANY'
apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE,
     maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE,
     nonoise=FALSE, seed=NA, verbose=TRUE, ...)

Arguments

`s`	an l x l similarity matrix in sparse or dense format, or a similarity function specified either as the name of a package-provided similarity function (character string) or as a user-provided function object
`x`	input data to be clustered; if `x` is a matrix or data frame, rows are interpreted as samples and columns are interpreted as features; apart from matrices or data frames, `x` may be any other structured data type that contains multiple data items, provided that an appropriate `length` function is available that returns the number of items
`K`	desired number of clusters
`prc`	the algorithm stops if the number of clusters does not deviate by more than `prc` percent from the desired value `K`; set to 0 if you want to have exactly `K` clusters
`bimaxit`	maximum number of bisection steps to perform; note that no warning is issued if the number of clusters is still not in the desired range
`exact`	flag indicating whether or not to compute the initial preference range exactly (see `preferenceRange`)
`maxits`	maximal number of iterations that `apcluster` should execute
`convits`	`apcluster` terminates if the exemplars have not changed for `convits` iterations
`lam`	damping factor for `apcluster`; should be a value in the range [0.5, 1); higher values correspond to heavy damping which may be needed if oscillations occur
`includeSim`	if `TRUE`, the similarity matrix (either computed internally or passed via the `s` argument) is stored to the slot `sim` of the returned `APResult` object. The default is `FALSE` if `apclusterK` has been called for a similarity matrix, otherwise the default is `TRUE`.
`details`	if `TRUE`, more detailed information about the algorithm's progress is stored in the output object (see `APResult`)
`nonoise`	`apcluster` adds a small amount of noise to `s` to prevent degenerate cases; if `TRUE`, this is disabled
`seed`	for reproducibility, the seed of the random number generator can be set to a fixed value; if `NA`, the seed remains unchanged
`verbose`	flag indicating whether status information should be displayed during bisection
`...`	for the methods with signatures `character,ANY` and `function,ANY`, all other arguments are passed to the selected similarity function as they are; for the methods with signatures `Matrix,missing` and `sparseMatrix,missing`, further arguments are passed on to the `apclusterK` methods with signatures `matrix,missing` and `dgTMatrix,missing`, respectively.

Details

apclusterK first runs preferenceRange to determine the range of meaningful choices of the input preference p. Then it decreases p exponentially for a few iterations to obtain a good initial guess for p. If the number of clusters is still too far from the desired goal, bisection is applied.
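The bisection idea can be illustrated with a minimal sketch. This is not the package's internal implementation: it skips the exponential decrease step, assumes the number of clusters grows with p, and uses made-up toy data. Only `preferenceRange`, `apcluster`, and `negDistMat` are package functions; all variable names are illustrative.

```r
library(apcluster)

set.seed(42)
## toy data: two well-separated Gaussian clouds (illustrative only)
x <- rbind(cbind(rnorm(30, 0.2, 0.05), rnorm(30, 0.8, 0.05)),
           cbind(rnorm(30, 0.8, 0.05), rnorm(30, 0.2, 0.05)))
s <- negDistMat(x, r=2)

K <- 2                      # desired number of clusters

## bounds of the meaningful preference range: c(pmin, pmax)
pr <- preferenceRange(s)
lo <- pr[1]                 # low preference, few clusters
hi <- pr[2]                 # high preference, many clusters

## plain bisection on the input preference p (at most 20 steps)
for (i in 1:20) {
    p   <- (lo + hi) / 2
    res <- apcluster(s, p=p, seed=1)
    k   <- length(res@exemplars)   # number of clusters found for this p
    if (k == K) break
    if (k > K) hi <- p else lo <- p
}

cat("preference:", p, " clusters:", k, "\n")
```

As with `apclusterK` itself, the loop may end without reaching exactly `K` clusters, so the final `k` should be checked before the result is used.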

When called with a similarity matrix as input, the function performs the procedure described above. When called with the name of a package-provided similarity function or a user-provided similarity function object and input data, the function first computes the similarity matrix before running apclusterK on this similarity matrix. The similarity matrix is returned for later use as part of the APResult object depending on whether includeSim was set to TRUE (see argument description above).
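The two calling conventions can be compared side by side. The following is a small sketch on made-up data; the variable names `resMat` and `resFun` are illustrative only.

```r
library(apcluster)

set.seed(1)
## toy data: 50 samples in two Gaussian clouds (illustrative only)
x <- rbind(cbind(rnorm(25, 0.2, 0.05), rnorm(25, 0.8, 0.05)),
           cbind(rnorm(25, 0.8, 0.05), rnorm(25, 0.2, 0.05)))

## (a) precomputed similarity matrix; includeSim defaults to FALSE here
resMat <- apclusterK(negDistMat(x, r=2), K=2, verbose=FALSE)

## (b) similarity function plus input data; includeSim defaults to TRUE,
##     so the internally computed matrix is kept in the sim slot
resFun <- apclusterK(negDistMat(r=2), x, K=2, verbose=FALSE)

## the stored similarity matrix has one row and one column per sample
dim(resFun@sim)
```

Variant (b) is convenient when the similarity matrix is needed afterwards, for example for plotting a heatmap, whereas variant (a) avoids storing a second copy of a matrix the caller already holds.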

Apart from minor adaptations and optimizations, the implementation is largely analogous to Frey and Dueck's Matlab code (see https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/).

Value

Upon successful completion, the function returns an APResult object.
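A short sketch of how the returned object might be inspected, using made-up data. The slots `exemplars`, `clusters`, and `p` are taken from the `APResult` class; since `apclusterK` issues no warning when the target is missed, the sketch counts the clusters explicitly.

```r
library(apcluster)

set.seed(7)
## toy data: two Gaussian clouds (illustrative only)
x <- rbind(cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.05)),
           cbind(rnorm(20, 0.8, 0.05), rnorm(20, 0.2, 0.05)))

## prc=0 requests exactly K clusters
apres <- apclusterK(negDistMat(r=2), x, K=2, prc=0, verbose=FALSE)

## number of clusters actually obtained
nClusters <- length(apres@exemplars)
nClusters

apres@exemplars   # indices of the exemplar samples
apres@clusters    # list with the sample indices of each cluster
apres@p           # input preference that produced this result
```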

Author(s)

Ulrich Bodenhofer and Andreas Kothmeier

References

https://github.com/UBod/apcluster

Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: 10.1126/science.1136800.

Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: 10.1093/bioinformatics/btr406.

Examples

## load the package providing apclusterK() and negDistMat()
library(apcluster)

## create three Gaussian clouds
cl1 <- cbind(rnorm(70, 0.2, 0.05), rnorm(70, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
cl3 <- cbind(rnorm(60, 0.8, 0.04), rnorm(60, 0.8, 0.05))
x <- rbind(cl1, cl2, cl3)

## run affinity propagation such that 3 clusters are obtained
apres <- apclusterK(negDistMat(r=2), x, K=3)

## show details of clustering results
show(apres)

## plot clustering result
plot(apres, x)

## create sparse similarity matrix
cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06))
cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05))
x <- rbind(cl1, cl2)

sim <- negDistMat(x, r=2)
ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2)

## run apclusterK() on the sparse similarity matrix
apres <- apclusterK(ssim, K=2)
apres

[Package apcluster version 1.4.13 Index]