similarities {apcluster}  R Documentation 
Methods for Computing Similarity Matrices
Description
Compute similarity matrices from a data set
Usage
negDistMat(x, sel=NA, r=1, method="euclidean", p=2)
expSimMat(x, sel=NA, r=2, w=1, method="euclidean", p=2)
linSimMat(x, sel=NA, w=1, method="euclidean", p=2)
corSimMat(x, sel=NA, r=1, signed=TRUE, method="pearson")
linKernel(x, sel=NA, normalize=FALSE)
Arguments
x 
input data to be clustered; if x is a matrix or data frame, rows are interpreted as samples and columns as features 
sel 
selected samples subset; vector of row indices for x in increasing order (see details below) 
r 
exponent (see details below) 
w 
radius (see details below) 
signed 
take sign of correlation into account (see details below) 
normalize 
see details below 
method 
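type of distance measure to be used; for negDistMat, expSimMat, and linSimMat, this argument selects the distance measure as in dist; for corSimMat, it is passed on to cor 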
p 
exponent for Minkowski distance; only used for method="minkowski"

Details
negDistMat creates a square matrix of mutual pairwise similarities of
data vectors as negative distances. The argument r (default is 1) is
used to transform the resulting distances by computing the r-th power
(use r=2 to obtain negative squared distances as in Frey and Dueck's
demos), i.e., given a distance d, the resulting similarity is computed
as s=-d^r. With the parameter sel, a subset of samples can be specified
for distance calculation. In this case, the full distance matrix is not
computed; instead, the result is a rectangular similarity matrix of all
samples (rows) against the subset (columns), as needed for leveraged
clustering. Internally, distances are computed by a method derived from
dist. All options of this function except diag and upper can be used,
especially method, which allows for selecting different distance
measures. Note that, since version 1.4.4 of the package, there is an
additional method "discrepancy" that implements Weyl's discrepancy
measure.
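The transformation described above can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code; the data and the subset sel are made up for illustration, and the rectangular matrix is obtained here simply by selecting columns of the full matrix.

```r
## Base-R sketch of s = -d^r (illustration only, not apcluster internals)
set.seed(1)
x <- rbind(matrix(rnorm(20, 0.2, 0.05), ncol=2),
           matrix(rnorm(20, 0.8, 0.05), ncol=2))   # 20 samples, 2 features

d <- as.matrix(dist(x, method="euclidean"))  # pairwise distances, 20 x 20
r <- 2
simFull <- -d^r                              # negative squared distances

## rectangular variant: all samples (rows) against a subset (columns)
sel <- c(2, 5, 11, 17)                       # hypothetical subset, increasing order
simRect <- simFull[, sel, drop=FALSE]        # 20 x 4
```

All similarities are non-positive, and each sample has similarity 0 to itself.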
expSimMat computes similarities in a way similar to negDistMat, but the
transformation of distances to similarities is done in the following way:
s=\exp\left(-\left(\frac{d}{w}\right)^r\right)
The parameter sel allows the creation of a rectangular similarity
matrix. As above, r is an exponent. The parameter w controls how quickly
the similarity decays with increasing distance. r=2 in conjunction with
Euclidean distances corresponds to the well-known Gaussian/RBF kernel,
whereas r=1 corresponds to the Laplace kernel. Note that these
similarity measures can also be understood as fuzzy equality relations.
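As a rough illustration of this transformation, the sketch below applies it to Euclidean distances in base R. It is not the package's implementation; the data are arbitrary and w is set to 1.

```r
## Base-R sketch of s = exp(-(d/w)^r) (illustration only)
set.seed(2)
x <- matrix(runif(12), ncol=3)     # 4 samples, 3 features
d <- as.matrix(dist(x))            # Euclidean distances
w <- 1

simGauss   <- exp(-(d / w)^2)      # r=2: Gaussian/RBF kernel
simLaplace <- exp(-(d / w)^1)      # r=1: Laplace kernel
```

Both matrices have values in (0, 1] with ones on the diagonal.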
linSimMat provides another way of transforming distances into
similarities by applying the following transformation to a distance d:
s=\max\left(0,1-\frac{d}{w}\right)
The parameter sel is used again for the creation of a rectangular
similarity matrix. Here w corresponds to a maximal radius of interest:
samples at distance w or more have similarity 0. Note that this is a
fuzzy equality relation with respect to the Lukasiewicz t-norm.
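A small base-R sketch of this truncated linear transformation follows. It is an illustration under the assumption of Euclidean distances and w=1, not the package's code.

```r
## Base-R sketch of s = max(0, 1 - d/w) (illustration only)
x <- rbind(c(0, 0), c(0.3, 0.4), c(3, 4))
d <- as.matrix(dist(x))            # d[1, 2] is 0.5, d[1, 3] is 5
w <- 1

## the matrix goes first so that pmax keeps its dimensions
simLin <- pmax(1 - d / w, 0)
```

Here the first two samples lie within radius w of each other and get similarity 0.5, whereas the third sample is farther than w from both and gets similarity 0.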
Unlike the above three functions, linKernel computes pairwise
similarities as scalar products of data vectors, i.e. it corresponds,
as the name suggests, to the “linear kernel”. Use the parameter sel to
compute only a submatrix of the full kernel matrix as described above.
If normalize=TRUE, the values are scaled to the unit sphere in the
following way (for two samples x and y):
s=\frac{\vec{x}^T\vec{y}}{\|\vec{x}\| \|\vec{y}\|}
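The linear kernel and its normalized variant can be sketched in base R as shown below. This is only an illustration of the formulas, with rows of x taken as samples; it is not the package's implementation.

```r
## Base-R sketch of the linear kernel (illustration only)
x <- rbind(c(1, 0), c(1, 1), c(0, 2))   # 3 samples, 2 features

K <- tcrossprod(x)                 # K[i, j] = sum(x[i, ] * x[j, ])

## normalized variant: divide by the product of the vector norms
norms <- sqrt(diag(K))
Knorm <- K / outer(norms, norms)
```

After normalization, every sample has similarity 1 to itself, and orthogonal samples (here the first and third) have similarity 0.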
The function corSimMat computes pairwise similarities as correlations.
It uses cor internally. The method argument is passed on to cor. The
argument r serves as an exponent with which the correlations can be
transformed. If signed=TRUE (default), negative correlations are taken
into account, i.e. two samples are maximally dissimilar if they are
negatively correlated. If signed=FALSE, similarities are computed as
absolute values of correlations, i.e. two samples are maximally similar
if they are positively or negatively correlated and the two samples are
maximally dissimilar if they are uncorrelated.
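The difference between signed and unsigned correlation similarities can be sketched with cor from base R. The sketch assumes that rows of x are samples and leaves the exponent at r=1; how the package applies r is not shown here.

```r
## Base-R sketch of signed vs. unsigned correlation similarity
## (illustration only; exponent r fixed at 1)
set.seed(3)
x <- matrix(rnorm(40), nrow=4)     # 4 samples, 10 features
x[2, ] <- -x[1, ]                  # sample 2 mirrors sample 1

cmat <- cor(t(x), method="pearson")   # correlations between samples (rows)

simSigned   <- cmat                # anti-correlated samples are dissimilar
simUnsigned <- abs(cmat)           # only the strength of correlation counts
```

In this sketch, samples 1 and 2 have similarity -1 in the signed case and similarity 1 in the unsigned case.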
Note that the naming of the argument p has been chosen for consistency
with dist and previous versions of the package. When using leveraged AP
in conjunction with the Minkowski distance, this leads to a conflict
with the input preference parameter p of apclusterL. To avoid that,
call the above functions without the x argument to create a custom
similarity measure with a fixed parameter p (see example below).
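The idea of fixing p in advance can be mimicked in base R with a function that returns another function. The sketch below is only an analogy: makeNegMinkowski is a hypothetical helper and not part of the package, which provides this behavior itself when x is omitted.

```r
## Hypothetical factory that fixes the Minkowski exponent p in advance
## (an analogy for calling negDistMat without x; not package code)
makeNegMinkowski <- function(p, r=1) {
    function(x, sel=NA) {
        d <- as.matrix(dist(x, method="minkowski", p=p))
        s <- -d^r
        ## no subset given: return the full square matrix
        if (length(sel) == 1 && is.na(sel)) s else s[, sel, drop=FALSE]
    }
}

simFun <- makeNegMinkowski(p=2.5, r=2)   # p is now fixed inside simFun

x <- rbind(c(0, 0), c(1, 1), c(2, 0))
sFull <- simFun(x)                       # 3 x 3
sRect <- simFun(x, sel=c(1, 3))          # 3 x 2
```

Because p is captured inside simFun, a caller that has its own argument named p can pass simFun along without any clash.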
Value
All functions listed above return square or rectangular matrices of similarities.
Author(s)
Ulrich Bodenhofer, Andreas Kothmeier & Johannes Palme apcluster@bioinf.jku.at
References
http://www.bioinf.jku.at/software/apcluster/
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: 10.1093/bioinformatics/btr406.
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: 10.1126/science.1136800.
Micchelli, C. A. (1986) Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constr. Approx. 2, 11-22.
De Baets, B. and Mesiar, R. (1997) Pseudo-metrics and T-equivalences. J. Fuzzy Math. 5, 471-481.
Bauer, P., Bodenhofer, U., and Klement, E. P. (1996) A fuzzy algorithm for pixel classification based on the discrepancy norm. In Proc. 5th IEEE Int. Conf. on Fuzzy Systems, volume III, pages 2007-2012, New Orleans, LA. DOI: 10.1109/FUZZY.1996.552744.
See Also
Examples
## create two Gaussian clouds
cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05))
x <- rbind(cl1, cl2)
## create negative distance matrix (default Euclidean)
sim1 <- negDistMat(x)
## compute similarities as negative squared distances
## (in accordance with Frey and Dueck's demos)
sim2 <- negDistMat(x, r=2)
## compute RBF kernel
sim3 <- expSimMat(x, r=2)
## compute similarities as negative squared distances
## all samples versus a randomly chosen subset
## of 50 samples (for leveraged AP clustering)
sel <- sort(sample(1:nrow(x), nrow(x)*0.25))
sim4 <- negDistMat(x, sel, r=2)
## example of leveraged AP using Minkowski distance with non-default
## parameter p
cl1 <- cbind(rnorm(150, 0.2, 0.05), rnorm(150, 0.8, 0.06))
cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05))
x <- rbind(cl1, cl2)
apres <- apclusterL(s=negDistMat(method="minkowski", p=2.5, r=2),
                    x, frac=0.2, sweeps=3, p=-0.2)
show(apres)