similarity {sets}    R Documentation
Similarity and Dissimilarity Functions
Description
Similarities and dissimilarities for (generalized) sets.
Usage
set_similarity(x, y, method = "Jaccard")
gset_similarity(x, y, method = "Jaccard")
cset_similarity(x, y, method = "Jaccard")
set_dissimilarity(x, y,
                  method = c("Jaccard", "Manhattan", "Euclidean",
                             "L1", "L2"))
gset_dissimilarity(x, y,
                   method = c("Jaccard", "Manhattan", "Euclidean",
                              "L1", "L2"))
cset_dissimilarity(x, y,
                   method = c("Jaccard", "Manhattan", "Euclidean",
                              "L1", "L2"))
Arguments
x, y
    Two (generalized/customizable) sets.
method
    Character string specifying the proximity method (see Details).
Details
For two generalized sets $X$ and $Y$, the "Jaccard" similarity is $|X \cap Y| / |X \cup Y|$, where $|\cdot|$ denotes the cardinality for generalized sets (sum of memberships). The "Jaccard" dissimilarity is 1 minus the similarity.
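As a rough illustration of the Jaccard definition, the following base R sketch computes the similarity by hand for two fuzzy sets stored as named numeric vectors. The helpers `pad_memberships` and `jaccard_sketch` are hypothetical names, not part of the sets package, and the sketch assumes that intersection and union take the elementwise minimum and maximum of the memberships.

```r
# Fuzzy sets as named numeric vectors (element name -> membership).
A <- c(a = 0.3, b = 0.7, c = 0.9)
B <- c(c = 0.2, d = 0.4, e = 0.5)

# Hypothetical helper: memberships over a common universe, 0 where absent.
pad_memberships <- function(m, universe) {
  out <- setNames(numeric(length(universe)), universe)
  out[names(m)] <- m
  out
}

# Hypothetical helper: Jaccard similarity as
# cardinality(intersection) / cardinality(union), with cardinality taken
# as the sum of memberships. Using min for intersection and max for union
# is an assumption of this sketch.
jaccard_sketch <- function(x, y) {
  universe <- union(names(x), names(y))
  mx <- pad_memberships(x, universe)
  my <- pad_memberships(y, universe)
  sum(pmin(mx, my)) / sum(pmax(mx, my))
}

sim <- jaccard_sketch(A, B)  # 0.2 / 2.8
dis <- 1 - sim               # Jaccard dissimilarity
print(c(similarity = sim, dissimilarity = dis))

# Crisp sets are the special case where every membership is 1.
crisp <- jaccard_sketch(c(a = 1, b = 1, c = 1), c(c = 1, d = 1, e = 1))
print(crisp)                 # one shared element out of five: 1 / 5
```

Storing memberships as named vectors keeps the sketch free of package dependencies. It is meant only to make the arithmetic visible and does not stand in for `gset_similarity()`.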
The "L1" (or "Manhattan") and "L2" (or "Euclidean") dissimilarities are defined as follows. For two fuzzy multisets $A$ and $B$ on a given universe $U$ with elements $x$, let $M_A(x)$ and $M_B(x)$ be functions returning the memberships of an element $x$ in sets $A$ and $B$, respectively. The memberships are returned in standard form, i.e. as an infinite vector of decreasing membership values, e.g. $(1, 0.3, 0, 0, \dots)$. Let $M_A(x)_i$ and $M_B(x)_i$ denote the $i$th components of these membership vectors. Then the L1 distance is defined as:

$$d_1(A, B) = \sum_{x \in U} \sum_{i} |M_A(x)_i - M_B(x)_i|$$

and the L2 distance as:

$$d_2(A, B) = \sqrt{\sum_{x \in U} \sum_{i} |M_A(x)_i - M_B(x)_i|^2}$$
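The L1 and L2 definitions can be traced by hand with a small base R sketch. It stores each fuzzy multiset as a named list of membership vectors and truncates the infinite standard form to the longest vector present, which loses nothing because all further components are zero. The helpers `standard_form` and `lp_sketch` are hypothetical names introduced for illustration and are not part of the sets package. The results follow from the formulas as stated here and are not claimed to reproduce the package output.

```r
# Fuzzy multisets as named lists: element name -> vector of memberships.
A <- list(a = c(0.3, 0.7), b = 0.1, c = 0.9)
B <- list(c = 0.2, d = c(0.4, 0.5), e = 0.8)

# Hypothetical helper: memberships of element x in s in standard form,
# i.e. sorted decreasingly and padded with zeros to n components.
standard_form <- function(s, x, n) {
  m <- if (x %in% names(s)) sort(s[[x]], decreasing = TRUE) else numeric(0)
  c(m, numeric(n - length(m)))
}

# Hypothetical helper: sum |M_A(x)_i - M_B(x)_i|^p over all elements x
# and components i, then take the p-th root (p = 1 gives L1, p = 2 gives L2).
lp_sketch <- function(a, b, p) {
  universe <- union(names(a), names(b))
  n <- max(lengths(a), lengths(b))
  diffs <- unlist(lapply(universe, function(x) {
    abs(standard_form(a, x, n) - standard_form(b, x, n))
  }))
  sum(diffs^p)^(1 / p)
}

d1 <- lp_sketch(A, B, 1)  # 0.7 + 0.3 + 0.1 + 0.7 + 0.5 + 0.4 + 0.8 = 3.5
d2 <- lp_sketch(A, B, 2)  # sqrt(2.13)
print(c(L1 = d1, L2 = d2))
```

Sorting before padding matters: element "a" has memberships (0.3, 0.7) in `A`, and its standard form is (0.7, 0.3, 0, ...), so components are compared position by position only after both vectors are in decreasing order.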
Value
A numeric value (similarity or dissimilarity, as specified).
Source
T. Matthe, R. De Caluwe, G. de Tre, A. Hallez, J. Verstraete, M. Leman, O. Cornelis, D. Moelants, and J. Gansemans (2006), Similarity Between Multi-valued Thesaurus Attributes: Theory and Application in Multimedia Systems, Flexible Query Answering Systems, Lecture Notes in Computer Science, Springer, 331–342.
K. Mizutani, R. Inokuchi, and S. Miyamoto (2008), Algorithms of Nonlinear Document Clustering Based on Fuzzy Multiset Model, International Journal of Intelligent Systems, 23, 176–198.
See Also
set.
Examples
library(sets)

## Ordinary (crisp) sets
A <- set("a", "b", "c")
B <- set("c", "d", "e")
set_similarity(A, B)
set_dissimilarity(A, B)

## Fuzzy sets: one membership value per element
A <- gset(c("a", "b", "c"), c(0.3, 0.7, 0.9))
B <- gset(c("c", "d", "e"), c(0.2, 0.4, 0.5))
gset_similarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "L1")
gset_dissimilarity(A, B, "L2")

## Fuzzy multisets: a list with one or more membership values per element
A <- gset(c("a", "b", "c"), list(c(0.3, 0.7), 0.1, 0.9))
B <- gset(c("c", "d", "e"), list(0.2, c(0.4, 0.5), 0.8))
gset_similarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "Jaccard")
gset_dissimilarity(A, B, "L1")
gset_dissimilarity(A, B, "L2")