fuzzydocs {sets} | R Documentation |
Documents on Fuzzy Theory
Description
Occurence of three terms (neural networks, fuzzy, and image) in 30 documents retrieved from a Japanese article data base on fuzzy theory and systems.
Usage
data("fuzzy_docs")
Format
fuzzy_docs
is a list of 30 fuzzy multisets, representing the
occurrence of the terms “neural networks”, “fuzzy”, and
“image” in each document. Each term appears with up to
three membership values representing weights,
depending on whether the term occurred
in the abstract (0.2), the keywords section (0.6), and/or the title
(1). The first 12 documents concern neural networks, the remaining 18
image processing. In the reference, various clustering methods have
been employed to recover the two groups in the data set.
Source
K. Mizutani, R. Inokuchi, and S. Miyamoto (2008), Algorithms of Nonlinear Document Clustering Based on Fuzzy Multiset Model, International Journal of Intelligent Systems, 23, 176–198.
Examples
data(fuzzy_docs)
## compute distance matrix using Jaccard dissimilarity
d <- as.dist(set_outer(fuzzy_docs, gset_dissimilarity))
## apply hierarchical clustering (Ward method)
cl <- hclust(d, "ward")
## retrieve two clusters
cutree(cl, 2)
## -> clearly, the clusters are formed by docs 1--12 and 13--30,
## respectively.