| WordSim353 {wordspace} | R Documentation | 
Similarity Ratings for 351 Noun Pairs (wordspace)
Description
A database of human similarity ratings for 351 English noun pairs, collected by Finkelstein et al. (2002) and annotated with semantic relations (similarity vs. relatedness) by Agirre et al. (2009).
Usage
WordSim353
Format
A data frame with 351 rows and the following 6 columns:
- word1: first noun (character)
- word2: second noun (character)
- score: average similarity rating by human judges on a scale from 0 to 10 (numeric)
- relation: semantic relation between first and second word (factor, see Details below)
- similarity: whether the word pair belongs to the similarity subset (logical)
- relatedness: whether the word pair belongs to the relatedness subset (logical)
The nouns are given as disambiguated lemmas in the form <headword>_N.
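If plain headwords are needed, for instance to match against a vocabulary without part-of-speech tags, the suffix can be removed with a regular expression. The following is a minimal sketch; strip_pos is a hypothetical helper, not a function provided by wordspace, and the input strings are illustrative only.

```r
# Hypothetical helper (not part of wordspace): drop the "_N" part-of-speech
# suffix from lemmas of the form <headword>_N.
strip_pos <- function(lemma) {
  sub("_N$", "", lemma)
}

strip_pos(c("tiger_N", "cat_N"))  # illustrative input strings

# With the package installed, the same helper applies to the real columns.
if (requireNamespace("wordspace", quietly = TRUE)) {
  library(wordspace)
  print(head(strip_pos(WordSim353$word1)))
}
```

Anchoring the pattern with $ ensures that only a trailing _N is removed.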
Details
The data set is known as WordSim353 because it originally consisted of 353 noun pairs.
The present version omits one duplicate entry (money–cash) and the trivial combination
tiger–tiger, which may have been included as a control item.
The following semantic relations are distinguished in the relation variable:
synonym, antonym, hypernym, hyponym, co-hyponym,
holonym, meronym and other (topically related or completely unrelated).
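One way to see how the ratings vary across these relations is to average the score column within each level of relation. The sketch below assumes only the column names listed under Format; mean_score_by_relation is a hypothetical helper, not part of wordspace.

```r
# Hypothetical helper: mean human rating for each semantic relation.
# Expects a data frame with a numeric 'score' and a factor 'relation' column.
mean_score_by_relation <- function(ws) {
  sapply(split(ws$score, ws$relation), mean)
}

# Applied to the real data when the package is available.
if (requireNamespace("wordspace", quietly = TRUE)) {
  library(wordspace)
  print(round(mean_score_by_relation(WordSim353), 2))
}
```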
Note that the similarity and relatedness subsets are not disjoint, because they
share 103 unrelated noun pairs (semantic relation other and score below 5.0).
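The overlap between the two subsets can be checked directly from the logical columns. The following is a minimal sketch under the column layout described under Format; subset_overlap is a hypothetical helper, not part of wordspace.

```r
# Hypothetical helper: sizes of the two subsets and of their overlap, plus a
# check that every shared pair has relation "other" and a score below 5.
subset_overlap <- function(ws) {
  shared <- ws[ws$similarity & ws$relatedness, , drop = FALSE]
  list(
    n_similarity         = sum(ws$similarity),
    n_relatedness        = sum(ws$relatedness),
    n_shared             = nrow(shared),
    shared_all_unrelated = all(shared$relation == "other" & shared$score < 5)
  )
}

# Applied to the real data when the package is available.
if (requireNamespace("wordspace", quietly = TRUE)) {
  library(wordspace)
  str(subset_overlap(WordSim353))
}
```

If the description above holds for the installed version, n_shared should equal 103 and shared_all_unrelated should be TRUE.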
Source
Similarity ratings (Finkelstein et al. 2002): https://gabrilovich.com/resources/data/wordsim353/wordsim353.html
Semantic relations (Agirre et al. 2009): http://alfonseca.org/eng/research/wordsim353.html
References
Agirre, Eneko, Alfonseca, Enrique, Hall, Keith, Kravalova, Jana, Pasca, Marius, and Soroa, Aitor (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2009), pages 19–27, Boulder, Colorado.
Finkelstein, Lev, Gabrilovich, Evgeniy, Matias, Yossi, Rivlin, Ehud, Solan, Zach, Wolfman, Gadi, and Ruppin, Eytan (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1), 116–131.
Examples
head(WordSim353, 20)
table(WordSim353$relation) # semantic relations
# split into "similarity" and "relatedness" subsets
xtabs(~ similarity + relatedness, data=WordSim353)