SemCorWSD {wordspace} | R Documentation
SemCor Word Sense Disambiguation Task (wordspace)
Description
A collection of sentences containing ambiguous words manually labelled with WordNet senses. The data were obtained from the SemCor corpus version 3.0.
Usage
SemCorWSD
Format
A data frame with 647 rows and the following 8 columns (all of type character):
id
Unique item ID
target
The target word (lemmatized)
pos
Word class of the target word (n, v or a)
sense
Sense of the target word in this sentence (given as a WordNet lemma)
gloss
WordNet definition of this sense
sentence
The sentence containing the ambiguous word
hw
Lemmatized form of the sentence (“headwords”); punctuation marks are excluded and all remaining words are case-folded
lemma
Lemmatized and POS-disambiguated form in CWB/Penn format, e.g.
move_N
for the headword move used as a noun
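The lemma format described above can be taken apart with base R string functions. The following is a minimal sketch, not part of the package: the helper split_lemma is hypothetical, the token "run_V" is invented for illustration (only "move_N" comes from the text), and it assumes that the POS code is whatever follows the last underscore of a token.

```r
# Hypothetical helper: split tokens such as "move_N" into headword and POS code.
# Assumption: the POS code is the part after the LAST underscore of each token.
split_lemma <- function(tokens) {
  data.frame(
    headword = sub("_[^_]*$", "", tokens),  # drop the final "_<code>" suffix
    pos      = sub("^.*_", "", tokens),     # keep only what follows the last "_"
    stringsAsFactors = FALSE
  )
}

# "move_N" is the example from the text; "run_V" is an invented token
parts <- split_lemma(c("move_N", "run_V"))
print(parts)
```

If the lemma column holds one whitespace-separated string per sentence (an assumption worth checking against the data), the tokens can first be obtained with strsplit(x, " ", fixed = TRUE).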
Details
Target words and senses had to meet the following criteria in order to be included in the data set:
sense occurs f ≥ 5 times in SemCor 3.0
sense accounts for at least 10% of all occurrences of the target
at least two senses of target remain after previous two filters
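The three criteria above can be sketched as a filter over a table of sense frequencies. This is only an illustration under stated assumptions, not the code used to build the data set: the function filter_senses and the toy counts are hypothetical, the sense labels are placeholders, and the 10% share is assumed to be computed over all occurrences of the target before any filtering.

```r
# Hypothetical sense frequencies (invented numbers, not real SemCor counts)
counts <- data.frame(
  target = c("bank", "bank", "bank", "vessel", "vessel"),
  sense  = c("s1", "s2", "s3", "s1", "s2"),
  f      = c(20, 8, 2, 30, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper applying the three inclusion criteria in order
filter_senses <- function(counts, min_f = 5, min_share = 0.10, min_senses = 2) {
  # share of each sense among all occurrences of its target (assumption:
  # the denominator includes senses that are later filtered out)
  counts$share <- counts$f / ave(counts$f, counts$target, FUN = sum)
  # criteria 1 and 2: minimum frequency and minimum share
  keep <- counts[counts$f >= min_f & counts$share >= min_share, ]
  # criterion 3: the target must retain at least min_senses senses
  n_senses <- ave(seq_along(keep$target), keep$target, FUN = length)
  keep[n_senses >= min_senses, ]
}

kept <- filter_senses(counts)
print(kept)
```

In this toy table the target "vessel" is dropped entirely, because only one of its senses passes the first two filters.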
SemCorWSD contains sentence contexts for the following target words:
ambiguous nouns from Schütze (1998): interest, plant, space, vessel
misc. ambiguous nouns: bank
misc. ambiguous verbs: find, grasp, open, run
Source
TODO (SemCor reference, NLTK extraction)
References
Schütze, Hinrich (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97–123.
See Also
Examples
library(wordspace)

# number of sentences per sense (rows) and target word (columns)
with(SemCorWSD, table(sense, target))
# all word senses with brief definitions ("glosses")
with(SemCorWSD, sort(unique(paste0(target, " ", sense, ": ", gloss))))