lemmalex {LexFindR}    R Documentation
Lemmalex dictionary
Description
Lemmalex is primarily based on the SUBTLEXus subtitle corpus (American English subtitles, 51 million words in total), reduced to lemmas using a copyrighted database (Francis and Kučera, 1982). Pronunciations are taken from the CMU Pronouncing Dictionary.
Usage
lemmalex
Format
An object of class tbl_df (inherits from tbl, data.frame) with 17750 rows and 3 columns.
Details
References:
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977-990.
Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Brown University Press.
CMU Pronouncing Dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
A table with 3 variables:
- Item: SUBTLEXus dictionary reduced to lemmas
- Frequency: Number of times the item appeared in the SUBTLEXus corpus
- Pronunciation: ARPAbet transcription according to CMU
...
Source
https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus
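Examples
The following is a minimal sketch of how the three documented columns might be used, not an example shipped with the package. To stay runnable without LexFindR installed, it builds a small stand-in data frame whose rows and frequency values are made up; only the column names (Item, Frequency, Pronunciation) come from this page. It also assumes that each Pronunciation is a string of ARPAbet symbols separated by single spaces, and the "first two phonemes" rule is an illustrative choice.

```r
# Hypothetical stand-in rows using the documented column names.
# With the package installed, one would instead use: lex <- LexFindR::lemmalex
lex <- data.frame(
  Item = c("cat", "cap", "dog"),
  Frequency = c(120, 30, 250),            # made-up counts
  Pronunciation = c("K AE T", "K AE P", "D AO G"),
  stringsAsFactors = FALSE
)

# Split each transcription into phoneme symbols
# (assumption: symbols are separated by single spaces).
phonemes <- strsplit(lex$Pronunciation, " ", fixed = TRUE)
lex$n_phonemes <- lengths(phonemes)

# Log-transform the frequency; adding 1 guards against zero counts.
lex$log_freq <- log10(lex$Frequency + 1)

# Find items that share their first two phonemes with "cat".
onset <- function(p) paste(head(p, 2), collapse = " ")
onsets <- vapply(phonemes, onset, character(1))
cohort <- lex$Item[onsets == onsets[lex$Item == "cat"]]

print(lex)
print(cohort)
```

Base R is used throughout so the sketch does not depend on the tibble tooling; the same steps should work on the tbl_df itself, since it inherits from data.frame.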