R: Lemmalex dictionary

lemmalex {LexFindR}

R Documentation

Lemmalex dictionary

Description

Lemmalex is based primarily on the SUBTLEXus subtitle corpus (American English subtitles, 51 million items in total), reduced to lemmas using a copyrighted database (Francis and Kučera, 1982). Pronunciations are taken from the CMU Pronouncing Dictionary.

Usage

lemmalex

Format

An object of class tbl_df (inherits from tbl, data.frame) with 17750 rows and 3 columns.

Details

References:

Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977-990.

Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Brown University Press.

CMU Pronouncing Dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict

A table with 3 variables:

Item: SUBTLEXus dictionary reduced to lemmas
Frequency: Number of times the item appeared in the SUBTLEXus corpus
Pronunciation: ARPAbet transcription according to CMU

...
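
As a rough illustration of how the three variables listed above might be used, the sketch below loads the dataset, ranks lemmas by frequency, and splits one transcription into phonemes. It assumes the LexFindR package is installed, that Frequency is stored as a numeric column, and that phonemes in Pronunciation are separated by single spaces; none of these details are confirmed by this page. The names top5 and phonemes are introduced here for illustration only.

```r
library(LexFindR)  # assumption: the LexFindR package is installed

# List the columns; this page documents Item, Frequency, and Pronunciation
print(names(lemmalex))

# Five most frequent lemmas (assumes Frequency is numeric)
top5 <- head(lemmalex[order(lemmalex$Frequency, decreasing = TRUE), ], 5)
print(top5)

# Split the first ARPAbet transcription into phonemes
# (assumes phonemes are separated by single spaces)
phonemes <- strsplit(as.character(lemmalex$Pronunciation[1]), " ", fixed = TRUE)[[1]]
print(phonemes)
```

If the space-separated assumption does not hold for this dataset, the strsplit() call would return the whole transcription as a single element, so it is worth checking the printed result before building on it.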

Source

https://www.ugent.be/pp/experimentele-psychologie/en/research/documents/subtlexus

[Package LexFindR version 1.1.0 Index]