TWP {WordPools} | R Documentation |
The Toronto Word Pool
Description
The Toronto Word Pool consists of 1080 words in various grammatical classes together with a variety of normative variables.
The TWP
contains high frequency nouns, adjectives, and verbs taken
originally from the Thorndike-Lorge (1944) norms.
This word pool has been used in hundreds of studies at Toronto and elsewhere.
Usage
data(TWP)
Format
A data frame with 1093 observations on the following 12 variables.
itmno
item number
word
the word
imagery
imagery rating
concreteness
concreteness rating
letters
number of letters
frequency
word frequency, from the Kucera-Francis norms
foa
a measure of first order approximation to English. In a first-order approximation, the probability of generating any string of letters is based on the frequencies of occurrence of individual letters in the language.
soa
a measure of second order approximation to English, based on bigram frequencies.
onr
Orthographic neighbor ratio, taken from Landauer and Streeter (1973). It is the ratio of the frequency of the word in Kucera and Francis (1967) count divided by the sum of the frequencies of all its orthographic neighbors.
dictcode
dictionary codes, a factor indicating the collection of grammatical classes, 1-5, for a given word form
. In the code, "1" in any position means the item had a dictionary definition as a noun; similarly, a "2" means a verb, "3" means an adjective, "4" means an adverb, and "5" was used to cover all other grammatical categories (but in practice was chiefly a preposition). Thus an entry "2130" indicates an item defined as a verb, noun, and an adjective in that order of historical precedence.
noun
percent noun usage. Words considered unambiguous based on
dictcode
are listed as 0 or 100; other items were rated in a judgment task.canadian
a factor indicating an alternative Canadian spelling of a given word
Details
The last 13 words in the list are alternative Canadian spellings of words
listed earlier, and have duplicate itmno
values.
Source
Friendly, M., Franklin, P., Hoffman, D. & Rubin, D. The Toronto Word Pool, Behavior Research Methods and Instrumentation, 1982, 14(4), 375-399. http://datavis.ca/papers/twp.pdf.
References
Kucera and Francis, W.N. (1967). Computational Analysis of Present-Day American English. Providence: Brown University Press.
Landauer, T. K., & Streeter, L. A. Structural differences between common and rare words: Failure of equivalent assumptions for theories of word recognition. Journal of Verbal Learning and Verbal Behavior, 1973, 11, 119-131.
Examples
data(TWP)
str(TWP)
summary(TWP)
# quick view of distributions
boxplot(scale(TWP[, 3:9]))
plotDensity(TWP, "imagery")
plotDensity(TWP, "concreteness")
plotDensity(TWP, "frequency")
# select low imagery, concreteness and frequency words
R <- list(imagery=c(1,5), concreteness=c(1,4), frequency=c(0,30))
pickList(TWP, R)
# dplyr now makes this much more flexible
if (require(dplyr)) {
# select items within given ranges
selected <- TWP |>
filter( canadian == 0) |> # remove Canadian spellings
filter( imagery <= 5, concreteness <= 4, frequency <= 30) |>
select(word, imagery:frequency )
str(selected)
# get random samples of selected items
nitems <- 5
nlists <- 2
lists <- selected |>
sample_n( nitems*nlists, replace=FALSE) |>
mutate(list = rep(1:nlists, each=nitems))
str(lists)
lists
}