word_distrib {opitools} | R Documentation
Words Distribution
Description
This function examines whether the distribution of word frequencies in a text document follows Zipf's distribution (Zipf 1936). Zipf's distribution is considered the ideal distribution of a perfect natural language text.
Usage
word_distrib(textdoc)
Arguments
textdoc | An n x 1 text document, for example a one-column data frame of text records.
Details
Zipf's distribution is most easily observed by plotting the data on a log-log graph, with the axes being log(word rank order) and log(word frequency). For a perfect natural language text, the points fall on a straight line with a negative slope. Any deviation from the straight line can be considered an imperfection attributable to the texts within the document.
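The log-log relationship described above can be sketched in base R. This is only an illustration of the idea, not the implementation used by word_distrib: the toy vector docs, the simple tokenisation, and the names words, freq, zipf and fit are all assumptions made for this sketch.

```r
# Illustrative sketch only: base R, not the opitools implementation.
# `docs` is a made-up character vector standing in for a text document.
docs <- c("the cat sat on the mat",
          "the dog sat on the log",
          "the cat saw the dog")

# Split into lower-case words and count how often each word occurs
words <- unlist(strsplit(tolower(docs), "[^a-z]+"))
words <- words[nzchar(words)]
freq <- sort(table(words), decreasing = TRUE)

# Rank 1 is assigned to the most frequent word
zipf <- data.frame(word = names(freq),
                   rank = seq_along(freq),
                   freq = as.integer(freq))

# Fit a straight line to log(frequency) against log(rank)
fit <- lm(log(freq) ~ log(rank), data = zipf)

# Plot the points and the fitted line on the log-log scale
plot(log(zipf$rank), log(zipf$freq),
     xlab = "log(word rank order)", ylab = "log(word frequency)")
abline(fit)

# Slope of the fitted line; a Zipf-like text gives a negative value
coef(fit)[["log(rank)"]]
```

On a corpus this small the points will not lie close to a line; the sketch is meant only to show how ranks, frequencies and the fitted slope relate to one another.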
Value
A list of word ranks and their respective frequencies, and a plot showing the relationship between the two variables.
References
Zipf G (1936). The Psychobiology of Language. London: Routledge.
Examples
# Get an n x 1 text document
tweets_dat <- data.frame(text=tweets[,1])
plt <- word_distrib(textdoc = tweets_dat)
plt