growth.fnc {languageR} | R Documentation |
Calculate vocabulary growth curve and vocabulary richness measures
Description
This function calculates, for an increasing sequence of text sizes, the observed number of types, hapax legomena, dis legomena, tris legomena, and selected measures of lexical richness.
Usage
growth.fnc(text = languageR::alice, size = 646, nchunks = 40, chunks = 0)
Arguments
text |
A vector of strings representing a text. |
size |
An integer giving the size of a text chunk when the text is to be split into a series of equally-sized text chunks. |
nchunks |
An integer denoting the number of desired equally-sized text chunks. |
chunks |
An integer vector denoting the token sizes for which growth measures are required. When chunks is specified, |
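The way size and nchunks define the measurement points of a growth curve can be sketched in base R. This is an illustration only: the toy text and the names toy.text, chunk.ends, types.at and toy.growth are hypothetical, and it is an assumption that growth.fnc places its chunk boundaries at exact multiples of size.

```r
# Hypothetical toy text; growth.fnc itself takes a vector of strings such as alice.
toy.text <- c("a", "b", "a", "c", "b", "d", "a", "e", "f", "a", "b", "g")
size <- 3       # tokens per chunk
nchunks <- 4    # number of equally-sized chunks

# Cumulative token counts at the end of each chunk (assumed layout).
chunk.ends <- size * seq_len(nchunks)

# Number of distinct types observed up to and including each chunk.
types.at <- sapply(chunk.ends, function(n) length(unique(toy.text[seq_len(n)])))

toy.growth <- data.frame(Chunk = seq_len(nchunks),
                         Tokens = chunk.ends,
                         Types = types.at)
print(toy.growth)
```

With the defaults shown under Usage (size = 646, nchunks = 40), the same arithmetic gives a last measurement point at 646 * 40 = 25840 tokens.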
Value
A growth object with methods for plotting and printing. Because running this function on large texts may take some time, a period is printed on the output device for each completed chunk to indicate progress.
The data frame with the actual measures, which can be extracted with object.name@data$data, has the following columns.
Chunk |
a numeric vector with chunk numbers. |
Tokens |
a numeric vector with the number of tokens up to and including the current chunk. |
Types |
a numeric vector with the number of types up to and including the current chunk. |
HapaxLegomena |
a numeric vector with the corresponding count of hapax legomena. |
DisLegomena |
a numeric vector with the corresponding count of dis legomena. |
TrisLegomena |
a numeric vector with the corresponding count of tris legomena. |
Yule |
a numeric vector with Yule's K. |
Zipf |
a numeric vector with the slope of Zipf's rank-frequency curve in the double-logarithmic plane. |
TypeTokenRatio |
a numeric vector with the ratio of types to tokens. |
Herdan |
a numeric vector with Herdan's C. |
Guiraud |
a numeric vector with Guiraud's R. |
Sichel |
a numeric vector with Sichel's S. |
Lognormal |
a numeric vector with mean log frequency. |
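Several of the columns above can be computed by hand for a single text size. The sketch below uses the textbook definitions of these measures as surveyed in Tweedie & Baayen (1998); it is not taken from the source of growth.fnc, and it is an assumption that the function uses the same scaling (for instance the factor 10^4 in Yule's measure). The toy token vector and all variable names are hypothetical. The Zipf and Lognormal columns are left out.

```r
# Hypothetical toy token vector (not from the alice text).
tokens <- c("the", "cat", "sat", "on", "the", "mat", "the", "cat", "ran")

N    <- length(tokens)     # number of tokens
freq <- table(tokens)      # frequency of each type
V    <- length(freq)       # number of types

# Frequency spectrum: V(m) is the number of types that occur exactly m times.
spectrum <- table(freq)
m  <- as.numeric(names(spectrum))
Vm <- as.numeric(spectrum)

hapax <- sum(freq == 1)    # types occurring once
dis   <- sum(freq == 2)    # types occurring twice
tris  <- sum(freq == 3)    # types occurring three times

ttr     <- V / N                                 # type-token ratio
herdan  <- log(V) / log(N)                       # Herdan's C
guiraud <- V / sqrt(N)                           # Guiraud's R
sichel  <- dis / V                               # Sichel's S
yule    <- 1e4 * (sum(m^2 * Vm) - N) / N^2       # Yule's K

print(c(Tokens = N, Types = V, HapaxLegomena = hapax,
        DisLegomena = dis, TrisLegomena = tris))
print(c(TypeTokenRatio = ttr, Herdan = herdan, Guiraud = guiraud,
        Sichel = sichel, Yule = yule))
```

Repeating this calculation at each cumulative text size gives one row of the data frame per chunk.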
Author(s)
R. H. Baayen
References
R. H. Baayen (2001) Word Frequency Distributions, Dordrecht: Kluwer Academic Publishers.
Tweedie, F. J. & Baayen, R. H. (1998) How variable may a constant be? Measures of lexical richness in perspective, Computers and the Humanities, 32, 323-352.
See Also
See also plot.growth and the zipfR package.
Examples
## Not run:
data(alice)
alice.growth = growth.fnc(alice)
plot(alice.growth)
## End(Not run)
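A possible follow-up to the example, pulling out the data frame described under Value and plotting types against tokens directly. It is a sketch that assumes the languageR package is installed and that the slot access object.name@data$data behaves as documented above; the names have.languageR and growth.df are hypothetical.

```r
# Run only when languageR is available, so the snippet is harmless otherwise.
have.languageR <- requireNamespace("languageR", quietly = TRUE)

if (have.languageR) {
  data(alice, package = "languageR")
  alice.growth <- languageR::growth.fnc(alice)

  # Data frame with one row per chunk, as described under Value.
  growth.df <- alice.growth@data$data
  print(head(growth.df[, c("Chunk", "Tokens", "Types", "HapaxLegomena")]))

  # Vocabulary growth curve: number of types as a function of text size.
  plot(growth.df$Tokens, growth.df$Types, type = "b",
       xlab = "Tokens", ylab = "Types")
}
```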