EV-EVm.spc {zipfR} | R Documentation |
Binomial Interpolation (zipfR)
Description
Compute the expected vocabulary size (with function EV.spc) or the
expected frequency spectrum (with function EVm.spc) for a random
sample of size N from a given frequency spectrum (i.e., an object of
class spc). The expectations are calculated by binomial interpolation
(following Baayen 2001, pp. 64-69).
Note that these functions are not user-visible. They can be called
implicitly through the generic methods EV and EVm, applied to an
object of class spc.
Usage
## S3 method for class 'spc'
EV(obj, N, allow.extrapolation=FALSE, ...)
## S3 method for class 'spc'
EVm(obj, m, N, allow.extrapolation=FALSE, ...)
Arguments
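obj |
an object of class spc, representing a frequency spectrum |
m |
positive integer value determining the frequency class m |
N |
sample size N |
allow.extrapolation |
if TRUE, N may be larger than the sample size of obj (binomial extrapolation); defaults to FALSE |
... |
additional arguments passed on from the generic methods; they are ignored |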
Details
These functions are naive implementations of binomial interpolation,
using Equations (2.41) and (2.43) from Baayen (2001). No guarantees
are made concerning their numerical accuracy, especially for extreme
values of N and m.
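For orientation, binomial interpolation is commonly written in the following closed form. The notation is an assumption made for this sketch and is not quoted from Baayen (2001): N_0 is the size of the observed sample (the spectrum obj), V(N_0) and V_m(N_0) are its vocabulary size and spectrum elements, and N is the requested sample size.

```latex
\begin{align*}
E[V(N)]   &= V(N_0) - \sum_{m \ge 1} V_m(N_0) \left(1 - \frac{N}{N_0}\right)^{m} \\
E[V_m(N)] &= \sum_{k \ge m} V_k(N_0) \binom{k}{m}
             \left(\frac{N}{N_0}\right)^{m} \left(1 - \frac{N}{N_0}\right)^{k-m}
\end{align*}
```

Summing the second expression over all m >= 1 recovers the first, so the two quantities are mutually consistent.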
According to Baayen (2001, pp. 69-73), the same equations can also be
used for binomial extrapolation of a given frequency spectrum
to larger sample sizes. However, they become numerically unstable in
this case and will typically break down when extrapolating to more
than twice the size of the observed sample (Baayen 2001, p. 75).
Therefore, extrapolation has to be enabled explicitly with the option
allow.extrapolation=TRUE and should be used with great caution.
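The following base R sketch illustrates the computation and why extrapolation is fragile. It is not the zipfR implementation: the helper names interp_EV and interp_EVm, the plain-vector representation of the spectrum, and the toy data are all assumptions made for illustration. For N larger than the observed sample size N0 the factor (1 - N/N0) is negative, so the terms alternate in sign, and beyond twice N0 its absolute value exceeds 1, so they also grow with m.

```r
# Hypothetical helpers (NOT part of zipfR): a minimal sketch of binomial
# interpolation on a complete spectrum given as a plain numeric vector
# Vm = c(V_1, V_2, ...). N is a single target sample size.

interp_EV <- function(Vm, N, allow.extrapolation = FALSE) {
  m  <- seq_along(Vm)
  N0 <- sum(m * Vm)   # observed sample size
  V0 <- sum(Vm)       # observed vocabulary size
  if (N > N0 && !allow.extrapolation)
    stop("N exceeds the observed sample size; set allow.extrapolation=TRUE")
  V0 - sum(Vm * (1 - N / N0)^m)
}

interp_EVm <- function(Vm, m, N, allow.extrapolation = FALSE) {
  k  <- seq_along(Vm)
  N0 <- sum(k * Vm)
  if (N > N0 && !allow.extrapolation)
    stop("N exceeds the observed sample size; set allow.extrapolation=TRUE")
  p <- N / N0
  sapply(m, function(mi) {
    kk <- k[k >= mi]  # only classes k >= m contribute to E[V_m]
    sum(Vm[kk] * choose(kk, mi) * p^mi * (1 - p)^(kk - mi))
  })
}

Vm <- c(5, 3, 2)              # toy spectrum: N0 = 17 tokens, V0 = 10 types

interp_EV(Vm, 17)             # 10: at N = N0 the observed V is recovered
interp_EVm(Vm, 1:3, 17)       # 5 3 2: the observed spectrum
interp_EV(Vm, 8.5)            # 6.5: interpolation to half the sample

# Extrapolating to three times the sample yields a negative "expected"
# number of types in frequency class 2, which is meaningless.
interp_EVm(Vm, 2, 51, allow.extrapolation = TRUE)   # -81
```

On real spectra with many frequency classes the breakdown is typically far more dramatic than in this three-class toy example, because the alternating terms grow with m.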
Value
EV returns the expected vocabulary size for a random sample of N
tokens from the frequency spectrum obj, and EVm returns the expected
spectrum elements for a random sample of N tokens from obj, both
calculated by binomial interpolation.
References
Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.
See Also
EV and EVm for the generic methods and links to other implementations.
spc.interp and vgc.interp are convenience functions that compute an
expected frequency spectrum or vocabulary growth curve by binomial
interpolation.
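Examples
A short usage sketch, assuming that the zipfR package is installed and that its bundled Brown.spc data set and the accessor N() are available. The calls go through the generic methods EV and EVm, which dispatch to the functions documented here; no output values are shown because they depend on the data.

```r
# Assumes zipfR is installed; Brown.spc is taken to be one of its data sets.
library(zipfR)

N0 <- N(Brown.spc)   # size of the observed sample

# expected vocabulary size in a random subsample of half the observed size
EV(Brown.spc, N0 / 2)

# expected number of hapax legomena (frequency class m = 1) in that subsample
EVm(Brown.spc, 1, N0 / 2)

# extrapolation beyond N0 is refused unless it is enabled explicitly
try(EV(Brown.spc, 1.5 * N0))
EV(Brown.spc, 1.5 * N0, allow.extrapolation = TRUE)
```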