vgc {zipfR} | R Documentation |
Vocabulary Growth Curves (zipfR)
Description
In the zipfR package, vgc objects are used to represent
a vocabulary growth curve (VGC). This can be an observed VGC from an
incremental set of samples (such as a corpus), a randomized VGC
obtained by binomial interpolation, or the expected VGC according to an
LNRE model.
With the vgc constructor function, an object can be initialized
directly from the specified data vectors. It is more common, however,
to read an observed VGC from a disk file with read.vgc, generate
a randomized VGC with vgc.interp, or compute an expected
VGC with lnre.vgc.
vgc objects should always be treated as read-only.
Usage
vgc(N, V, Vm=NULL, VV=NULL, VVm=NULL, expected=FALSE, check=TRUE)
Arguments
N |
integer vector of sample sizes |
V |
vector of corresponding vocabulary sizes |
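Vm |
optional list of growth vectors for the spectrum elements V_1(N), V_2(N), etc. (hapaxes, dis legomena, ...; up to 9 vectors) |
VV |
optional vector of variances Var[V(N)] of the expected vocabulary size |
VVm |
optional list of variance vectors Var[V_m(N)] of the expected spectrum elements |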
expected |
if TRUE, the object represents an interpolated or expected VGC |
check |
by default, various sanity checks are performed on the
data supplied to the vgc constructor; specify check=FALSE to skip them |
Details
If variances (VV or VVm) are specified for an expected
VGC, all relevant vectors must be given. In other words, VV
always has to be present in this case, and VVm has to be
present whenever Vm is specified, and must contain vectors for
exactly the same frequency classes.
V and Vm are integer vectors for an observed VGC, but
will usually be fractional for an interpolated or expected VGC.
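The consistency rule for variances can be illustrated with a minimal sketch that calls the constructor directly. All numbers below are made up for illustration and do not come from any real LNRE model; the variable names (EV, EV1, exp.vgc) are hypothetical. Because Vm is given for one frequency class, VVm is given for that same class, and VV is present as well.

```r
library(zipfR)

## made-up illustrative values (assumption: not derived from a real model)
N   <- c(1000, 2000, 3000)      # sample sizes
EV  <- c(412.3, 655.8, 846.1)   # expected vocabulary size E[V(N)]
EV1 <- c(251.7, 379.2, 468.4)   # expected number of hapaxes E[V_1(N)]
VV  <- c(95.2, 160.4, 210.9)    # Var[V(N)]
VV1 <- c(110.5, 172.3, 215.6)   # Var[V_1(N)]

## Vm and VVm cover the same frequency class (m = 1), and VV is supplied
exp.vgc <- vgc(N, EV, Vm=list(EV1), VV=VV, VVm=list(VV1), expected=TRUE)
summary(exp.vgc)
```

In practice such an object would normally be produced by lnre.vgc with variances=TRUE rather than assembled by hand.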
A vgc object is a data frame with the following variables:
N
sample size N
V
corresponding vocabulary size (either observed vocabulary size V(N) or expected vocabulary size E[V(N)])
V1 ... V9
optional: observed or expected spectrum elements (V_m(N) or E[V_m(N)]). Not all of these variables have to be present, but there must not be any "gaps" in the spectrum.
VV
optional: variance of expected vocabulary size, Var[V(N)]
VV1 ... VV9
optional: variances of expected spectrum elements, Var[V_m(N)]. If variances are present, they must be available for exactly the same frequency classes as the corresponding expected values.
The following attributes are used to store additional information about the vocabulary growth curve:
m.max
if non-zero, the VGC includes spectrum elements V_m(N) for m up to m.max. For m.max=0, no spectrum elements are present.
expected
if TRUE, the object represents an interpolated or expected VGC, with expected vocabulary size and spectrum elements. Otherwise, the object represents an observed VGC.
hasVariances
indicates whether or not the VV variable is present (as well as VV1, VV2, etc., if appropriate)
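As a rough sketch of how the variables and attributes described above appear in practice, the following builds a small observed VGC with the constructor and inspects it. The counts are invented for illustration only, and obs.vgc is a hypothetical name.

```r
library(zipfR)

## made-up observed counts (assumption: for illustration only)
N  <- c(1000, 2000, 3000)   # sample sizes
V  <- c(400, 650, 850)      # vocabulary sizes V(N)
V1 <- c(250, 380, 470)      # hapaxes V_1(N)
V2 <- c(60, 100, 140)       # dis legomena V_2(N)

## spectrum elements are passed as a list, without gaps (m = 1, 2)
obs.vgc <- vgc(N, V, Vm=list(V1, V2))

names(obs.vgc)                  # variables of the underlying data frame
attr(obs.vgc, "m.max")          # number of spectrum elements included
attr(obs.vgc, "expected")       # observed vs. interpolated/expected VGC
attr(obs.vgc, "hasVariances")   # whether variance variables are present
```

Since vgc objects should be treated as read-only, the accessor methods N, V and Vm are the safer way to extract these vectors in ordinary code; the attribute lookups here are only meant to show the internal layout.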
Value
An object of class vgc representing the specified vocabulary
growth curve. This object should be treated as read-only (although
such behaviour cannot be enforced in R).
See Also
read.vgc, write.vgc, plot.vgc, vgc.interp, lnre.vgc
Generic methods supported by vgc objects are
print, summary, N, V, Vm, VV, and VVm.
Implementation details and non-standard arguments for these methods
can be found on the manpages print.vgc, summary.vgc, N.vgc, V.vgc, etc.
Examples
## load the empirical vgc for Dickens' works and take a look at it
data(Dickens.emp.vgc)
summary(Dickens.emp.vgc)
print(Dickens.emp.vgc)
plot(Dickens.emp.vgc,add.m=1)
## vectors of sample sizes in the vgc, and the
## corresponding V and V_1 vectors
Ns <- N(Dickens.emp.vgc)
Vs <- V(Dickens.emp.vgc)
## use the Vm() accessor for spectrum elements; V() returns V(N) only
V1s <- Vm(Dickens.emp.vgc,1)
## binomially interpolated V and V_1 at the same sample sizes
## as the empirical curve
data(Dickens.spc)
Dickens.bin.vgc <- vgc.interp(Dickens.spc,N(Dickens.emp.vgc),m.max=1)
## compare observed and interpolated
plot(Dickens.emp.vgc,Dickens.bin.vgc,add.m=1,legend=c("observed","interpolated"))
## load Italian ultra- prefix data
data(ItaUltra.spc)
## compute zm model
zm <- lnre("zm",ItaUltra.spc)
## compute vgc up to about twice the sample size
## with variance of V
zm.vgc <- lnre.vgc(zm,(1:100)*70, variances=TRUE)
summary(zm.vgc)
print(zm.vgc)
## plot with confidence intervals derived from variance in
## vgc (with larger datasets, ci will typically be almost
## invisible)
plot(zm.vgc)
## for more examples of vgc usage, see the manpages of lnre.vgc,
## plot.vgc, print.vgc and vgc.interp