tfl {zipfR}    R Documentation

Type Frequency Lists (zipfR)


Description

In the zipfR library, tfl objects are used to represent a type frequency list, which specifies the observed frequency of each type in a corpus. For mathematical reasons, expected type frequencies are rarely considered.

With the tfl constructor function, an object can be initialized directly from the specified data vectors. It is more common to read a type frequency list from a disk file with read.tfl or, in some cases, derive it from an observed frequency spectrum with spc2tfl.

tfl objects should always be treated as read-only.
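As a minimal sketch of direct construction, the following builds a small tfl from data vectors and queries it with the N and V methods. The word forms and counts are invented for illustration, and the zipfR package is assumed to be installed.

```r
library(zipfR)

# Hypothetical toy data: six word types and their corpus frequencies
freqs <- c(5, 3, 3, 1, 1, 1)
words <- c("the", "of", "and", "cat", "dog", "emu")

# Build the type frequency list; type IDs k default to 1, 2, ...
toy.tfl <- tfl(freqs, type=words)

# Query the object rather than modifying it (tfl objects are read-only)
N(toy.tfl)        # sample size
V(toy.tfl)        # vocabulary size
summary(toy.tfl)
```

Going through the accessor methods instead of assigning to the object keeps the stored N and V consistent with the frequency data.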


Usage

  tfl(f, k=seq_along(f), type=NULL, f.min=min(f), f.max=max(f),
      incomplete=!(missing(f.min) && missing(f.max)), N=NA, V=NA,
      delete.zeros=FALSE)



Arguments

k

integer vector of type IDs k (if omitted, natural numbers 1, 2, ... are assigned automatically)

f

vector of corresponding type frequencies f_k

type

optional character vector of type representations (e.g. word forms or lemmata), used for informational and printing purposes only

incomplete

indicates that the type frequency list is incomplete, i.e. only contains types in a certain frequency range (typically, the lowest-frequency types may be excluded). Incomplete type frequency lists are rarely useful.

N, V

sample size and vocabulary size corresponding to the type frequency list have to be specified explicitly for incomplete lists

f.min, f.max

frequency range represented in an incomplete type frequency list (see details below)


delete.zeros

if TRUE, delete types with f = 0 from the type frequency list, after assigning type IDs. This operation does not make the resulting tfl object incomplete.


Details

If f.min and f.max are not specified, but the list is marked as incomplete (with incomplete=TRUE), they are automatically determined from the frequency vector f (making the assumption that all types in this frequency range are listed). Explicit specification of either f.min or f.max implies an incomplete list. In this case, all types outside the specified range will be deleted from the list. If incomplete=FALSE is also given explicitly, N and V will be determined automatically from the input data (which is assumed to be complete), but the resulting type frequency list will still be incomplete.

If you just want to remove types with f = 0 without marking the type frequency list as incomplete, use the option delete.zeros=TRUE.
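The two ways of dropping types described above can be contrasted in a short sketch. The frequency vectors are invented, the zipfR package is assumed to be installed, and the comments restate the documented behaviour rather than verified output.

```r
library(zipfR)

freqs <- c(5, 3, 3, 1, 1, 1)   # hypothetical complete frequency data

# Giving f.min marks the list as incomplete and drops types with f < 2;
# N and V of the full sample then have to be passed explicitly.
partial.tfl <- tfl(freqs, f.min=2, N=sum(freqs), V=length(freqs))
nrow(partial.tfl)                 # number of types left in the list
attr(partial.tfl, "incomplete")   # incomplete flag, stored as an attribute

# delete.zeros=TRUE removes types with f = 0 after type IDs are assigned,
# so the list is not marked as incomplete.
sparse.tfl <- tfl(c(4, 0, 2, 0, 1), delete.zeros=TRUE)
sparse.tfl$k                      # IDs of the surviving types
attr(sparse.tfl, "incomplete")
```

Restricting the range with f.min or f.max changes what N and V mean relative to the listed rows, which is why they must be supplied; delete.zeros leaves them derivable from f.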

Format

A tfl object is a data frame with the following variables:


k

integer type ID k

f

corresponding type frequency f_k

type

optional: character vector with type representations used for printing

The data frame always has to be sorted with respect to the k column (ascending order). If a type column is present, rownames are set to the types and can be used for character indexing.
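Because a tfl is an ordinary data frame underneath, its columns can be inspected with standard data frame tools. The sketch below uses invented data and assumes the zipfR package is installed; it only reads from the object.

```r
library(zipfR)

# Invented toy data with type labels
toy.tfl <- tfl(c(5, 3, 3, 1, 1, 1),
               type=c("the", "of", "and", "cat", "dog", "emu"))

is.data.frame(toy.tfl)     # a tfl is a data frame with extra attributes
names(toy.tfl)             # variables stored in the data frame
!is.unsorted(toy.tfl$k)    # rows are in ascending order of k

# Rownames are the type labels, so a type can be looked up by name
toy.tfl["cat", "f"]
```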

The following attributes are used to store additional information about the type frequency list:

N, V

sample size N and vocabulary size V corresponding to the type frequency list. For a complete list, these values could easily be determined from the f variable, but they are essential for an incomplete list.


incomplete

if TRUE, the type frequency list is incomplete, i.e. it lists only types in the frequency range given by f.min and f.max

f.min, f.max

range of type frequencies represented in the list (should be ignored unless the incomplete flag is set)


hasTypes

indicates whether or not the type variable is present
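For a complete list, the stored sample size and vocabulary size can be cross-checked against the f variable. The sketch below assumes the zipfR package is installed and uses invented frequencies; it takes V to be the number of types with non-zero frequency.

```r
library(zipfR)

toy.tfl <- tfl(c(5, 3, 3, 1, 1, 1))   # invented frequencies, complete list

# For a complete list the stored values agree with the f column
N(toy.tfl) == sum(toy.tfl$f)
V(toy.tfl) == sum(toy.tfl$f > 0)

# The remaining bookkeeping lives in attributes; read them, do not assign
attr(toy.tfl, "incomplete")
attr(toy.tfl, "f.min")
attr(toy.tfl, "f.max")
```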


Value

An object of class tfl representing the specified type frequency list. This object should be treated as read-only (although such behaviour cannot be enforced in R).

See Also

read.tfl, write.tfl, plot.tfl, sample.tfl, spc2tfl, tfl2spc

Generic methods supported by tfl objects are print, summary, N, V and Vm.

Implementation details and non-standard arguments for these methods can be found on the manpages print.tfl, summary.tfl, N.tfl, V.tfl, etc.


Examples

## typically, you will read a tfl from a file
## (see examples in the read.tfl manpage)
library(zipfR)

## or you can load a ready-made tfl
data(Brown.tfl)
summary(Brown.tfl)

## or create it from a spectrum (with different ids and
## no type labels)
data(Brown.spc)
Brown.tfl2 <- spc2tfl(Brown.spc)

## same frequency information as Brown.tfl
## but with different ids and no type labels
summary(Brown.tfl2)

## how to draw a Zipf's rank/frequency plot
## by extracting frequencies from a tfl
plot(sort(Brown.tfl$f, decreasing=TRUE), log="y",
     xlab="rank", ylab="frequency")

## simulating a tfl
Zipfian.tfl <- tfl(1000/(1:1000))
plot(Zipfian.tfl$f, log="y")

[Package zipfR version 0.6-70 Index]