R: S4 Class kRp.corp.freq

kRp.corp.freq,-class {koRpus}

R Documentation

S4 Class kRp.corp.freq

Description

This class is used for objects that are returned by read.corp.LCC and read.corp.celex.

Details

The slot meta simply contains all information from the "meta.txt" of the LCC[1] data and remains empty for data from a Celex[2] DB.

Slots

meta

Metadata on the corpora (see details).

words

Absolute word frequencies. It has at least the following columns:

num:: Some word ID from the DB, integer
word:: The word itself
lemma:: The lemma of the word
tag:: A part-of-speech tag
wclass:: The word class
lttr:: The number of characters
freq:: The frequency of that word in the corpus DB
pct:: Percentage of appearance in DB
pmio:: Appearance per million words in DB
log10:: Base 10 logarithm of word frequency
rank.avg:: Rank in corpus data, rank ties method "average"
rank.min:: Rank in corpus data, rank ties method "min"
rank.rel.avg:: Relative rank, i.e. percentile of "rank.avg"
rank.rel.min:: Relative rank, i.e. percentile of "rank.min"
inDocs:: The absolute number of documents in the corpus containing the word
idf:: The inverse document frequency

The slot might have additional columns, depending on the input material.
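Several of the columns listed above are derived from the raw frequency. The following base R sketch illustrates how such values can be computed for a toy table. The words, counts, document numbers and the variable n_docs are invented for illustration, and the formulas (in particular the rank direction and the idf definition) are common textbook choices assumed here, not necessarily the exact ones koRpus applies. The relative ranks are left out because their precise definition is not given here.

```r
# Toy data: words, frequencies and document counts are made up.
words <- data.frame(
  word = c("the", "of", "cat", "dog"),
  freq = c(50, 30, 10, 10),
  inDocs = c(4, 4, 2, 1),
  stringsAsFactors = FALSE
)
n_docs <- 4  # hypothetical number of documents in the corpus

total <- sum(words$freq)
words$lttr <- nchar(words$word)           # number of characters
words$pct <- words$freq / total * 100     # percentage of all tokens
words$pmio <- words$freq / total * 1e6    # occurrences per million tokens
words$log10 <- log10(words$freq)          # base 10 logarithm of the frequency

# base::rank() gives the most frequent word the highest rank (assumed direction);
# the two columns differ only in how the tie between "cat" and "dog" is resolved
words$rank.avg <- rank(words$freq, ties.method = "average")
words$rank.min <- rank(words$freq, ties.method = "min")

# assumed idf definition: log10 of (documents in corpus / documents with word)
words$idf <- log10(n_docs / words$inDocs)

print(words)
```

In this toy table the tied words "cat" and "dog" share rank 1.5 under "average" and rank 1 under "min", which is the only difference between the two rank columns.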

desc

Descriptive information. It contains six numbers from the meta information, for convenient access:

tokens:: Number of running word forms
types:: Number of distinct word forms
words.p.sntc:: Average sentence length in words
chars.p.sntc:: Average sentence length in characters
chars.p.wform:: Average word form length
chars.p.word:: Average running word length

The slot might have additional columns, depending on the input material.
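To make the meaning of these measures concrete, here is a small base R sketch that computes most of them from two invented sentences. In the class itself the numbers are taken from the corpus meta information, so this is only an illustration of the definitions as read here, not of how koRpus fills the slot. chars.p.sntc is omitted because the text does not say whether spaces and punctuation are counted.

```r
# Two made-up, already tokenized sentences.
sentences <- list(
  c("the", "cat", "sat"),
  c("the", "dog", "sat", "down")
)
tokens <- unlist(sentences)   # running word forms
types <- unique(tokens)       # distinct word forms

desc <- data.frame(
  tokens = length(tokens),
  types = length(types),
  words.p.sntc = length(tokens) / length(sentences),  # mean sentence length in words
  chars.p.wform = mean(nchar(types)),                  # mean length of distinct forms
  chars.p.word = mean(nchar(tokens))                   # mean length of running words
)
print(desc)
```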

bigrams

A data.frame listing all tokens that co-occurred next to each other in the corpus:

token1:: The first token
token2:: The second token that appeared right next to the first
freq:: How often the co-occurrence was present
sig:: Log-likelihood significance of the co-occurrence
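The structure of such a table can be sketched in base R by pairing each token with its right-hand neighbour and counting the pairs. The token vector is invented, and the sig column is not reproduced because the exact log-likelihood computation is not described here.

```r
# Made-up token sequence.
tokens <- c("the", "cat", "sat", "on", "the", "cat")

# Pair every token with the token directly following it.
pairs <- data.frame(
  token1 = head(tokens, -1),
  token2 = tail(tokens, -1),
  freq = 1L,
  stringsAsFactors = FALSE
)

# Sum the counts of identical (token1, token2) pairs.
bigrams <- aggregate(freq ~ token1 + token2, data = pairs, FUN = sum)
print(bigrams)
```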

cooccur

Similar to bigrams, but listing co-occurrences anywhere in one sentence:

token1:: The first token
token2:: The second token that appeared in the same sentence
freq:: How often the co-occurrence was present
sig:: Log-likelihood significance of the co-occurrence
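The difference to bigrams can be illustrated by counting every pair of distinct tokens that share a sentence, regardless of position. This base R sketch uses invented sentences and treats pairs as unordered (tokens are sorted within each pair), which is an assumption; whether the corpus data stores ordered or unordered pairs is not stated here. The sig column is again left out.

```r
# Two made-up, already tokenized sentences.
sentences <- list(
  c("the", "cat", "sat"),
  c("the", "dog", "sat")
)

# All unordered pairs of distinct tokens per sentence.
# Note: combn() needs at least two distinct tokens in each sentence.
pairs_per_sentence <- lapply(sentences, function(s) {
  t(combn(sort(unique(s)), 2))
})
pairs <- do.call(rbind, pairs_per_sentence)

# Count in how many sentences each pair occurs.
cooccur <- aggregate(
  freq ~ token1 + token2,
  data = data.frame(
    token1 = pairs[, 1],
    token2 = pairs[, 2],
    freq = 1L,
    stringsAsFactors = FALSE
  ),
  FUN = sum
)
print(cooccur)
```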

caseSens

A single logical value indicating whether the frequency statistics were calculated case-sensitively.
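The effect of this setting on the counts can be seen in a short base R sketch. The tokens are invented, and folding with tolower() is only an assumed stand-in for whatever normalisation is applied when statistics are computed case-insensitively.

```r
# Made-up tokens that differ only in case.
tokens <- c("The", "the", "Cat", "cat", "the")

# Case sensitive: "The" and "the" are counted as different word forms.
freq_sensitive <- table(tokens)

# Case insensitive: forms are folded to lower case before counting.
freq_insensitive <- table(tolower(tokens))

print(freq_sensitive)
print(freq_insensitive)
```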

Constructor function

Should you need to manually generate objects of this class (which should rarely be the case), the constructor function kRp_corp_freq(...) can be used instead of new("kRp.corp.freq", ...).
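A minimal sketch of manual construction, assuming the koRpus package is installed and that calling the constructor without arguments yields an empty prototype object. The variable names are illustrative, and the slots are inspected with the generic S4 tools from the methods package rather than any koRpus-specific accessor.

```r
# Only run the koRpus part if the package is available.
has_koRpus <- requireNamespace("koRpus", quietly = TRUE)

if (has_koRpus) {
  # Create an empty object with the constructor function.
  obj <- koRpus::kRp_corp_freq()

  # The object is of the documented class.
  stopifnot(methods::is(obj, "kRp.corp.freq"))

  # List the slots and look at one of them.
  print(methods::slotNames(obj))
  print(methods::slot(obj, "caseSens"))
}
```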

References

[1] https://wortschatz.uni-leipzig.de/en/download/

[2] http://celex.mpi.nl

[Package koRpus version 0.13-8 Index]