
word_coverage {sbo}

R Documentation

Word coverage fraction

Description

Compute the total and cumulative corpus coverage fractions of a dictionary.

Usage

word_coverage(object, corpus, ...)

## S3 method for class 'sbo_dictionary'
word_coverage(object, corpus, ...)

## S3 method for class 'character'
word_coverage(object, corpus, .preprocess = identity, EOS = "", ...)

## S3 method for class 'sbo_kgram_freqs'
word_coverage(object, corpus, ...)

## S3 method for class 'sbo_predictions'
word_coverage(object, corpus, ...)

Arguments

`object`	either a character vector or an object inheriting from one of the classes `sbo_dictionary`, `sbo_kgram_freqs`, `sbo_predtable` or `sbo_predictor`, storing the dictionary for which corpus coverage is to be computed.
`corpus`	a character vector containing the text on which coverage is computed.
`...`	further arguments passed to or from other methods.
`.preprocess`	preprocessing function applied to `corpus`. See `kgram_freqs` and `sbo_dictionary` for further details.
`EOS`	a length-one character vector: a string containing the End-Of-Sentence characters. See `kgram_freqs` and `sbo_dictionary` for further details.

Details

This function computes the corpus coverage fraction of a dictionary, that is, the fraction of words appearing in `corpus` that are contained in the dictionary.
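As a minimal illustration of the definition above, the following base-R sketch computes the coverage fraction of a dictionary over an already tokenized corpus. The helper `coverage_fraction()` is hypothetical and not part of sbo; it assumes tokenization has already been done and ignores End-Of-Sentence handling.

```r
# Hypothetical helper, not part of sbo: fraction of word tokens in a
# pre-tokenized corpus that belong to the dictionary.
coverage_fraction <- function(dict, tokens) {
  # tokens %in% dict is TRUE for each covered token; mean() gives the fraction
  mean(tokens %in% dict)
}

tokens <- c("the", "cat", "sat", "on", "the", "mat")
dict <- c("the", "on")
coverage_fraction(dict, tokens)  # 3 of 6 tokens are covered: 0.5
```

Tokens are counted with repetition, so a frequent word such as "the" contributes once per occurrence.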

This function is a generic that accepts, as its `object` argument, any object storing a dictionary along with a preprocessing function and a string of End-Of-Sentence characters. This includes all the main sbo classes: sbo_dictionary, sbo_kgram_freqs, sbo_predtable and sbo_predictor. When `object` is a character vector, the preprocessing function and the End-Of-Sentence characters must be specified explicitly through the `.preprocess` and `EOS` arguments.
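The dispatch pattern described above can be sketched with a toy S3 generic. Everything here is hypothetical and only loosely modeled on the description: `coverage()` is not sbo's `word_coverage()`, the class `toy_dictionary` and its `.preprocess`/`EOS` attributes are an assumed layout, and End-Of-Sentence characters are simply turned into spaces rather than treated as tokens.

```r
# Hypothetical toy generic illustrating S3 dispatch; not sbo's code.
coverage <- function(object, corpus, ...) UseMethod("coverage")

# Character method: preprocessing and EOS characters must be supplied explicitly.
coverage.character <- function(object, corpus, .preprocess = identity, EOS = "", ...) {
  text <- .preprocess(corpus)
  # Simplification: map every EOS character to a space (chartr is literal, no regex)
  if (nzchar(EOS)) {
    text <- chartr(EOS, strrep(" ", nchar(EOS)), text)
  }
  # Split on whitespace and drop empty strings
  tokens <- unlist(strsplit(text, "\\s+"))
  tokens <- tokens[nzchar(tokens)]
  mean(tokens %in% object)
}

# Method for a dictionary object carrying its own preprocessing and EOS
# as attributes (assumed layout), delegating to the character method.
coverage.toy_dictionary <- function(object, corpus, ...) {
  coverage.character(as.character(object), corpus,
                     .preprocess = attr(object, ".preprocess"),
                     EOS = attr(object, "EOS"))
}

d <- structure(c("the", "cat"), class = "toy_dictionary",
               .preprocess = tolower, EOS = ".")
coverage(d, "The cat sat. The dog ran.")                # uses stored settings
coverage(c("the", "cat"), "The cat sat.")               # identity, no EOS
coverage(c("the", "cat"), "The cat sat.", .preprocess = tolower, EOS = ".")
```

The design choice mirrors the text: objects that already know how their dictionary was built need no extra arguments, while a bare character vector carries no such information, so the caller has to provide it.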

The coverage fraction is computed cumulatively, so the dependence of coverage on the maximal word rank can be explored through `plot()` (see the examples below).
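To make "cumulative" concrete, here is a minimal base-R sketch of coverage as a function of maximal rank. The helper `cumulative_coverage()` is hypothetical, not sbo's implementation, and assumes the dictionary is sorted by rank (most frequent word first); element `k` of the result is the coverage obtained by keeping only the top `k` words.

```r
# Hypothetical helper, not sbo's implementation. `dict` is assumed to be
# sorted by rank, most frequent word first.
cumulative_coverage <- function(dict, tokens) {
  # Dictionary rank of each token (NA if the word is not in the dictionary)
  ranks <- match(tokens, dict)
  # Number of corpus tokens falling on each rank 1..length(dict)
  counts <- tabulate(ranks[!is.na(ranks)], nbins = length(dict))
  # Coverage when the dictionary is truncated at rank k, for every k
  cumsum(counts) / length(tokens)
}

tokens <- c("the", "cat", "sat", "on", "the", "mat")
dict <- c("the", "on", "cat")
cum_cov <- cumulative_coverage(dict, tokens)
cum_cov  # 2/6, 3/6, 4/6
```

The last element equals the total coverage of the full dictionary, and the vector can be passed to base `plot()` (e.g. with `type = "s"`) to visualize coverage against maximal rank.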

Value

a word_coverage object.

Author(s)

Valerio Gherardi

Examples


library(sbo)
# Avoid naming the result `c`, which would mask base::c()
cvg <- word_coverage(twitter_dict, twitter_train)
print(cvg)
summary(cvg)
# Plot the coverage fraction, including End-Of-Sentence tokens in the word counts.
plot(cvg, include_EOS = TRUE)

[Package sbo version 0.5.0 Index]