word_coverage {sbo} | R Documentation |
Word coverage fraction
Description
Compute total and cumulative corpus coverage fraction of a dictionary.
Usage
word_coverage(object, corpus, ...)
## S3 method for class 'sbo_dictionary'
word_coverage(object, corpus, ...)
## S3 method for class 'character'
word_coverage(object, corpus, .preprocess = identity, EOS = "", ...)
## S3 method for class 'sbo_kgram_freqs'
word_coverage(object, corpus, ...)
## S3 method for class 'sbo_predictions'
word_coverage(object, corpus, ...)
Arguments
object |
either a character vector, or an object inheriting from one of
the classes sbo_dictionary, sbo_kgram_freqs, sbo_predtable or sbo_predictor. |
corpus |
a character vector. |
... |
further arguments passed to or from other methods. |
.preprocess |
preprocessing function for training corpus. See
|
EOS |
a length one character vector. String containing End-Of-Sentence
characters, see |
Details
This function computes the corpus coverage fraction of a dictionary, that is, the fraction of words appearing in corpus that are contained in the original dictionary.
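The definition above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy dictionary and corpus are made up, tokenization is a naive split on single spaces, and the fraction is assumed to be taken over word occurrences (tokens) rather than distinct words.

```r
# Toy data; both objects are hypothetical and used for illustration only.
dict <- c("the", "cat", "sat")
corpus <- c("the cat sat on the mat", "the dog sat")

# Naive whitespace tokenization (an assumption; sbo's tokenizer may differ).
words <- unlist(strsplit(corpus, " ", fixed = TRUE))

# Fraction of corpus word occurrences that belong to the dictionary.
coverage_fraction <- mean(words %in% dict)
coverage_fraction
```

Here 6 of the 9 corpus tokens are dictionary words, so the sketch yields a coverage fraction of 6/9.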
This function is a generic, accepting as object argument any object
storing a dictionary, along with a preprocessing function and a list
of End-Of-Sentence characters. This includes all sbo main classes:
sbo_dictionary, sbo_kgram_freqs, sbo_predtable and sbo_predictor.
When object is a character vector, the preprocessing function and the
End-Of-Sentence characters must be specified explicitly.
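As a hedged sketch of the character vector case, the call below passes the preprocessing function and the End-Of-Sentence characters explicitly through the .preprocess and EOS arguments shown in Usage. The inputs toy_dict and toy_corpus are hypothetical, and the choices of tolower and ".?!" are assumptions made for illustration, not package defaults.

```r
# Hypothetical toy inputs, for illustration only.
toy_dict <- c("the", "cat", "sat")
toy_corpus <- c("The cat sat on the mat.", "The dog sat!")

if (requireNamespace("sbo", quietly = TRUE)) {
  # Character method: preprocessing and EOS are passed explicitly.
  toy_coverage <- sbo::word_coverage(
    toy_dict, toy_corpus,
    .preprocess = tolower,  # base R function, chosen here as an example
    EOS = ".?!"             # assumed set of sentence-ending characters
  )
  print(toy_coverage)
}
```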
The coverage fraction is computed cumulatively, and the dependence of
coverage on the maximal word rank can be explored through plot()
(see examples below).
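The cumulative computation can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the dictionary is assumed to be sorted by decreasing frequency, so that position k is the word rank, and tokenization is again a naive split on single spaces.

```r
# Hypothetical dictionary, assumed sorted by decreasing frequency (rank order).
dict <- c("the", "sat", "cat")
corpus <- c("the cat sat on the mat", "the dog sat")
words <- unlist(strsplit(corpus, " ", fixed = TRUE))

# Occurrences in the corpus of each dictionary word, in rank order.
counts <- vapply(dict, function(w) sum(words == w), integer(1))

# Entry k is the fraction of corpus tokens covered by the top-k words.
cumulative_coverage <- cumsum(counts) / length(words)
cumulative_coverage
```

The last entry equals the total coverage fraction of the full dictionary, and the earlier entries show how coverage grows as the maximal rank increases.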
Value
a word_coverage object.
Author(s)
Valerio Gherardi
See Also
Examples
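library(sbo)
# Coverage of the twitter_train corpus by the twitter_dict dictionary.
# The result is named `coverage` to avoid masking base::c().
coverage <- word_coverage(twitter_dict, twitter_train)
print(coverage)
summary(coverage)
# Plot coverage fraction, including the End-Of-Sentence in word counts.
plot(coverage, include_EOS = TRUE)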