lang.support.en {koRpus.lang.en} | R Documentation |
Language support for English
Description
This function adds support for English to the koRpus package. You should not need to call it manually, as this is done automatically when the package is loaded.
Usage
lang.support.en(...)
Arguments
... |
Optional arguments for |
Details
The POS tags cover tag definitions from multiple sources. Please note that one tag, "PRP", is defined in both the PENN[3] and BNC[4] tagsets, but with different meanings: the PENN tag marks personal pronouns, whereas the BNC tag marks prepositions (except "of"). Since TreeTagger's PENN parameter set does not use the conflicting tag, while its BNC parameter set does, koRpus also uses the BNC definition. Keep this in mind if you use this language support package with alternative taggers.
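The effect of the "PRP" conflict can be sketched in base R. This is only an illustration under stated assumptions: the data frames `penn_tags`, `bnc_tags` and `combined` are hypothetical and heavily shortened, and they are not the data structures koRpus uses internally. The sketch merely shows what it means for the BNC definition to take precedence when both tagsets are merged into one table.

```r
# Hypothetical, shortened tag tables (NOT koRpus internals).
# Descriptions paraphrase the PENN and BNC tagset documentation.
penn_tags <- data.frame(
  tag = c("PRP", "NN"),
  desc = c("personal pronoun", "noun, singular or mass"),
  source = "PENN",
  stringsAsFactors = FALSE
)
bnc_tags <- data.frame(
  tag = c("PRP", "PRF"),
  desc = c("preposition (except 'of')", "the preposition 'of'"),
  source = "BNC",
  stringsAsFactors = FALSE
)

# Put the BNC rows first: duplicated() keeps the first occurrence,
# so the BNC definition survives for any tag defined in both sets.
combined <- rbind(bnc_tags, penn_tags)
combined <- combined[!duplicated(combined$tag), ]
rownames(combined) <- NULL

# Look up the meaning that remains for the conflicting tag.
prp_desc <- combined$desc[combined$tag == "PRP"]
print(combined)
```

If a tagger emits "PRP" in the PENN sense, a lookup like this would still label the token as a preposition, which is why results from alternative taggers should be checked for this tag.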
In particular, this function adds the following:
- lang: The additional language "en" to be used with koRpus
- treetag: The additional preset "en", implemented according to the respective TreeTagger[1] script
- POS tags: An additional set of tags, implemented using the documentation for the corresponding TreeTagger parameter set[2], additional tags from the PENN treebank project[3], and the BNC tagset[4] used in an alternative TreeTagger parameter set.
Hyphenation patterns are provided by means of the sylly.en package.
References
[1] http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
[2] http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/data/Penn-Treebank-Tagset.pdf
[3] https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
[4] http://www.natcorp.ox.ac.uk/docs/c5spec.html
Examples
lang.support.en()