tagsets {NLP} | R Documentation |
NLP Tag Sets
Description
Tag sets frequently used in Natural Language Processing.
Usage
Penn_Treebank_POS_tags
Brown_POS_tags
Universal_POS_tags
Universal_POS_tags_map
Details
Penn_Treebank_POS_tags and Brown_POS_tags provide, respectively,
the Penn Treebank POS tags
(https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, Table 2)
and the POS tags used for the Brown corpus
(http://www.hit.uib.no/icame/brown/bcm.html),
both as data frames with the following variables:
- entry
a character vector with the POS tags
- description
a character vector with short descriptions of the tags
- examples
a character vector with examples for the tags
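As a minimal sketch of how these data frames might be queried, the following looks up the descriptions of a few Penn Treebank tags by matching on the entry variable. The tags chosen and the name hits are illustrative only; the sketch assumes the NLP package is installed.

```r
library(NLP)

# Tags to look up (chosen for illustration only).
tags <- c("NN", "VBZ", "JJ")

# Keep the rows whose 'entry' matches, showing tag and description.
hits <- Penn_Treebank_POS_tags[Penn_Treebank_POS_tags$entry %in% tags,
                               c("entry", "description")]
hits
```

The same subsetting should work for Brown_POS_tags, since both data frames share the variables listed above.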
Universal_POS_tags provides the universal POS tagset introduced
by Slav Petrov, Dipanjan Das, and Ryan McDonald
(https://arxiv.org/abs/1104.2086), as a data frame with character
variables entry and description.
Universal_POS_tags_map is a named list of mappings from
language- and treebank-specific POS tagsets to the universal POS tags,
with elements named ‘en-ptb’ and ‘en-brown’ giving the
mappings, respectively, for the Penn Treebank and Brown POS tags.
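To illustrate what applying such a mapping could look like, the sketch below uses a small hand-written map rather than the packaged one. It assumes a mapping behaves like a named character vector indexed by the source tag; this page does not state the element structure, so inspect the real object with str(Universal_POS_tags_map[["en-ptb"]]) before relying on that. The names toy_map and ptb_tags are hypothetical.

```r
# Hypothetical stand-in for a Penn Treebank -> universal mapping.
# Assumption: a named character vector, names = source tags.
toy_map <- c(DT = "DET", NN = "NOUN", NNS = "NOUN",
             VBZ = "VERB", JJ = "ADJ")

# Penn Treebank tags for a short tagged sequence (illustrative).
ptb_tags <- c("DT", "JJ", "NN", "VBZ")

# Index the map by tag name to translate each tag.
universal <- unname(toy_map[ptb_tags])
universal
```

Tags missing from such a map would come back as NA under this indexing, which is one simple way to spot unmapped tags.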
Source
https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html, http://www.hit.uib.no/icame/brown/bcm.html, https://github.com/slavpetrov/universal-pos-tags.
Examples
library(NLP)
## Penn Treebank POS tags
dim(Penn_Treebank_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Penn_Treebank_POS_tags, 20L))
## Brown POS tags
dim(Brown_POS_tags)
## Inspect first 20 entries:
write.dcf(head(Brown_POS_tags, 20L))
## Universal POS tags
Universal_POS_tags
## Available mappings to universal POS tags
names(Universal_POS_tags_map)