parse.pos.tags {stylo} | R Documentation |
Extract POS-tags or Words from Annotated Corpora
Description
Function for extracting textual data from annotated corpora. It understands the output formats of Stanford Tagger, TreeTagger, TaKIPI (a tagger for Polish), and Alpino (a tagger for Dutch). Part-of-speech tags, words, or lemmata can be extracted.
Usage
parse.pos.tags(input.text, tagger = "stanford", feature = "pos")
Arguments
input.text |
a character vector containing tagger output, i.e. text annotated with part-of-speech markup; a corpus (a list of such vectors) is also accepted. |
tagger |
choose the input format: "stanford" for Stanford Tagger, "treetagger" for TreeTagger, "takipi" for TaKIPI. |
feature |
choose the feature to extract: "pos" (default), "word", or "lemma" (the latter is not available for Stanford-formatted input). |
Value
If the function is applied to a single text, a vector of extracted features is returned. If it is applied to a corpus (a list, preferably of class "stylo.corpus"), a list of preprocessed texts is returned.
Author(s)
Maciej Eder
See Also
load.corpus, txt.to.words, txt.to.words.ext, txt.to.features
Examples
text = "I_PRP have_VBP just_RB returned_VBN from_IN a_DT visit_NN
to_TO my_PRP$ landlord_NN -_: the_DT solitary_JJ neighbor_NN that_IN
I_PRP shall_MD be_VB troubled_VBN with_IN ._. This_DT is_VBZ certainly_RB
a_DT beautiful_JJ country_NN !_. In_IN all_DT England_NNP ,_, I_PRP do_VBP
not_RB believe_VB that_IN I_PRP could_MD have_VB fixed_VBN on_IN a_DT
situation_NN so_RB completely_RB removed_VBN from_IN the_DT stir_VB of_IN
society_NN ._."
parse.pos.tags(text, tagger = "stanford", feature = "word")
parse.pos.tags(text, tagger = "stanford", feature = "pos")
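
To make the Stanford input format used above more concrete, the following base-R sketch separates "word_TAG" tokens into words and tags. It is only an illustration and not the way parse.pos.tags is implemented; the helper name split_stanford_tokens and the variables sample.text and parsed are hypothetical and not part of stylo. It assumes that tokens are separated by whitespace and that the tag follows the last underscore of each token.

```r
# Hypothetical helper (not part of stylo): split Stanford-style "word_TAG" tokens.
split_stanford_tokens <- function(tagged.text) {
  # break the text into tokens on any run of whitespace (spaces, newlines)
  tokens <- unlist(strsplit(tagged.text, "[[:space:]]+"))
  # drop empty strings produced by leading whitespace
  tokens <- tokens[nzchar(tokens)]
  # assumption: the tag is whatever follows the last underscore in a token
  data.frame(
    word = sub("_[^_]*$", "", tokens),
    pos  = sub("^.*_", "", tokens),
    stringsAsFactors = FALSE
  )
}

sample.text <- "I_PRP have_VBP just_RB returned_VBN ._."
parsed <- split_stanford_tokens(sample.text)
print(parsed$word)
print(parsed$pos)
```

The two columns correspond loosely to what feature = "word" and feature = "pos" select; splitting on the last underscore keeps words that themselves contain an underscore intact.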