Maxent_Entity_Annotator {openNLP}R Documentation

Apache OpenNLP based entity annotators

Description

Generate an annotator which computes entity annotations using the Apache OpenNLP Maxent name finder.

Usage

Maxent_Entity_Annotator(language = "en", kind = "person", probs = FALSE,
                        model = NULL)

Arguments

language

a character string giving the ISO-639 code of the language being processed by the annotator.

kind

a character string giving the ‘kind’ of entity to be annotated (person, date, ...).

probs

a logical indicating whether the computed annotations should provide the token probabilities obtained from the Maxent model as their ‘prob’ feature.

model

a character string giving the path to the Maxent model file to be used, or NULL indicating to use a default model file for the given language (if available, see Details).

Details

See http://opennlp.sourceforge.net/models-1.5/ for available model files. These can conveniently be made available to R by installing the respective openNLPmodels.language package from the repository at https://datacube.wu.ac.at.

Value

An Annotator object giving the generated entity annotator.

See Also

https://opennlp.apache.org for more information about Apache OpenNLP.

Examples


## Requires package 'openNLPmodels.en' from the repository at
## <https://datacube.wu.ac.at>.

require("NLP")
## Some text.
s <- paste(c("Pierre Vinken, 61 years old, will join the board as a ",
             "nonexecutive director Nov. 29.\n",
             "Mr. Vinken is chairman of Elsevier N.V., ",
             "the Dutch publishing group."),
           collapse = "")
s <- as.String(s)

## Need sentence and word token annotations.
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))

## Entity recognition for persons.
entity_annotator <- Maxent_Entity_Annotator()
entity_annotator
annotate(s, entity_annotator, a2)
## Directly:
entity_annotator(s, a2)
## And slice ...
s[entity_annotator(s, a2)]
## Variant with sentence probabilities as features.
annotate(s, Maxent_Entity_Annotator(probs = TRUE), a2)


[Package openNLP version 0.2-7 Index]