predict.nametagger {nametagger}    R Documentation

Perform Named Entity Recognition on tokenised text

Description

Perform Named Entity Recognition on tokenised text using a nametagger model

Usage

## S3 method for class 'nametagger'
predict(object, newdata, split = "[[:space:]]+", ...)

Arguments

object

an object of class nametagger as returned by nametagger_load_model or nametagger_download_model

newdata

either a data.frame with tokenised sentences or a character vector of untokenised text. A data.frame should contain the columns doc_id, sentence_id and text, where text holds the tokens of one sentence in vertical format, meaning each token is put on a new line. Column doc_id should be of type character, column sentence_id of type integer. A character vector is split into tokens using the argument split.

split

a regular expression used to split newdata into tokens. Only used if newdata is a character vector containing text which is not tokenised. Defaults to splitting on whitespace.

...

not used
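
The vertical format expected in the text column can be produced with base R alone. The following is a minimal sketch, not part of the package: to_vertical is a hypothetical helper introduced here for illustration, and the sentences and document identifiers are made up. It splits each sentence on a regular expression and joins the resulting tokens with newlines.

```r
# Hypothetical helper (not exported by nametagger): convert raw sentences
# into one-token-per-line text, as described for the `text` column.
to_vertical <- function(text, split = "[[:space:]]+") {
  tokens <- strsplit(text, split = split)
  # Drop empty tokens, which appear when a sentence starts with a separator
  tokens <- lapply(tokens, function(tok) tok[nzchar(tok)])
  # Join the tokens of each sentence with newlines
  vapply(tokens, paste, character(1), collapse = "\n")
}

sentences <- c("I live in New York", "Why don't you come visit me")

# doc_id is character and sentence_id is integer, as the argument description asks
newdata <- data.frame(doc_id      = c("doc1", "doc1"),
                      sentence_id = 1:2,
                      text        = to_vertical(sentences),
                      stringsAsFactors = FALSE)

# Show the first sentence, one token per line
cat(newdata$text[1], "\n")
```

A data.frame built this way has the three columns described under newdata. Passing a plain character vector together with split is the shorter route when no custom tokenisation is needed.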

Value

a data.frame with columns doc_id, sentence_id, token and entity

Examples

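## Load the example model shipped with the package
path  <- system.file(package = "nametagger", "models", "exampletagger.ner")
model <- nametagger_load_model(path)
model

## Untokenised text: tokens are obtained by splitting on whitespace and punctuation
x <- c("I ga naar Brussel op reis.", "Goed zo dat zal je deugd doen Karel")
entities <- predict(model, x, split = "[[:space:][:punct:]]+")
entities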


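## Download a pretrained English model into a temporary directory
model <- nametagger_download_model("english-conll-140408", model_dir = tempdir())

## Tokenised text in vertical format: one token per line,
## doc_id as character and sentence_id as integer
x <- data.frame(doc_id = c("1", "1", "2"),
                sentence_id = c(1L, 2L, 1L),
                text = c("I\nlive\nin\nNew\nYork\nand\nI\nwork\nfor\nApple\nInc.",
                         "Why\ndon't\nyou\ncome\nvisit\nme",
                         "Good\nnews\nfrom\nAmazon\nas\nJohn\nworks\nthere\n."),
                stringsAsFactors = FALSE)
entities <- predict(model, x)
entities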


[Package nametagger version 0.1.3 Index]