| spacy_extract_entity {spacyr} | R Documentation |
Extract named entities from texts using spaCy
Description
This function extracts named entities from texts, based on the entity tag
ent attributes of document objects parsed by spaCy (see
https://spacy.io/usage/linguistic-features#section-named-entities).
Usage
spacy_extract_entity(
x,
output = c("data.frame", "list"),
type = c("all", "named", "extended"),
multithread = TRUE,
...
)
Arguments
x |
a character object or a TIF-compliant corpus data.frame (see https://github.com/ropenscilabs/tif) |
output |
type of returned object, either "data.frame" or "list" |
type |
type of named entities, either "all", "named", or "extended" |
multithread |
logical; If |
... |
unused |
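The vector-valued defaults shown under Usage follow a common R convention in which the first element acts as the default. The sketch below illustrates that convention with base R's match.arg(). The function pick_options() is a hypothetical stand-in written for this illustration, and it is an assumption (not confirmed by this page) that spacy_extract_entity() resolves its arguments the same way.

```r
# Hypothetical stand-in, NOT the spacyr implementation: it only mimics the
# argument defaults shown under Usage to illustrate match.arg().
pick_options <- function(output = c("data.frame", "list"),
                         type = c("all", "named", "extended")) {
  output <- match.arg(output)  # no value supplied -> first choice is used
  type <- match.arg(type)
  c(output = output, type = type)
}

pick_options()                 # defaults resolve to "data.frame" and "all"
pick_options(output = "list")  # an explicit choice overrides the default
pick_options(type = "ext")     # match.arg() also accepts unique partial matches
```

Under that assumption, calling the function without output or type is equivalent to passing output = "data.frame" and type = "all".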
Details
When the option output = "data.frame" is selected, the
function returns a data.frame with the following fields.
text: contents of the entity
entity_type: type of entity (e.g. ORG for organizations)
start_id: serial number ID of the starting token. This number corresponds with the token number in the data.frame returned from spacy_tokenize(x) with default options.
length: number of words (tokens) included in a named entity (e.g. for the entity "New York Stock Exchange", length = 4)
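The start_id and length fields together identify which tokens an entity covers. The following minimal sketch shows the arithmetic, assuming start_id is a 1-based position and the entity spans length consecutive tokens. The helper entity_token_ids() is hypothetical and not part of spacyr, and the input values are illustrative only.

```r
# Hypothetical helper (not part of spacyr): token positions covered by an
# entity, assuming a 1-based start_id and a span of n_tokens consecutive tokens.
entity_token_ids <- function(start_id, n_tokens) {
  stopifnot(length(start_id) == 1L, length(n_tokens) == 1L, n_tokens >= 1L)
  start_id + seq_len(n_tokens) - 1L
}

# Illustrative values only: a four-token entity whose first token is number 3.
ids <- entity_token_ids(start_id = 3L, n_tokens = 4L)
print(ids)  # 3 4 5 6

# With spacyr and a spaCy model available, the helper could be applied to a
# row of the returned data.frame (not run here):
# ents <- spacy_extract_entity(txt)
# entity_token_ids(ents$start_id[1], ents$length[1])
```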
Value
either a list or a data.frame of the extracted named entities, depending on output
Examples
## Not run:
spacy_initialize()
txt <- c(doc1 = "The Supreme Court is located in Washington D.C.",
doc2 = "Paul earned a postgraduate degree from MIT.")
spacy_extract_entity(txt)
spacy_extract_entity(txt, output = "list")
## End(Not run)