R: Extract named entities from texts using spaCy

spacy_extract_entity {spacyr}

R Documentation

Extract named entities from texts using spaCy

Description

This function extracts named entities from texts, based on the `ent` entity-tag attributes of document objects parsed by spaCy (see https://spacy.io/usage/linguistic-features#section-named-entities).

Usage

spacy_extract_entity(
  x,
  output = c("data.frame", "list"),
  type = c("all", "named", "extended"),
  multithread = TRUE,
  ...
)

Arguments

`x`	a character object or a TIF-compliant corpus data.frame (see https://github.com/ropenscilabs/tif)
`output`	type of returned object, either `"list"` or `"data.frame"`.
`type`	type of named entities, either `named`, `extended`, or `all`. See https://spacy.io/docs/usage/entity-recognition#entity-types for details.
`multithread`	logical; if `TRUE`, the processing is parallelized using spaCy's architecture (https://spacy.io/api)
`...`	unused
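
As an illustration of the second form of `x`, the sketch below builds a small corpus data.frame by hand in the layout the TIF specification describes (a character `doc_id` column followed by a character `text` column, with unique ids). The object name `corpus` and the layout checks are assumptions made for this example, not part of spacyr, and the final call is left as a comment because it needs a working spaCy installation.

```r
# Hand-built corpus in the TIF data.frame layout: doc_id first, text second,
# both character vectors, with unique document ids.
corpus <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("The Supreme Court is located in Washington D.C.",
           "Paul earned a postgraduate degree from MIT."),
  stringsAsFactors = FALSE
)

# Base R checks of the layout described above (illustrative only)
stopifnot(
  identical(names(corpus)[1:2], c("doc_id", "text")),
  is.character(corpus$doc_id),
  is.character(corpus$text),
  !anyDuplicated(corpus$doc_id)
)

# With spaCy installed and initialized, the corpus could then be passed as `x`:
# spacy_initialize()
# spacy_extract_entity(corpus, type = "named")
```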

Details

When the option output = "data.frame" is selected, the function returns a data.frame with the following fields.

text: contents of entity
entity_type: type of entity (e.g. ORG for organizations)
start_id: serial number ID of the starting token. This number corresponds to the token numbering in the data.frame returned from spacy_tokenize(x) with default options.
length: number of words (tokens) included in a named entity (e.g. for the entity "New York Stock Exchange", length = 4)
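
To make the relationship between `start_id` and `length` concrete, the following is a minimal base R sketch. It does not call spaCy: the token vector and the one-row `entities` data.frame are written by hand to mimic the fields listed above, and the helper `entity_span()` is a hypothetical name introduced only for this example. It also assumes a single document, so that token numbering simply starts at 1.

```r
# Hand-written tokens for one document (not actual spaCy output)
tokens <- c("The", "New", "York", "Stock", "Exchange", "opened", ".")

# Hand-built stand-in for one row of the data.frame described above
entities <- data.frame(
  text = "New York Stock Exchange",
  entity_type = "ORG",
  start_id = 2L,
  length = 4L,
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick the `length` tokens starting at `start_id`
entity_span <- function(tokens, start_id, length) {
  tokens[seq(start_id, length.out = length)]
}

# Rejoin the selected tokens and compare with the entity text
recovered <- paste(
  entity_span(tokens, entities$start_id[1], entities$length[1]),
  collapse = " "
)
stopifnot(identical(recovered, entities$text[1]))
```

In this toy case the entity covers tokens 2 through 5, which is what start_id = 2 and length = 4 encode.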

Value

either a `list` or a `data.frame` of the extracted named entities

Examples

## Not run: 
spacy_initialize()

txt <- c(doc1 = "The Supreme Court is located in Washington D.C.",
         doc2 = "Paul earned a postgraduate degree from MIT.")
spacy_extract_entity(txt)
spacy_extract_entity(txt, output = "list")

## End(Not run)

[Package spacyr version 1.3.0 Index]