spacy_extract_entity {spacyr}    R Documentation

Extract named entities from texts using spaCy

Description

This function extracts named entities from texts, based on the entity tag (ent) attributes of document objects parsed by spaCy (see https://spacy.io/usage/linguistic-features#section-named-entities).

Usage

spacy_extract_entity(
  x,
  output = c("data.frame", "list"),
  type = c("all", "named", "extended"),
  multithread = TRUE,
  ...
)

Arguments

x

a character object or a TIF-compliant corpus data.frame (see https://github.com/ropenscilabs/tif)

output

type of returned object, either "list" or "data.frame".

type

type of named entities to extract, either "named", "extended", or "all". See https://spacy.io/docs/usage/entity-recognition#entity-types for details.

multithread

logical; if TRUE, the processing is parallelized using spaCy's architecture (https://spacy.io/api)

...

unused

Details

When the option output = "data.frame" is selected, the function returns a data.frame with the following fields.

text

contents of entity

entity_type

type of entity (e.g. ORG for organizations)

start_id

serial number ID of the starting token. This number corresponds to the token numbering in the data.frame returned from spacy_tokenize(x) with default options.

length

number of words (tokens) included in a named entity (e.g. for the entity "New York Stock Exchange", length = 4)
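
The fields above can be combined in ordinary data.frame operations. The following is a minimal sketch that does not call spaCy: the `entities` data.frame is hand-built with hypothetical values to mimic the documented columns, and the derived `end_id` column is an illustration that assumes token IDs within an entity are consecutive.

```r
# Hand-built stand-in with the documented fields.
# All values are hypothetical, not real spaCy output.
entities <- data.frame(
  text = c("Paul", "MIT", "New York Stock Exchange"),
  entity_type = c("PERSON", "ORG", "ORG"),
  start_id = c(1L, 7L, 3L),
  length = c(1L, 1L, 4L),
  stringsAsFactors = FALSE
)

# Last token covered by each entity (assumes consecutive token IDs).
entities$end_id <- entities$start_id + entities$length - 1L

# Keep only the entities tagged as organizations.
orgs <- entities[entities$entity_type == "ORG", ]
```

With real output, the same subsetting would be applied to the data.frame returned by spacy_extract_entity(x).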

Value

either a list or data.frame of named entities

Examples

## Not run: 
spacy_initialize()

txt <- c(doc1 = "The Supreme Court is located in Washington D.C.",
         doc2 = "Paul earned a postgraduate degree from MIT.")
spacy_extract_entity(txt)
spacy_extract_entity(txt, output = "list")

## End(Not run)
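
The remaining arguments can be exercised in the same way. The sketch below passes a TIF-style corpus data.frame (character columns doc_id and text, reusing the texts from the example above) and restricts the result with type = "named". The `run_spacy` flag is a hypothetical guard added here so the snippet is safe to source without a spaCy installation; it is not part of spacyr.

```r
# TIF-style corpus: one row per document, character columns doc_id and text.
corpus_df <- data.frame(
  doc_id = c("doc1", "doc2"),
  text = c("The Supreme Court is located in Washington D.C.",
           "Paul earned a postgraduate degree from MIT."),
  stringsAsFactors = FALSE
)

# Hypothetical guard: set to TRUE only when spacyr and spaCy are installed.
run_spacy <- FALSE
if (run_spacy) {
  library(spacyr)
  spacy_initialize()
  # Data.frame output restricted to the "named" entity types.
  named_only <- spacy_extract_entity(corpus_df, type = "named")
  # List output with parallel processing switched off.
  as_list <- spacy_extract_entity(corpus_df, output = "list",
                                  multithread = FALSE)
}
```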

[Package spacyr version 1.3.0 Index]