spacy_extract_nounphrases {spacyr} | R Documentation |
Extract noun phrases from texts using spaCy
Description
This function extracts noun phrases from documents, based on the noun_chunks attributes of document objects parsed by spaCy (see https://spacy.io/usage/linguistic-features#noun-chunks).
Usage
spacy_extract_nounphrases(
x,
output = c("data.frame", "list"),
multithread = TRUE,
...
)
Arguments
x |
a character object or a TIF-compliant corpus data.frame (see https://github.com/ropenscilabs/tif) |
output |
type of returned object, either "data.frame" or "list" |
multithread |
logical; If |
... |
unused |
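As a minimal sketch of the second input form for x, the snippet below builds a corpus data.frame by hand, assuming the TIF convention of a character doc_id column followed by a character text column. The object name corpus_df is illustrative only, and the final call is left commented out because it needs a working spacyr and spaCy installation.

```r
# Named character vector: one element per document.
txt <- c(doc1 = "Natural language processing is a branch of computer science.",
         doc2 = "Paul earned a postgraduate degree from MIT.")

# Assumed TIF corpus layout: doc_id first, text second, both character.
corpus_df <- data.frame(
  doc_id = names(txt),
  text = unname(txt),
  stringsAsFactors = FALSE
)
str(corpus_df)

# With spaCy initialized, either form could then be passed as x:
# spacy_extract_nounphrases(txt)
# spacy_extract_nounphrases(corpus_df)
```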
Details
When the option output = "data.frame" is selected, the function returns a data.frame with the following fields.
text
contents of noun-phrase
root_text
contents of root token
start_id
serial number ID of starting token. This number corresponds with the number of data.frame returned from spacy_tokenize(x) with default options.
root_id
serial number ID of root token
length
number of words (tokens) included in a noun-phrase (e.g. for a noun-phrase, "individual car owners", length = 3)
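The relationship between start_id and length can be sketched as follows. The tokens vector and the nounphrases data.frame here are hand-written stand-ins, not real spaCy output, and the sketch assumes token IDs are 1-based; end_id and rebuilt are hypothetical names introduced for illustration.

```r
# Hand-written stand-in for the tokens of one document (not real spaCy output).
tokens <- c("Individual", "car", "owners", "pay", "the", "tax", ".")

# Hand-written stand-in with the fields listed above (assumes 1-based IDs).
nounphrases <- data.frame(
  text = c("Individual car owners", "the tax"),
  root_text = c("owners", "tax"),
  start_id = c(1L, 5L),
  root_id = c(3L, 6L),
  length = c(3L, 2L),
  stringsAsFactors = FALSE
)

# ID of the last token in each phrase: start_id + length - 1.
nounphrases$end_id <- nounphrases$start_id + nounphrases$length - 1L

# Rebuild each phrase from its token span.
rebuilt <- unname(mapply(
  function(start, end) paste(tokens[start:end], collapse = " "),
  nounphrases$start_id, nounphrases$end_id
))
print(rebuilt)
```

In this toy case, joining the tokens with single spaces reproduces the text field; real text may differ in whitespace.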
Value
either a list or data.frame of tokens
Examples
## Not run:
spacy_initialize()
txt <- c(doc1 = "Natural language processing is a branch of computer science.",
doc2 = "Paul earned a postgraduate degree from MIT.")
spacy_extract_nounphrases(txt)
spacy_extract_nounphrases(txt, output = "list")
## End(Not run)