spacy_parse {spacyr} | R Documentation |
Parse a text using spaCy
Description
The spacy_parse() function calls spaCy to both tokenize and tag the texts, and
returns a data.frame of the results. The function provides options on the
types of tagsets (tagset_ options), either "google" or "detailed", as well as
lemmatization (lemma). Dependency parsing and named entity recognition are
available as options. If "full_parse = TRUE" is provided, the function returns
the most extensive list of the parsing results from spaCy.
Usage
spacy_parse(
x,
pos = TRUE,
tag = FALSE,
lemma = TRUE,
entity = TRUE,
dependency = FALSE,
nounphrase = FALSE,
multithread = TRUE,
additional_attributes = NULL,
...
)
Arguments
x |
a character object, a quanteda corpus, or a TIF-compliant corpus data.frame (see https://github.com/ropenscilabs/tif) |
pos |
logical; whether to return the Universal Dependencies POS tagset (https://universaldependencies.org/u/pos/) |
tag |
logical; whether to return detailed part-of-speech tags for the language model |
lemma |
logical; include lemmatized tokens in the output (lemmatization may not work properly for non-English models) |
entity |
logical; if TRUE, report named entities |
dependency |
logical; if TRUE, analyse and tag dependencies |
nounphrase |
logical; if TRUE, analyse and tag noun phrases |
multithread |
logical; if TRUE, the processing is parallelized |
additional_attributes |
a character vector; this option is for extracting additional attributes of
tokens from spaCy. When the names of attributes are supplied, the output
data.frame will contain additional variables corresponding to the names of the
attributes. For instance, when additional_attributes = c("is_punct") is
supplied, the output will include an additional variable named is_punct |
... |
not used directly |
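The x argument accepts a TIF-compliant corpus data.frame. The following is a minimal sketch of building one in base R, assuming the TIF convention that the first column is a character doc_id with unique values and the second is a character text column. The object name corpus_df and the document contents are made up for illustration. The spacy_parse() call is left commented out because it needs a working spaCy installation.

```r
# Build a corpus data.frame following the TIF convention:
# first column `doc_id` (unique character ids), second column `text`.
corpus_df <- data.frame(
  doc_id = c("doc1", "doc2"),
  text   = c("The fast cat catches mice.",
             "This is the second document."),
  stringsAsFactors = FALSE
)

# Basic structural checks before handing the object to the parser
stopifnot(
  identical(names(corpus_df)[1:2], c("doc_id", "text")),
  is.character(corpus_df$doc_id),
  is.character(corpus_df$text),
  !anyDuplicated(corpus_df$doc_id)
)

## Not run:
# spacy_initialize()
# spacy_parse(corpus_df, entity = TRUE)
## End(Not run)
```

Checking the structure up front is a design choice for this sketch only; it is not something spacy_parse() is documented here to require of the caller.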
Value
a data.frame of tokenized, parsed, and annotated tokens
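Because the result is a plain data.frame with one row per token, it can be filtered and summarised with base R. The sketch below uses a hand-written stand-in rather than real spaCy output. The column names (doc_id, sentence_id, token_id, token, pos) and the tag values are assumptions about what a default call might return, so check them against an actual result.

```r
# Hand-written stand-in for a spacy_parse() result (NOT real spaCy output);
# column names and tag values are assumptions for illustration.
parsed <- data.frame(
  doc_id      = "text1",
  sentence_id = 1L,
  token_id    = 1:7,
  token       = c("And", "now", "for", "something",
                  "completely", "different", "."),
  pos         = c("CCONJ", "ADV", "ADP", "PRON", "ADV", "ADJ", "PUNCT"),
  stringsAsFactors = FALSE
)

# Drop punctuation tokens
words <- parsed[parsed$pos != "PUNCT", ]

# Count the remaining tokens per part-of-speech tag
pos_counts <- table(words$pos)
print(pos_counts)
```

The same subsetting pattern applies to any additional column requested through the function's arguments.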
Examples
## Not run:
spacy_initialize()
# See Chap 5.1 of the NLTK book, http://www.nltk.org/book/ch05.html
txt <- "And now for something completely different."
spacy_parse(txt)
spacy_parse(txt, pos = TRUE, tag = TRUE)
spacy_parse(txt, dependency = TRUE)
txt2 <- c(doc1 = "The fast cat catches mice.\nThe quick brown dog jumped.",
          doc2 = "This is the second document.",
          doc3 = "This is a \"quoted\" text.")
spacy_parse(txt2, entity = TRUE, dependency = TRUE)
txt3 <- "We analyzed the Supreme Court with three natural language processing tools."
spacy_parse(txt3, entity = TRUE, nounphrase = TRUE)
spacy_parse(txt3, additional_attributes = c("like_num", "is_punct"))
## End(Not run)