features {NLP} | R Documentation |
Extract Annotation Features
Description
Conveniently extract features from annotations and annotated plain text documents.
Usage
features(x, type = NULL, simplify = TRUE)
Arguments
x |
an object inheriting from class "Annotation" or "AnnotatedPlainTextDocument". |
type |
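a character vector of annotation types to be used for
selecting annotations, or NULL (default) to use all annotations.
Elements of type are partially matched against the annotation types. |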
simplify |
a logical indicating whether to simplify feature values to a vector. |
Details
features() conveniently gathers all feature tag-value pairs in the
selected annotations into a data frame with one variable for each tag
found, holding the values for that tag (using a NULL value for
annotations which have no value for the tag). In general, variables
will be lists of extracted values. By default, variables where all
elements are length one atomic vectors are simplified into an atomic
vector of values. The values for specific tags can be extracted by
suitably subscripting the obtained data frame.
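The behavior described above can be tried on a small hand-built annotation object. The following is only a sketch: the spans, tags and feature values are made up for illustration, and the third annotation deliberately lacks a lemma so that one variable has a missing value.

```r
library(NLP)

## Three hypothetical "word" annotations; the spans and the feature
## values are invented for illustration only.
a <- Annotation(id = 1:3,
                type = rep.int("word", 3L),
                start = c(1L, 5L, 9L),
                end = c(3L, 7L, 11L),
                features = list(list(POS = "DT", lemma = "the"),
                                list(POS = "NN", lemma = "cat"),
                                list(POS = "VBZ")))  # no lemma here

f <- features(a, "word")
names(f)   # tags found across the selected annotations
f$POS      # every annotation has a length one POS value
f$lemma    # the third annotation has no lemma value

## Keep all variables as lists, even where simplification would apply.
g <- features(a, "word", simplify = FALSE)
is.list(g$POS)
```

Subscripting with $ as above is one way to pull out the values for a single tag; f[["POS"]] or f[, "POS"] should work as well, since the result is a data frame.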
Examples
## Use a pre-built annotated plain text document,
## see ? AnnotatedPlainTextDocument.
doc <- readRDS(system.file("texts", "stanford.rds", package = "NLP"))
## Extract features of all *word* annotations in doc:
x <- features(doc, "word")
## Could also have abbreviated "word" to "w".
x
## Only lemmas:
x$lemma
## Words together with lemmas:
paste(words(doc), x$lemma, sep = "/")