step_pos_filter {textrecipes}	R Documentation
Part of Speech Filtering of Token Variables
Description
step_pos_filter() creates a specification of a recipe step that will filter a token variable based on part of speech tags.
Usage
step_pos_filter(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  columns = NULL,
  keep_tags = "NOUN",
  skip = FALSE,
  id = rand_id("pos_filter")
)
Arguments
recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
...
One or more selector functions to choose which variables are affected by the step. See recipes::selections() for more details.
role
Not used by this step since no new variables are created.
trained
A logical to indicate if the quantities for preprocessing have been estimated.
columns
A character string of variable names that will be populated (eventually) by the terms argument. This is NULL until the step is trained by recipes::prep.recipe().
keep_tags
Character vector of part of speech tags to keep. See Details for the complete list of tags. Defaults to "NOUN".
skip
A logical. Should the step be skipped when the recipe is baked by bake()? While all operations are baked when prep() is run, some operations may not be able to be conducted on new data (e.g., processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.
id
A character string that is unique to this step to identify it.
Details
Possible part of speech tags for the "spacyr" engine are: "ADJ", "ADP", "ADV", "AUX", "CONJ", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X", and "SPACE". For more information, see https://github.com/explosion/spaCy/blob/master/spacy/glossary.py.
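Several tags can be kept at once by passing a character vector to keep_tags. A minimal sketch (the data frame df and its text column are illustrative; prepping this recipe requires a spaCy installation, as noted under Examples):

library(recipes)
library(textrecipes)

# Hypothetical data frame with a character column named `text`.
df <- data.frame(text = "The quick brown fox jumps over the lazy dog.")

# Keep both nouns and verbs; tokens with any other tag are dropped.
rec <- recipe(~text, data = df) %>%
  step_tokenize(text, engine = "spacyr") %>%
  step_pos_filter(text, keep_tags = c("NOUN", "VERB"))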
Value
An updated version of recipe with the new step added to the sequence of existing steps (if any).
Tidying
When you tidy() this step, a tibble with columns terms (the selectors or variables selected) is returned.
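A brief sketch of tidying this step, assuming the rec_prepped recipe from the Examples section below, in which step_pos_filter() is the second step:

library(recipes)
# Inspect the trained step; `number` selects the step by position.
tidy(rec_prepped, number = 2)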
Case weights
The underlying operation does not allow for case weights.
See Also
step_tokenize()
to turn characters into tokens
Other Steps for Token Modification:
step_lemma()
,
step_ngram()
,
step_stem()
,
step_stopwords()
,
step_tokenfilter()
,
step_tokenmerge()
Examples
## Not run:
library(recipes)
library(textrecipes)

short_data <- data.frame(text = c(
  "This is a short tale,",
  "With many cats and ladies."
))

rec_spec <- recipe(~text, data = short_data) %>%
  step_tokenize(text, engine = "spacyr") %>%
  step_pos_filter(text, keep_tags = "NOUN") %>%
  step_tf(text)

rec_prepped <- prep(rec_spec)

bake(rec_prepped, new_data = NULL)
## End(Not run)
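The example above needs the spacyr package and a working spaCy backend; a one-time setup sketch, assuming the default spacyr installation and the "en_core_web_sm" English model:

# install.packages("spacyr")   # once
# spacyr::spacy_install()      # once: installs spaCy and a default English model
library(spacyr)
spacy_initialize(model = "en_core_web_sm")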