char_select {quanteda} | R Documentation |
Select or remove elements from a character vector
Description
These functions select or discard elements from a character object. For
convenience, the functions char_remove and char_keep are defined as
shortcuts for char_select(x, pattern, selection = "remove") and
char_select(x, pattern, selection = "keep"), respectively.
These functions make it easy to modify a character vector, for instance a set of stopwords, based on pattern matching.
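The shortcut relationship described above can be checked directly. This is a minimal sketch, assuming quanteda is installed and attached; the vector x and the result names same_remove and same_keep are illustrative only.

```r
library(quanteda)

# Illustrative input; any character vector would do
x <- c("natural", "national", "denatured", "other")

# char_remove() is documented as a shortcut for selection = "remove"
same_remove <- identical(
  char_remove(x, "nat*"),
  char_select(x, "nat*", selection = "remove")
)

# char_keep() is documented as a shortcut for selection = "keep"
same_keep <- identical(
  char_keep(x, "nat*"),
  char_select(x, "nat*", selection = "keep")
)

same_remove
same_keep
```

Both comparisons are expected to be TRUE if the shortcuts behave as documented.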
Usage
char_select(
  x,
  pattern,
  selection = c("keep", "remove"),
  valuetype = c("glob", "fixed", "regex"),
  case_insensitive = TRUE
)
char_remove(x, ...)
char_keep(x, ...)
Arguments
x |
an input character vector |
pattern |
a character vector, list of character vectors, dictionary, or collocations object. See pattern for details. |
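selection |
whether to "keep" or "remove" the elements that match pattern |
valuetype |
the type of pattern matching: "glob", "fixed", or "regex" |
case_insensitive |
logical; if TRUE, ignore case when matching pattern |
... |
additional arguments passed by char_remove and char_keep to char_select |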
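To make the interplay of selection, valuetype and case_insensitive concrete, the following base R sketch approximates the matching behaviour suggested by the examples in this page. It is not quanteda's implementation: sketch_select is a hypothetical helper, and it assumes that "glob" and "fixed" patterns must match a whole element while "regex" patterns may match anywhere in it.

```r
# Hypothetical helper, NOT part of quanteda: a rough base R approximation
sketch_select <- function(x, pattern,
                          selection = c("keep", "remove"),
                          valuetype = c("glob", "fixed", "regex"),
                          case_insensitive = TRUE) {
  selection <- match.arg(selection)
  valuetype <- match.arg(valuetype)

  # Translate every pattern into a regular expression
  rx <- switch(valuetype,
    glob  = utils::glob2rx(pattern),          # e.g. "nat*" becomes "^nat"
    fixed = paste0("^\\Q", pattern, "\\E$"),  # literal, whole-element match
    regex = pattern                           # used as supplied
  )

  # An element is a hit if it matches at least one pattern
  hit <- rep(FALSE, length(x))
  for (r in rx) {
    hit <- hit | grepl(r, x, ignore.case = case_insensitive, perl = TRUE)
  }

  if (selection == "keep") x[hit] else x[!hit]
}

kw <- c("natural", "national", "denatured", "other")
sketch_select(kw, "nat*")                       # glob: whole-element match
sketch_select(kw, "nat", valuetype = "regex")   # regex: matches anywhere
sketch_select(kw, "nat", valuetype = "fixed")   # fixed: exact match only

words <- c("any", "and", "Anna", "as", "announce", "but")
sketch_select(words, "an*", selection = "remove")
sketch_select(words, "an*", selection = "remove", case_insensitive = FALSE)
```

Under these assumptions, the glob pattern "nat*" keeps only "natural" and "national", whereas the regex "nat" also keeps "denatured". With case_insensitive = FALSE, "Anna" survives the removal of "an*".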
Value
a modified character vector
Examples
# character selection
mykeywords <- c("natural", "national", "denatured", "other")
char_select(mykeywords, "nat*", valuetype = "glob")
char_select(mykeywords, "nat", valuetype = "regex")
char_select(mykeywords, c("natur*", "other"))
char_select(mykeywords, c("natur*", "other"), selection = "remove")
# character removal
char_remove(letters[1:5], c("a", "c", "x"))
words <- c("any", "and", "Anna", "as", "announce", "but")
char_remove(words, "an*")
char_remove(words, "an*", case_insensitive = FALSE)
char_remove(words, "^.n.+$", valuetype = "regex")
# remove some of the system stopwords
stopwords("en", source = "snowball")[1:6]
stopwords("en", source = "snowball")[1:6] |>
  char_remove(c("me", "my*"))
# character keep
char_keep(letters[1:5], c("a", "c", "x"))