KeywordProcessor {rflashtext} | R Documentation |
FlashText algorithm to find and replace words
Description
Based on the python library flashtext. To see more details about the algorithm visit: FlashText
Public fields
attrs
list. Stores the attributes of the
KeywordProcessor
object.
Methods
Public methods
Method new()
Initializes the KeywordProcessor
object.
Usage
KeywordProcessor$new( keys = NULL, words = NULL, trie = NULL, id = "_word_", chars = paste0(c(letters, LETTERS, 0:9, "_"), collapse = ""), ignore_case = FALSE )
Arguments
keys
character vector. Strings to identify (find/replace) in the text. Must be provided if
trie
isNULL
.words
character vector. Strings to be returned (find) or replaced (replace) when found the respective
keys
. Should have the same length askeys
. If not provided,words = keys
.trie
character. JSON built character by character and needed for the search. It can be provided instead of
keys
andwords
.id
character. Used to name the end nodes of the
trie
dictionary.chars
character. Used to validate if a word continues. Default
paste0(c(letters, LETTERS, 0:9, "_"), collapse = "")
equivalent to[a-zA-Z0-9_]
.ignore_case
logical. If
FALSE
the search is case sensitive. DefaultTRUE
.
Examples
library(rflashtext) processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles")) processor$attrs
library(rflashtext) processor <- KeywordProcessor$new(chars = paste0(letters, collapse = ""), keys = c("NY", "LA")) processor$attrs
Method show_trie()
Shows the trie
dictionary used to find/replace keys
.
Usage
KeywordProcessor$show_trie()
Returns
character. JSON string of the trie
structure. It can be converted to list using jsonlite::fromJSON
.
Examples
library(rflashtext) processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles")) processor$show_trie()
Method add_keys_words()
Adds keys
and words
to the trie
dictionary.
Usage
KeywordProcessor$add_keys_words(keys, words = NULL)
Arguments
keys
character vector. Strings to identify (find/replace) in the text.
words
character vector. Strings to be returned (find) or replaced (replace) when found the respective
keys
. Should have the same length askeys
. If not provided,words = keys
.
Examples
library(rflashtext) processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles")) processor$add_keys_words(keys = "CA", words = "California") processor$show_trie()
Method contain_keys()
Checks if keys
are in the trie
dictionary.
Usage
KeywordProcessor$contain_keys(keys)
Arguments
keys
character vector. Strings to check if already are in the search
trie
dictionary.
Returns
logical vector. TRUE
if the keys
are in the search trie
dictionary.
Examples
library(rflashtext) processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles")) processor$contain_keys(keys = c("NY", "LA", "TX"))
Method get_words()
Gets the words
for the keys
found in the trie
dictionary.
Usage
KeywordProcessor$get_words(keys)
Arguments
keys
character vector. Strings to get back the respective
words
.
Returns
character vector. Respective words
. If keys
not found returns NA_character_
.
Examples
library(rflashtext) processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles")) processor$get_words(keys = c("NY", "LA", "TX"))
Method find_keys()
Finds keys
in the sentences using the search trie
dictionary.
Usage
KeywordProcessor$find_keys(sentences, span_info = TRUE)
Arguments
sentences
character vector. Text to find the
keys
previously defined.span_info
logical.
TRUE
to retrieve thewords
and the position of the matches.FALSE
to only retrieve thewords
. DefaultTRUE
.
Returns
list with the words
corresponding to keys
found in the sentence
. Hint: Use data.table::rbindlist(...)
to transform the list to a data frame.
Examples
library(rflashtext) processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles")) words_found <- processor$find_keys(sentences = "I live in LA but I like NY") words_found
Method replace_keys()
Replaces keys
found in the sentences by the corresponding words
.
Usage
KeywordProcessor$replace_keys(sentences)
Arguments
sentences
character vector. Text to replace the
keys
found by the correspondingwords
.
Returns
character vector. Text with the keys
replaced by the respective words
.
Examples
library(rflashtext) processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles")) new_sentences <- processor$replace_keys(sentences = "I live in LA but I like NY") new_sentences
Examples
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$contain_keys(keys = "NY")
processor$get_words(keys = "LA")
processor$find_keys(sentences = "I live in LA but I like NY")
processor$replace_keys(sentences = "I live in LA but I like NY")
## ------------------------------------------------
## Method `KeywordProcessor$new`
## ------------------------------------------------
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$attrs
library(rflashtext)
processor <- KeywordProcessor$new(chars = paste0(letters, collapse = ""), keys = c("NY", "LA"))
processor$attrs
## ------------------------------------------------
## Method `KeywordProcessor$show_trie`
## ------------------------------------------------
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$show_trie()
## ------------------------------------------------
## Method `KeywordProcessor$add_keys_words`
## ------------------------------------------------
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$add_keys_words(keys = "CA", words = "California")
processor$show_trie()
## ------------------------------------------------
## Method `KeywordProcessor$contain_keys`
## ------------------------------------------------
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$contain_keys(keys = c("NY", "LA", "TX"))
## ------------------------------------------------
## Method `KeywordProcessor$get_words`
## ------------------------------------------------
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$get_words(keys = c("NY", "LA", "TX"))
## ------------------------------------------------
## Method `KeywordProcessor$find_keys`
## ------------------------------------------------
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
words_found <- processor$find_keys(sentences = "I live in LA but I like NY")
words_found
## ------------------------------------------------
## Method `KeywordProcessor$replace_keys`
## ------------------------------------------------
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
new_sentences <- processor$replace_keys(sentences = "I live in LA but I like NY")
new_sentences