| KeywordProcessor {rflashtext} | R Documentation |
FlashText algorithm to find and replace words
Description
Based on the Python library flashtext. For more details about the algorithm, see: FlashText
Public fields
attrs: list. Stores the attributes of the KeywordProcessor object.
Methods
Public methods
Method new()
Initializes the KeywordProcessor object.
Usage
KeywordProcessor$new( keys = NULL, words = NULL, trie = NULL, id = "_word_", chars = paste0(c(letters, LETTERS, 0:9, "_"), collapse = ""), ignore_case = FALSE )
Arguments
keys: character vector. Strings to identify (find/replace) in the text. Must be provided if trie is NULL.
words: character vector. Strings to be returned (find) or substituted (replace) when the respective keys are found. Should have the same length as keys. If not provided, words = keys.
trie: character. JSON string of the trie, built character by character and needed for the search. It can be provided instead of keys and words.
id: character. Used to name the end nodes of the trie dictionary.
chars: character. Used to validate whether a word continues. Default paste0(c(letters, LETTERS, 0:9, "_"), collapse = ""), equivalent to [a-zA-Z0-9_].
ignore_case: logical. If FALSE, the search is case sensitive. Default FALSE.
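To make "built character by character" concrete, here is a minimal base R sketch of a trie stored as nested lists, with each end node named by id. The helpers insert_key() and build_trie() are hypothetical and written only for illustration; they are not part of rflashtext, and the package's internal representation may differ.

```r
# Hypothetical helper (not part of rflashtext): insert one key, already split
# into single characters, into a nested-list trie. The end node is named `id`.
insert_key <- function(node, chars, word, id = "_word_") {
  if (length(chars) == 0L) {
    node[[id]] <- word          # end of the key: store the word to return
    return(node)
  }
  ch <- chars[1L]
  child <- node[[ch]]           # NULL when this branch does not exist yet
  if (is.null(child)) child <- list()
  node[[ch]] <- insert_key(child, chars[-1L], word, id)
  node
}

# Hypothetical helper: build a trie from parallel vectors of keys and words.
build_trie <- function(keys, words = keys, id = "_word_") {
  trie <- list()
  for (i in seq_along(keys)) {
    chars <- strsplit(keys[i], "", fixed = TRUE)[[1L]]
    trie <- insert_key(trie, chars, words[i], id)
  }
  trie
}

trie <- build_trie(c("NY", "LA"), c("New York", "Los Angeles"))
str(trie)
```

Each key becomes a path of single-character nodes, and the word is stored under the id name at the end of that path, which is why id must not collide with a character that can appear in a key.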
Examples
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$attrs
# Custom chars: only lowercase letters count as word characters; words defaults to keys
processor <- KeywordProcessor$new(chars = paste0(letters, collapse = ""), keys = c("NY", "LA"))
processor$attrs
Method show_trie()
Shows the trie dictionary used to find/replace keys.
Usage
KeywordProcessor$show_trie()
Returns
character. JSON string of the trie structure. It can be converted to list using jsonlite::fromJSON.
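As a short illustration of the conversion mentioned above, the JSON string can be passed to jsonlite::fromJSON to inspect the trie as a nested list. This sketch assumes the jsonlite package is installed; the exact shape of the resulting list is not shown here because it depends on the package's internal trie layout.

```r
library(rflashtext)

processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))

# JSON string describing the trie
trie_json <- processor$show_trie()

# Parse the JSON into a nested R list for inspection
trie_list <- jsonlite::fromJSON(trie_json)
str(trie_list)
```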
Examples
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$show_trie()
Method add_keys_words()
Adds keys and words to the trie dictionary.
Usage
KeywordProcessor$add_keys_words(keys, words = NULL)
Arguments
keys: character vector. Strings to identify (find/replace) in the text.
words: character vector. Strings to be returned (find) or substituted (replace) when the respective keys are found. Should have the same length as keys. If not provided, words = keys.
Examples
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$add_keys_words(keys = "CA", words = "California")
processor$show_trie()
Method contain_keys()
Checks if keys are in the trie dictionary.
Usage
KeywordProcessor$contain_keys(keys)
Arguments
keys: character vector. Strings to check for in the search trie dictionary.
Returns
logical vector. TRUE if the keys are in the search trie dictionary.
Examples
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$contain_keys(keys = c("NY", "LA", "TX"))
Method get_words()
Gets the words for the keys found in the trie dictionary.
Usage
KeywordProcessor$get_words(keys)
Arguments
keys: character vector. Strings for which to get the respective words.
Returns
character vector. Respective words. Returns NA_character_ for keys that are not found.
Examples
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$get_words(keys = c("NY", "LA", "TX"))
Method find_keys()
Finds keys in the sentences using the search trie dictionary.
Usage
KeywordProcessor$find_keys(sentences, span_info = TRUE)
Arguments
sentences: character vector. Text in which to find the keys previously defined.
span_info: logical. TRUE to retrieve the words and the position of the matches. FALSE to only retrieve the words. Default TRUE.
Returns
list with the words corresponding to keys found in the sentence. Hint: Use data.table::rbindlist(...) to transform the list to a data frame.
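The idea behind the search can be sketched in base R: walk the trie one character at a time and accept a match only when the key is not part of a longer word, which is the role of chars. The function find_in_sentence() and the hand-written trie below are hypothetical illustrations, not the package's implementation, and the list shape used here (word, start, end) is an assumption that may differ from what find_keys() returns.

```r
# Hand-written trie for the keys "NY" and "LA" (assumed shape, for illustration).
trie <- list(
  N = list(Y = list(`_word_` = "New York")),
  L = list(A = list(`_word_` = "Los Angeles"))
)

# Hypothetical, simplified search over a single sentence.
find_in_sentence <- function(sentence, trie, id = "_word_",
                             chars = c(letters, LETTERS, 0:9, "_")) {
  s <- strsplit(sentence, "", fixed = TRUE)[[1L]]
  n <- length(s)
  out <- list()
  i <- 1L
  while (i <= n) {
    matched_end <- 0L
    matched_word <- NULL
    # A match may only start where the previous character is not a word character.
    if (i == 1L || !(s[i - 1L] %in% chars)) {
      node <- trie
      j <- i
      while (j <= n && is.list(node[[s[j]]])) {
        node <- node[[s[j]]]
        # Accept only if a key ends here and the word does not continue.
        if (!is.null(node[[id]]) && (j == n || !(s[j + 1L] %in% chars))) {
          matched_end <- j
          matched_word <- node[[id]]
        }
        j <- j + 1L
      }
    }
    if (matched_end > 0L) {
      out[[length(out) + 1L]] <- list(word = matched_word, start = i, end = matched_end)
      i <- matched_end + 1L
    } else {
      i <- i + 1L
    }
  }
  out
}

found <- find_in_sentence("I live in LA but I like NY", trie)

# Base R alternative to data.table::rbindlist for a list of this shape.
found_df <- do.call(rbind, lapply(found, as.data.frame))
found_df

# "LAX" contains "LA", but the word continues, so nothing is matched.
length(find_in_sentence("I flew into LAX", trie))
```

Because every character of the sentence is visited a bounded number of times per trie step, the cost of this kind of search depends on the sentence length rather than on the number of keys.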
Examples
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
words_found <- processor$find_keys(sentences = "I live in LA but I like NY")
words_found
Method replace_keys()
Replaces keys found in the sentences by the corresponding words.
Usage
KeywordProcessor$replace_keys(sentences)
Arguments
sentences: character vector. Text in which the keys found are replaced by the corresponding words.
Returns
character vector. Text with the keys replaced by the respective words.
Examples
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
new_sentences <- processor$replace_keys(sentences = "I live in LA but I like NY")
new_sentences
Examples
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$contain_keys(keys = "NY")
processor$get_words(keys = "LA")
processor$find_keys(sentences = "I live in LA but I like NY")
processor$replace_keys(sentences = "I live in LA but I like NY")