clean_strings {fedmatch} | R Documentation |
String cleaning for easier matching
Description
clean_strings takes a string vector and cleans it according to user-given options.
Usage
clean_strings(
string,
sp_char_words = fedmatch::sp_char_words,
common_words = NULL,
remove_char = NULL,
remove_words = FALSE,
stem = FALSE
)
Arguments
string |
character or character vector of strings |
sp_char_words |
data.frame. Data.frame where the first column is special characters and the second column is full words. The default is fedmatch::sp_char_words. |
common_words |
data.frame. Data.frame where first column is abbreviations and second column is full words. |
remove_char |
character vector. string of specific characters (for example, "letters") to be removed |
remove_words |
logical. If TRUE, removes all abbreviations and replacement words in common_words |
stem |
logical. If TRUE, words are stemmed |
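To make the two-column layout expected by common_words concrete, here is a minimal base-R sketch. The column names (abbr, full), the sample rows, and the helper replace_words_sketch are hypothetical and only for illustration. Whole-word replacement is an assumption about the intended behavior, not a description of how fedmatch implements it.

```r
# Hypothetical table in the shape the common_words argument describes:
# first column abbreviations, second column full words.
common_words_example <- data.frame(
  abbr = c("corp", "inc"),
  full = c("corporation", "incorporated"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of fedmatch): replace each abbreviation
# with its full word, matching whole words only.
replace_words_sketch <- function(string, words) {
  out <- string
  for (i in seq_len(nrow(words))) {
    # \\b anchors the match at word boundaries
    pattern <- paste0("\\b", words[[1]][i], "\\b")
    out <- gsub(pattern, words[[2]][i], out, perl = TRUE)
  }
  out
}

replace_words_sketch("acme corp", common_words_example)
```

A table built this way could then be passed as the common_words argument shown in Usage.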
Details
This function takes a variety of options, each of which changes the behavior.
With the default settings, clean_strings will do the following:
make the string lowercase; replace the special characters &, $, %, @ with their
names ("and", "dollar", "percent", "at"); convert tabs to spaces; and remove extra spaces.
This default cleaning puts the strings in a standard format to allow for easier matching.
The other options allow for the removal or replacement of other words or characters.
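The default steps described above can be approximated in base R as follows. This is a sketch only: the helper name clean_default_sketch, the inline character-to-name mapping, and the choice to pad each replacement with spaces are assumptions made for illustration, not the fedmatch implementation.

```r
# Hypothetical helper: a base-R approximation of the default cleaning steps.
# This is not the fedmatch implementation.
clean_default_sketch <- function(string) {
  # Assumed mapping of special characters to their names
  sp_map <- c("&" = "and", "$" = "dollar", "%" = "percent", "@" = "at")
  out <- tolower(string)
  for (ch in names(sp_map)) {
    # fixed = TRUE treats the character literally, not as a regex
    out <- gsub(ch, paste0(" ", sp_map[[ch]], " "), out, fixed = TRUE)
  }
  out <- gsub("\t", " ", out, fixed = TRUE)  # convert tabs to spaces
  out <- gsub(" +", " ", out)                # collapse repeated spaces
  trimws(out)                                # drop leading/trailing spaces
}

clean_default_sketch(c("AT&T  Corp", "100%\tPure"))
# returns "at and t corp" "100 percent pure"
```

Padding each replacement with spaces keeps the inserted word separate from its neighbors; the extra spaces are then removed by the collapsing step.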
Value
cleaned strings