state_replace {messy.cats} | R Documentation |
state_replace
Description
A wrapper function for cat_replace() that requires only a vector of messy US state names. state_replace() uses the built-in character vector state.name as the reference clean vector.
Usage
state_replace(messy_states, threshold = NA, p = 0)
Arguments
messy_states |
Vector containing the messy state names that will be replaced by the closest match from state.name |
threshold |
The maximum distance that will form a match. If this argument is specified, any element in the messy vector that has no match closer than the threshold distance will be replaced with NA. Default: NA |
p |
Only used with method "jw"; the Jaro-Winkler penalty size. Default: 0 |
Details
State names are often misspelled or abbreviated in datasets, especially datasets that have been manually digitized or created. state_replace() is a wrapper function for cat_replace() that quickly solves this common issue of misspellings or differing formats of state names across datasets. It uses the built-in vector of US state names, state.name, as the reference clean vector and replaces each element of your messy input vector with its nearest match in state.name.
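The nearest-match idea described above can be sketched in base R. This is only an illustration, not the package's implementation: nearest_state() is a hypothetical helper, and it uses the generalized Levenshtein distance from utils::adist(), which may differ from the distance method that cat_replace() applies. The threshold handling mirrors the documented behaviour, assuming the threshold is expressed in the same units as the distance.

```r
# Hypothetical helper (not part of messy.cats): replace each messy name with
# the closest entry of state.name, measured by edit distance.
nearest_state <- function(messy, threshold = NA) {
  # Distance matrix: one row per messy name, one column per state name.
  d <- utils::adist(messy, state.name, ignore.case = TRUE)

  # Column index of the smallest distance in each row.
  best <- apply(d, 1, which.min)
  best_dist <- d[cbind(seq_along(messy), best)]

  out <- state.name[best]

  # With a threshold, names with no sufficiently close match become NA.
  if (!is.na(threshold)) {
    out[best_dist > threshold] <- NA_character_
  }
  out
}

nearest_state(c("Indianaa", "Wisvconsin", "aLaska", "NewJersey"))
nearest_state(c("Indianaa", "zzzzzzzzzzzz"), threshold = 2)
```

Edit distance counts single-character insertions, deletions and substitutions, so it suits typos well; abbreviations such as "Calif." may need a different distance method or a larger threshold.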
Value
state_replace() returns a cleaned version of messy_states, with each element replaced by the most similar element of state.name.
Examples
if (interactive()) {
  # EXAMPLE1
  lst <- c("Indianaa", "Wisvconsin", "aLaska", "NewJersey", "Claifoarni")
  fixed <- state_replace(lst)
}
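A further sketch showing the threshold argument from the Usage section. The input "Atlantis" and the threshold value 0.2 are arbitrary illustrative choices; a suitable value depends on the distance method in use, so treat the number as an assumption rather than a recommendation. The call is guarded so the snippet does nothing when messy.cats is not installed.

```r
# Messy input including one name that has no sensible state match.
messy <- c("Indianaa", "Wisvconsin", "Atlantis")

if (requireNamespace("messy.cats", quietly = TRUE)) {
  # Per the Arguments section, elements with no match closer than the
  # threshold are replaced with NA. The value 0.2 is an assumption.
  cleaned <- messy.cats::state_replace(messy, threshold = 0.2)

  # Show inputs and results side by side for inspection.
  print(data.frame(messy = messy, cleaned = cleaned))
}
```

Comparing the input and output columns side by side is a quick way to check whether the chosen threshold is too strict or too loose for your data.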