country_replace {messy.cats} R Documentation

country_replace

Description

A wrapper around cat_replace() that requires only a vector of messy country names. country_replace() uses the built-in list of clean country names, country.names, as the reference vector.

Usage

country_replace(messy_countries, threshold = NA, p = 0)

Arguments

messy_countries

Vector of messy country names. Each element is replaced by its closest match in country.names.

threshold

The maximum distance that will form a match. If this argument is specified, any element in the messy vector that has no match closer than the threshold distance will be replaced with NA. Default: NA

p

Only used with method "jw": the Jaro-Winkler penalty size. Default: 0

Details

Country names are often misspelled or abbreviated in datasets, especially datasets that have been manually digitized or created. country_replace() is a wrapper around cat_replace() that quickly resolves misspellings and inconsistent formats of country names across datasets. It uses the built-in list of clean country names, country.names, as the reference vector and replaces each name in your messy input vector with its nearest match in country.names.
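
The nearest-match idea described above can be sketched in base R. This is only an illustration under stated assumptions: nearest_match() is a hypothetical helper, not part of messy.cats; it uses utils::adist() (generalized Levenshtein distance), which may differ from the metric cat_replace() uses; and the short clean vector stands in for country.names. Because the distance scale depends on the metric, threshold values here are edit counts and may not carry over to country_replace().

```r
# Hypothetical helper illustrating nearest-match replacement.
# Assumption: utils::adist() (generalized Levenshtein distance) is used here
# for illustration; cat_replace() may rely on a different distance metric.
nearest_match <- function(messy, clean, threshold = NA) {
  # Distance matrix: one row per messy value, one column per clean value
  d <- utils::adist(messy, clean)
  # Column index of the closest clean value for each messy value
  best <- apply(d, 1, which.min)
  out <- clean[best]
  # With a threshold, values whose best match is farther away become NA
  if (!is.na(threshold)) {
    best_dist <- d[cbind(seq_along(messy), best)]
    out[best_dist > threshold] <- NA
  }
  out
}

# Stand-in reference vector (the package ships its own country.names)
clean <- c("Belarus", "Uruguay", "Venezuela")
messy <- c("Blearaus", "Uruagsya", "Venzesual")

# Each messy value is mapped to its closest entry in the reference vector
nearest_match(messy, clean)

# A threshold of 0 accepts exact matches only; anything else becomes NA
nearest_match(messy, clean, threshold = 0)
```

The threshold branch mirrors the documented behavior of the threshold argument: an element with no match within the maximum distance is returned as NA instead of being forced onto a poor match.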

Value

country_replace() returns a cleaned version of messy_countries, with each element replaced by the most similar element of country.names.

Examples

if (interactive()) {
  # Example 1: replace misspelled country names with their closest matches
  library(messy.cats)
  lst <- c("Conagoa", "Blearaus", "Venzesual", "Uruagsya", "England")
  fixed <- country_replace(lst)
}

[Package messy.cats version 1.0 Index]