split_locations {weed}    R Documentation

Split a string of manually entered locations into one row per location

Description

Changes the unit of analysis from a disaster to a disaster-location pair, so that each location named for a disaster gets its own row. This is useful as a preprocessing step before geocoding each disaster-location pair.

Takes the data frame as its first argument, so it can be used in piped operations, making it tidy!
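As a minimal sketch of piped use, assuming R 4.1 or later for the native `|>` pipe and that the weed and tibble packages are installed. The `requireNamespace()` guard and the `result` name are illustrative additions, not part of the package; the input strings are taken from the Examples section.

```r
# Two free-text location strings, one per disaster (from the Examples section)
locs <- c("city of new york",
          "mumbai region, district of seattle, sichuan province")

# Guard so the sketch is skipped cleanly when the packages are not installed
if (requireNamespace("weed", quietly = TRUE) &&
    requireNamespace("tibble", quietly = TRUE)) {
  # The data frame on the left of the pipe is passed as the first argument
  result <- tibble::as_tibble(locs) |>
    weed::split_locations(column_name = "value")
  print(result)
}
```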

Usage

split_locations(
  .,
  column_name = "locations",
  dummy_words = c("cities", "states", "provinces", "districts", "municipalities",
    "regions", "villages", "city", "state", "province", "district", "municipality",
    "region", "township", "village", "near", "department"),
  joiner_regex = ",|\\(|\\)|;|\\+|( and )|( of )"
)

Arguments

.

data frame of disaster data

column_name

name of the column containing the locations

dummy_words

a character vector of generic words (such as "city" or "province") that we don't want in our final output

joiner_regex

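a regular expression matching the separators at which to split the locations; the default matches commas, parentheses, semicolons, plus signs, " and ", and " of "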
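The roles of joiner_regex and dummy_words can be pictured with a rough base-R sketch. This is an approximation for illustration only, not the package's internal code: the helper `split_one` is hypothetical, the dummy-word list is shortened, and removing dummy words by whole-word matching is an assumption about how they are dropped.

```r
# Default separator pattern, copied from the Usage section
joiner_regex <- ",|\\(|\\)|;|\\+|( and )|( of )"

# A shortened dummy-word list, for illustration only
dummy_words <- c("city", "municipality", "region", "province", "district")

# Hypothetical helper (not part of weed): split one location string into
# cleaned location names
split_one <- function(x, joiner_regex, dummy_words) {
  # Cut the string at every match of the separator pattern
  pieces <- strsplit(x, joiner_regex)[[1]]
  # Remove dummy words as whole words only (\\b marks a word boundary)
  dummy_pattern <- paste0("\\b(", paste(dummy_words, collapse = "|"), ")\\b")
  pieces <- trimws(gsub(dummy_pattern, "", pieces, perl = TRUE))
  # Drop pieces that are empty after splitting and cleaning
  pieces[nzchar(pieces)]
}

print(split_one("city of new york", joiner_regex, dummy_words))
print(split_one("kerala, chennai municipality, and san francisco",
                joiner_regex, dummy_words))
```

Empty pieces arise when two separators are adjacent (for example "," followed by " and ") or when a piece consists only of a dummy word, which is why the sketch filters them out at the end.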

Value

The same data frame with a location_word column added, plus a column called uncertain_location_specificity for cases where the same location could be referred to at varying levels of specificity.

Examples

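# Three free-text location strings, one per disaster
locs <- c("city of new york", "kerala, chennai municipality, and san francisco",
  "mumbai region, district of seattle, sichuan province")
# as_tibble() on a character vector creates a single column named "value"
d <- tibble::as_tibble(locs)
split_locations(d, column_name = "value")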


[Package weed version 1.1.2 Index]