key_collision_merge {refinr} | R Documentation |
Value merging based on Key Collision
Description
This function takes a character vector and merges values that are approximately equivalent but not identical, editing them to a common form. It clusters values using the key collision method described at https://openrefine.org/docs/technical-reference/clustering-in-depth.
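To make the key collision idea concrete, the following is a minimal base R sketch, not refinr's actual implementation. It assumes a simplified fingerprint key (lowercase, strip punctuation, sort unique tokens) and assumes that each cluster is replaced by its most frequent original value. The helper names fingerprint_key and merge_by_key are hypothetical and exist only for illustration.

```r
# Hypothetical helper: build a simplified OpenRefine-style "fingerprint" key.
fingerprint_key <- function(x) {
  x <- tolower(trimws(x))
  # Drop punctuation and control characters.
  x <- gsub("[[:punct:][:cntrl:]]", "", x)
  tokens <- strsplit(x, "[[:space:]]+")
  vapply(tokens, function(tok) {
    tok <- tok[nzchar(tok)]
    # Sorting unique tokens makes the key insensitive to word order.
    paste(sort(unique(tok)), collapse = " ")
  }, character(1))
}

# Hypothetical helper: values whose keys collide form one cluster; every
# member is replaced by the most frequent original value (an assumption).
merge_by_key <- function(vect) {
  keys <- fingerprint_key(vect)
  most_common <- tapply(vect, keys, function(v) names(which.max(table(v))))
  unname(as.character(most_common[keys]))
}

x <- c("Acme Pizza", "ACME PIZZA", "pizza, acme", "Acme Pizza", "Bobs Burgers")
keys <- fingerprint_key(x)
merged <- merge_by_key(x)
print(data.frame(original = x, key = keys, merged = merged))
```

The first four values share the key "acme pizza" and are merged, while "Bobs Burgers" has its own key and is left alone. The package function goes further than this sketch, for example by handling business suffixes and a dictionary.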
Usage
key_collision_merge(
  vect,
  ignore_strings = NULL,
  bus_suffix = TRUE,
  dict = NULL
)
Arguments
vect |
Character vector, items to be potentially clustered and merged. |
ignore_strings |
Character vector, these strings will be ignored during the merging of values within vect. Default value is NULL. |
bus_suffix |
Logical, indicating whether the merging of records should be insensitive to common business suffixes. Default value is TRUE. |
dict |
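Character vector, meant to act as a dictionary during the merging process. If any items within vect have a match in dict, those items are edited to be identical to their match in dict. Default value is NULL. |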
Value
Character vector with similar values merged.
Examples
x <- c("Acme Pizza, Inc.", "ACME PIZZA COMPANY", "pizza, acme llc",
       "Acme Pizza, Inc.")
key_collision_merge(vect = x)
# Use parameter "dict" to influence how clustered values are edited.
key_collision_merge(vect = x, dict = c("Nicks Pizza", "acme PIZZA inc"))
# Use parameter "ignore_strings" to ignore specific strings during merging of values.
x <- c("Bakersfield Highschool", "BAKERSFIELD high",
       "high school, bakersfield")
key_collision_merge(x, ignore_strings = c("high", "school", "highschool"))
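Because the function takes a character vector and returns a character vector, one way to review what was merged is to line the result up against the input. The sketch below assumes the refinr package is installed and is skipped otherwise. The comparison data frame and its column names are illustrative only.

```r
x <- c("Acme Pizza, Inc.", "ACME PIZZA COMPANY", "pizza, acme llc",
       "Acme Pizza, Inc.")

# Run only when refinr is available, so the snippet never errors out.
if (requireNamespace("refinr", quietly = TRUE)) {
  merged <- refinr::key_collision_merge(vect = x)
  # One row per input element; rows where `changed` is TRUE were edited.
  comparison <- data.frame(
    original = x,
    merged = merged,
    changed = x != merged,
    stringsAsFactors = FALSE
  )
  print(comparison)
}
```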