merge_plus {fedmatch} | R Documentation |
Merge two datasets by exact, fuzzy, or multivar-based matching
Description
merge_plus
is a wrapper for a standard merge, a fuzzy string match,
and a “multivar” match based on several columns of the data. Parameters allow
fine-tuning of the match. This function is primarily used as the
workhorse for the tier_match
function.
Usage
merge_plus(
data1,
data2,
by = NULL,
by.x = NULL,
by.y = NULL,
suffixes = c("_1", "_2"),
check_merge = TRUE,
unique_key_1,
unique_key_2,
match_type = "exact",
fuzzy_settings = build_fuzzy_settings(),
score_settings = NULL,
filter = NULL,
filter.args = list(),
evaluate = match_evaluate,
evaluate.args = list(),
allow.cartesian = FALSE,
multivar_settings = build_multivar_settings()
)
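A minimal sketch of an exact-match call using the signature above. The data frames and their column names (`key_a`, `company`, `key_b`, `name`) are hypothetical and made up for illustration; the call is guarded so the snippet still runs when fedmatch is not installed.

```r
# Hypothetical toy data; all column names here are illustrative.
firms_a <- data.frame(
  key_a   = 1:3,
  company = c("acme corp", "globex", "initech"),
  stringsAsFactors = FALSE
)
firms_b <- data.frame(
  key_b = 1:3,
  name  = c("acme corp", "globex", "umbrella"),
  stringsAsFactors = FALSE
)

# Each key column should identify its rows uniquely (see check_merge).
stopifnot(
  anyDuplicated(firms_a$key_a) == 0,
  anyDuplicated(firms_b$key_b) == 0
)

if (requireNamespace("fedmatch", quietly = TRUE)) {
  result <- fedmatch::merge_plus(
    data1        = firms_a,
    data2        = firms_b,
    by.x         = "company",
    by.y         = "name",
    match_type   = "exact",
    unique_key_1 = "key_a",
    unique_key_2 = "key_b"
  )
  # Inspect the top-level structure of the returned list.
  str(result, max.level = 1)
}
```

Here by.x and by.y are used instead of by because the name columns differ between the two toy datasets.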
Arguments
data1 |
data.frame. First to-merge dataset (ordering matters; see the Fuzzy Matching vignette). |
data2 |
data.frame. Second to-merge dataset. |
by |
character string. Variables to merge on (common to data1 and
data2). See |
by.x |
length-1 character vector. Variable to merge on in data1. See |
by.y |
length-1 character vector. Variable to merge on in data2. See |
suffixes |
character vector with length==2. Suffixes to add to like-named
variables after the merge. See |
check_merge |
logical. Checks that unique_key_1 and unique_key_2 are indeed unique. |
unique_key_1 |
character vector. Primary key of data1 that uniquely identifies each row (can be multiple fields) |
unique_key_2 |
character vector. Primary key of data2 that uniquely identifies each row (can be multiple fields) |
match_type |
string. If 'exact', match is exact; if 'fuzzy', match is
fuzzy; if 'multivar', match is multivar-based. See |
fuzzy_settings |
additional arguments for amatch, to be used if match_type
= 'fuzzy'. Suggested defaults provided. See |
score_settings |
list. Score settings for post-hoc matchscores. See |
filter |
function or numeric. Filters the merged data1-data2 dataset. If a function, it should take a data.frame (data1 and data2 merged by name1 and name2) as its first argument and return a trimmed version of that data.frame (fewer rows). Think of this function as applying conditions to matches beyond the match by name. If numeric, all observations with a matchscore lower than or equal to filter are dropped. |
filter.args |
list. Arguments passed to filter, if filter is a function |
evaluate |
Function to evaluate merge_plus output. |
evaluate.args |
list. Arguments passed to evaluate |
allow.cartesian |
logical. Whether to allow many-to-many matches; see data.table::merge() |
multivar_settings |
list of settings to go to the multivar match if match_type
== 'multivar'. See |
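The filter and filter.args arguments described above can be illustrated in base R. This is a sketch under stated assumptions, not the package's internal code: the merged data.frame, its column names (including `matchscore`), the helper `same_value_filter`, and the use of `do.call` to combine the data with the extra arguments are all hypothetical.

```r
# Hypothetical merged output: columns from data1 and data2 side by side.
merged <- data.frame(
  name_1     = c("Acme Corp", "Globex", "Initech"),
  name_2     = c("Acme Corp.", "Globex", "Initech LLC"),
  state_1    = c("NY", "CA", "TX"),
  state_2    = c("NY", "WA", "TX"),
  matchscore = c(0.97, 1.00, 0.88),
  stringsAsFactors = FALSE
)

# A filter function: the first argument is the data.frame,
# and the result is the same data.frame with fewer rows.
same_value_filter <- function(df, col_1, col_2) {
  df[df[[col_1]] == df[[col_2]], , drop = FALSE]
}

# Extra arguments are kept in a list, in the spirit of filter.args.
filter_args <- list(col_1 = "state_1", col_2 = "state_2")
filtered <- do.call(same_value_filter, c(list(merged), filter_args))

# Numeric variant: drop rows whose matchscore is lower than
# or equal to the cutoff.
cutoff <- 0.9
filtered_numeric <- merged[merged$matchscore > cutoff, , drop = FALSE]
```

A function of this shape would be supplied as filter = same_value_filter with filter.args = list(col_1 = "state_1", col_2 = "state_2"), assuming the merged output actually contains columns with those names.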
Value
A list with the matches, the filtered matches (if applicable), the rows of data1 and data2 that were not matched, and the match evaluation.
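To make "data1 and data2 minus matches" concrete, the idea can be sketched in base R as follows. This is a conceptual illustration only and does not reproduce how merge_plus builds its output; the data, the key columns `key_1` and `key_2`, and the object names are hypothetical.

```r
# Hypothetical inputs, each with its own unique key column.
data1 <- data.frame(key_1 = 1:4, name = c("a", "b", "c", "d"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(key_2 = 1:3, name = c("a", "c", "e"),
                    stringsAsFactors = FALSE)

# A plain exact merge stands in for the matching step.
matches <- merge(data1, data2, by = "name", suffixes = c("_1", "_2"))

# Rows of each input whose key does not appear among the matches.
data1_unmatched <- data1[!data1$key_1 %in% matches$key_1, , drop = FALSE]
data2_unmatched <- data2[!data2$key_2 %in% matches$key_2, , drop = FALSE]
```

The unique keys are what make this bookkeeping possible: without them, there is no reliable way to tell which input rows have already been matched.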
See Also
match_evaluate