R: Calculate weights for computing matchscore

calculate_weights {fedmatch}

R Documentation

Calculate weights for computing matchscore

Description

Calculate weights for comparison variables based on m and u probabilities estimated from a verified dataset.

Usage

calculate_weights(
  data,
  variables,
  compare_type = "stringdist",
  suffixes = c("_1", "_2"),
  non_negative = FALSE
)

Arguments

`data`	data.frame. Verified data. Should have all of the variables you want to calculate weights for from both datasets, named the same with data-specific suffixes.
`variables`	character vector. Names of the variables you want to calculate weights for.
`compare_type`	character vector. One of 'stringdist' (for string variables), 'ratio' or 'difference' (for numerics), 'indicator' (0-1 dummy indicating if the two are the same), 'in' (0-1 dummy indicating if data1 is IN data2), or 'substr' (numeric indicating how many digits are the same).
`suffixes`	character vector. Suffixes of the variables that indicate which dataset they are from. Default is c('_1', '_2'), as shown in Usage; note that this differs from the base R merge default of c('.x', '.y').
`non_negative`	logical. Should the weights be restricted to non-negative values? The default, FALSE, allows negative weights.
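
As a rough illustration of the layout that `data` is described as needing, the base R sketch below builds a tiny verified table and checks that every entry of `variables` appears once with each suffix. The column names, the values, and the `expected_cols` and `missing_cols` helpers are hypothetical and are not part of fedmatch; the sketch only assumes that column names are formed by appending each suffix to each variable name.

```r
# Hypothetical verified pairs: each variable appears once per source dataset,
# distinguished only by its suffix.
verified <- data.frame(
  company_name_1 = c("Acme Corp", "Widget Co"),
  company_name_2 = c("ACME Corporation", "Widget Company"),
  zip_1 = c("10001", "60601"),
  zip_2 = c("10001", "60602"),
  stringsAsFactors = FALSE
)

variables <- c("company_name", "zip")
suffixes <- c("_1", "_2")

# Every variable/suffix combination that should be present in the data
# (assumed naming rule: variable name followed directly by the suffix).
expected_cols <- as.vector(outer(variables, suffixes, paste0))

# Any expected column that is absent from the verified data.
missing_cols <- setdiff(expected_cols, names(verified))
```

If `missing_cols` is empty, the table has a column for every variable from both datasets, which is the shape the `data` argument asks for.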

Details

This function uses the classic Record Linkage methodology first developed by Fellegi and Sunter. See Record Linkage. In that framework, m is the probability that a comparison variable agrees given that a pair of observations is a true match, while u is the probability that it agrees given that the pair is not a match. calculate_weights computes a preliminary weight for each variable by computing

w = \log_2 (\frac{m}{u}),

then rescaling these weights so that they sum to 1. Thus, variables with higher m and lower u probabilities get higher weights, which follows from the definitions. These weights can then be passed into the score_settings argument of merge_plus or tier_match, or into the wgts argument of multivar_match.
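
The arithmetic described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the m and u values are made up, and dividing by the total is an assumed way of making the weights sum to 1. How negative preliminary weights are treated is not covered here.

```r
# Hypothetical m and u probabilities for three comparison variables.
m <- c(name = 0.95, zip = 0.90, state = 0.99)
u <- c(name = 0.01, zip = 0.05, state = 0.30)

# Preliminary weights: log base 2 of the m/u ratio.
w_raw <- log2(m / u)

# Rescale so that the weights sum to 1 (assumed normalization: divide by the total).
w <- w_raw / sum(w_raw)
```

In this made-up example, `name` has the largest m/u ratio and so receives the largest share of the weight, while `state`, which agrees often even among non-matches, receives the smallest.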

Value

list with m probabilities, u probabilities, w weights, and settings. The settings element is the list required as input for score_settings in merge_plus, built from the calculated weights.

[Package fedmatch version 2.0.6 Index]