problink_em {reclin2}    R Documentation

Calculate EM-estimates of m- and u-probabilities

Description

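Estimates the m- and u-probabilities of a probabilistic record linkage model with the EM algorithm. The m-probability of a variable is the probability that two records agree on that variable given that the pair is a match; the u-probability is the probability that they agree given that the pair is not a match.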

Usage

problink_em(
  formula,
  data,
  patterns,
  mprobs0 = list(0.95),
  uprobs0 = list(0.02),
  p0 = 0.05,
  tol = 1e-05,
  mprob_max = 0.999,
  uprob_min = 1e-04
)

Arguments

formula

a formula object with the variables for which to calculate the m- and u-probabilities. Should be of the form ~ var1 + var2.

data

data set with pairs on which to estimate the model. Alternatively one can use the patterns argument.

patterns

table of patterns (as output by tabulate_patterns).

mprobs0, uprobs0

initial values of the m- and u-probabilities. These should be lists with numeric values. The names of the elements in the list should correspond to the names in by_x in compare_pairs.

p0

the initial estimate of the probability that a pair is a match.

tol

convergence tolerance. The algorithm stops when the change in the m- and u-probabilities is smaller than tol.

mprob_max

maximum value allowed for the estimated m-probabilities. Values equal to one can lead to numerical instabilities.

uprob_min

minimum value allowed for the estimated u-probabilities. Values equal to zero can lead to numerical instabilities.

Value

Returns an object of type problink_em. This is a list containing the estimated mprobs, uprobs and overall linkage probability p. It also contains the table of comparison patterns.
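
To make the roles of the starting values, tol, mprob_max and uprob_min concrete, the following is a minimal base R sketch of an EM iteration for binary agreement patterns, assuming the comparison variables are conditionally independent given match status. It is not the reclin2 implementation: em_sketch, its arguments x and n, the max_iter cap and the toy counts are hypothetical and serve only as illustration.

```r
# Illustrative sketch only, NOT the code used by problink_em.
# x: 0/1 matrix with one row per comparison pattern and one column per variable.
# n: number of pairs observed for each pattern.
em_sketch <- function(x, n, m0 = 0.95, u0 = 0.02, p0 = 0.05,
                      tol = 1e-5, mprob_max = 0.999, uprob_min = 1e-4,
                      max_iter = 1000) {  # max_iter is an assumed safety cap
  k <- ncol(x)
  m <- rep(m0, k)
  u <- rep(u0, k)
  p <- p0
  for (iter in seq_len(max_iter)) {
    # E-step: probability that a pair with each pattern is a match
    pm <- apply(x, 1, function(r) prod(ifelse(r == 1, m, 1 - m)))
    pu <- apply(x, 1, function(r) prod(ifelse(r == 1, u, 1 - u)))
    g <- p * pm / (p * pm + (1 - p) * pu)
    # M-step: weighted agreement rates among matches and non-matches
    m_new <- colSums(n * g * x) / sum(n * g)
    u_new <- colSums(n * (1 - g) * x) / sum(n * (1 - g))
    p <- sum(n * g) / sum(n)
    # Keep the estimates away from one and zero
    m_new <- pmin(m_new, mprob_max)
    u_new <- pmax(u_new, uprob_min)
    # Stop once the m- and u-probabilities change by less than tol
    change <- max(abs(m_new - m), abs(u_new - u))
    m <- m_new
    u <- u_new
    if (change < tol) break
  }
  list(mprobs = m, uprobs = u, p = p, iterations = iter)
}

# Toy pattern table: all 8 agreement patterns on three variables, made-up counts
x <- as.matrix(expand.grid(a = 0:1, b = 0:1, c = 0:1))
n <- c(800, 40, 40, 10, 40, 10, 10, 50)
fit <- em_sketch(x, n)
```

The returned list mirrors the elements described above (mprobs, uprobs and p). The clamping step shows why mprob_max and uprob_min exist: an estimate of exactly one or zero would make some pattern probabilities zero in the next E-step.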

References

Fellegi, I. and A. Sunter (1969). "A Theory for Record Linkage", Journal of the American Statistical Association. 64 (328): pp. 1183-1210. doi:10.2307/2286061.

Herzog, T.N., F.J. Scheuren and W.E. Winkler (2007). Data Quality and Record Linkage Techniques, Springer.

Examples

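library(reclin2)
data("linkexample1", "linkexample2")
# Generate candidate pairs of records that share the same postcode
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
# Compare each pair on the linkage variables
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
# Estimate the m- and u-probabilities with the EM algorithm
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
summary(model)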


[Package reclin2 version 0.5.0 Index]