| problink_em {reclin2} | R Documentation |
Calculate EM-estimates of m- and u-probabilities
Description
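Calculate EM-estimates of the m- and u-probabilities used in probabilistic record linkage. The m-probability of a variable is the probability that it agrees for a pair of records that is a true match; the u-probability is the probability that it agrees for a pair that is not a match.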
Usage
problink_em(
formula,
data,
patterns,
mprobs0 = list(0.95),
uprobs0 = list(0.02),
p0 = 0.05,
tol = 1e-05,
mprob_max = 0.999,
uprob_min = 1e-04
)
Arguments
formula |
a formula object with the variables for which to calculate the
m- and u-probabilities. Should be a one-sided formula of the form ~ var1 + var2. |
data |
data set with pairs on which to estimate the model. Alternatively
one can use the patterns argument. |
patterns |
table of patterns (as output by
|
mprobs0, uprobs0 |
initial values of the m- and u-probabilities. These
should be lists with numeric values. The names of the elements in the list
should correspond to the names in |
p0 |
the initial estimate of the probability that a pair is a match. |
tol |
when the change in the m- and u-probabilities is smaller than tol, the algorithm stops. |
mprob_max |
maximum values of the estimated m-probabilities. Values equal to one can lead to numerical instabilities. |
uprob_min |
minimum values of the estimated u-probabilities. Values equal to zero can lead to numerical instabilities. |
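The role of the bounds mprob_max and uprob_min can be illustrated with the agreement and disagreement weights of the Fellegi-Sunter framework. The sketch below is base R only; fs_weights is a hypothetical helper written for this illustration, not a function of reclin2, and it assumes that weights are computed as log-ratios of m- and u-probabilities.

```r
# Hypothetical helper (not part of reclin2): log-weights for one variable,
# assuming the usual Fellegi-Sunter log-ratio form.
fs_weights <- function(m, u) {
  c(agree = log(m / u), disagree = log((1 - m) / (1 - u)))
}

# Without bounds, m = 1 and u = 0 give infinite weights.
w_unbounded <- fs_weights(m = 1, u = 0)

# Applying the default bounds from the Usage section keeps the weights finite.
mprob_max <- 0.999
uprob_min <- 1e-04
w_bounded <- fs_weights(m = min(1, mprob_max), u = max(0, uprob_min))

print(w_unbounded)
print(w_bounded)
```

Tighter bounds make extreme weights less likely, at the cost of slightly biasing estimates that are genuinely close to zero or one.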
Value
Returns an object of type problink_em. This is a list containing the
estimated mprobs, uprobs and overall linkage probability
p. It also contains the table of comparison patterns.
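To make the returned quantities concrete, the following is a minimal base R sketch of an EM iteration for a two-class model with conditionally independent binary comparisons. It is a textbook-style illustration, not the implementation used by reclin2. The function em_mu, its max_iter argument, the example pattern table and the recycling of a single starting value over all variables are assumptions made for this sketch.

```r
# Hypothetical sketch (not reclin2 code): EM for m- and u-probabilities.
# patterns: 0/1 matrix, one row per comparison pattern; n: count per pattern.
em_mu <- function(patterns, n, m0 = 0.95, u0 = 0.02, p0 = 0.05,
                  tol = 1e-5, mprob_max = 0.999, uprob_min = 1e-4,
                  max_iter = 1000) {
  k <- ncol(patterns)
  m <- rep(m0, k); u <- rep(u0, k); p <- p0
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that a pattern stems from a match
    lik_m <- apply(patterns, 1, function(g) prod(m^g * (1 - m)^(1 - g)))
    lik_u <- apply(patterns, 1, function(g) prod(u^g * (1 - u)^(1 - g)))
    post <- p * lik_m / (p * lik_m + (1 - p) * lik_u)
    # M-step: weighted agreement rates among matches and non-matches
    m_new <- colSums(patterns * post * n) / sum(post * n)
    u_new <- colSums(patterns * (1 - post) * n) / sum((1 - post) * n)
    m_new <- pmin(m_new, mprob_max)   # keep m away from one
    u_new <- pmax(u_new, uprob_min)   # keep u away from zero
    p_new <- sum(post * n) / sum(n)
    change <- max(abs(c(m_new - m, u_new - u)))
    m <- m_new; u <- u_new; p <- p_new
    if (change < tol) break           # stop when m and u no longer change
  }
  list(mprobs = m, uprobs = u, p = p, iterations = iter)
}

# Made-up pattern table for three variables, with counts generated from
# known (illustrative) parameters.
patterns <- as.matrix(expand.grid(lastname = 0:1, firstname = 0:1, sex = 0:1))
true_m <- c(0.9, 0.85, 0.95); true_u <- c(0.05, 0.1, 0.5); true_p <- 0.1
lm <- apply(patterns, 1, function(g) prod(true_m^g * (1 - true_m)^(1 - g)))
lu <- apply(patterns, 1, function(g) prod(true_u^g * (1 - true_u)^(1 - g)))
n <- 10000 * (true_p * lm + (1 - true_p) * lu)

fit <- em_mu(patterns, n)
print(fit)
```

Working on the table of patterns rather than on individual pairs keeps each iteration cheap, because the number of distinct patterns is usually far smaller than the number of pairs.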
References
Fellegi, I. and A. Sunter (1969). "A Theory for Record Linkage", Journal of the American Statistical Association. 64 (328): pp. 1183-1210. doi:10.2307/2286061.
Herzog, T.N., F.J. Scheuren and W.E. Winkler (2007). Data Quality and Record Linkage Techniques, Springer.
Examples
data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
summary(model)
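
The starting values and the stopping tolerance can also be set explicitly. The variation below is a sketch that only uses arguments listed under Usage; the specific values are arbitrary choices for illustration, and it assumes that reclin2 is installed and that the returned list elements are named as described under Value.

```r
library(reclin2)

data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))

# Illustrative starting values and a stricter tolerance (arbitrary choices)
model2 <- problink_em(~ lastname + firstname + address + sex, data = pairs,
  mprobs0 = list(0.9), uprobs0 = list(0.05), p0 = 0.1, tol = 1e-06)

# Estimated m- and u-probabilities and the overall match probability
model2$mprobs
model2$uprobs
model2$p
```

Comparing these estimates with those from the default starting values gives a rough check on whether the EM algorithm reached the same solution.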