problink_em {reclin2} | R Documentation |
Calculate EM-estimates of m- and u-probabilities
Description
Calculate EM-estimates of m- and u-probabilities
Usage
problink_em(
formula,
data,
patterns,
mprobs0 = list(0.95),
uprobs0 = list(0.02),
p0 = 0.05,
tol = 1e-05,
mprob_max = 0.999,
uprob_min = 1e-04
)
Arguments
formula |
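a formula object with the variables for which to calculate the
m- and u-probabilities. Should be a one-sided formula listing the comparison
variables, such as ~ lastname + firstname + address + sex in the Examples. |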
data |
data set with pairs on which to estimate the model. Alternatively
one can use the patterns argument. |
patterns |
table of patterns (as output by
|
mprobs0 , uprobs0 |
initial values of the m- and u-probabilities. These
should be lists with numeric values. The names of the elements in the list
should correspond to the names in |
p0 |
the initial estimate of the probability that a pair is a match. |
tol |
when the change in the m- and u-probabilities is smaller than tol, the algorithm stops. |
mprob_max |
maximum values of the estimated m-probabilities. Values equal to one can lead to numerical instabilities. |
uprob_min |
minimum values of the estimated u-probabilities. Values equal to zero can lead to numerical instabilities. |
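To make the roles of p0, tol, mprob_max and uprob_min concrete, the following is a minimal base-R sketch of an EM iteration for the Fellegi-Sunter model with binary agreement patterns and conditional independence between variables. It is not the code used by problink_em: the function em_sketch, its scalar arguments m0 and u0, the max_iter safeguard, and the toy pattern counts are all assumptions made for illustration.

```r
# Toy table of agreement patterns (1 = agree, 0 = disagree); n is the number
# of pairs showing each pattern. The counts are invented for illustration.
patterns <- data.frame(
  lastname  = c(1, 1, 1, 0, 1, 0, 0, 0),
  firstname = c(1, 1, 0, 1, 0, 1, 0, 0),
  address   = c(1, 0, 1, 1, 0, 0, 1, 0),
  n         = c(32, 9, 6, 5, 40, 85, 170, 653)
)

# Hypothetical helper, not part of reclin2.
em_sketch <- function(patterns, m0 = 0.95, u0 = 0.02, p0 = 0.05,
                      tol = 1e-05, mprob_max = 0.999, uprob_min = 1e-04,
                      max_iter = 1000) {
  g <- as.matrix(patterns[, setdiff(names(patterns), "n"), drop = FALSE])
  n <- patterns$n
  m <- rep(m0, ncol(g))
  u <- rep(u0, ncol(g))
  names(m) <- names(u) <- colnames(g)
  p <- p0
  for (iter in seq_len(max_iter)) {
    # E-step: probability that a pair with a given pattern is a match
    lik_m <- apply(g, 1, function(x) prod(m^x * (1 - m)^(1 - x)))
    lik_u <- apply(g, 1, function(x) prod(u^x * (1 - u)^(1 - x)))
    w <- p * lik_m / (p * lik_m + (1 - p) * lik_u)
    # M-step: re-estimate p, m and u from the weighted pattern counts
    p_new <- sum(n * w) / sum(n)
    m_new <- colSums(n * w * g) / sum(n * w)
    u_new <- colSums(n * (1 - w) * g) / sum(n * (1 - w))
    # Keep the estimates away from 1 and 0 to avoid numerical instabilities
    m_new <- pmin(m_new, mprob_max)
    u_new <- pmax(u_new, uprob_min)
    # Stop once the largest change in the m- and u-probabilities is below tol
    change <- max(abs(c(m_new - m, u_new - u)))
    m <- m_new
    u <- u_new
    p <- p_new
    if (change < tol) break
  }
  list(mprobs = m, uprobs = u, p = p, iterations = iter)
}

fit <- em_sketch(patterns)
print(fit)
```

In this sketch a smaller tol means more iterations before stopping, and the clamping step is where mprob_max and uprob_min take effect.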
Value
Returns an object of type problink_em. This is a list containing the
estimated mprobs, uprobs and overall linkage probability p. It also
contains the table of comparison patterns.
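As an illustration of how m- and u-probabilities are typically used in the Fellegi-Sunter framework cited in the References, the sketch below turns them into log-likelihood-ratio weights. The numeric values are hypothetical and are stored as plain named vectors for simplicity; the layout of the mprobs and uprobs elements in an actual problink_em object may differ.

```r
# Hypothetical m- and u-probabilities for two comparison variables
mprobs <- c(lastname = 0.95, firstname = 0.90)
uprobs <- c(lastname = 0.02, firstname = 0.10)

# Per-variable weights: agreement adds log(m/u), disagreement adds
# log((1 - m)/(1 - u))
w_agree    <- log(mprobs / uprobs)
w_disagree <- log((1 - mprobs) / (1 - uprobs))

# Total weight of a pair that agrees on lastname but not on firstname
pattern <- c(lastname = 1, firstname = 0)
weight  <- sum(ifelse(pattern == 1, w_agree, w_disagree))
print(weight)
```

A variable with a high m-probability and a low u-probability contributes a large positive weight on agreement, which is why it discriminates well between matches and non-matches.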
References
Fellegi, I. and A. Sunter (1969). "A Theory for Record Linkage", Journal of the American Statistical Association. 64 (328): pp. 1183-1210. doi:10.2307/2286061.
Herzog, T.N., F.J. Scheuren and W.E. Winkler (2007). Data Quality and Record Linkage Techniques, Springer.
Examples
library(reclin2)
data("linkexample1", "linkexample2")
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
model <- problink_em(~ lastname + firstname + address + sex, data = pairs)
summary(model)