emlinkRS {fastLink} | R Documentation |
emlinkRS
Description
Calculates Fellegi-Sunter weights and posterior zeta probabilities for matching patterns that are observed in a larger population but are not present in the sub-sample used to estimate the EM model.
Usage
emlinkRS(patterns.out, em.out, nobs.a, nobs.b)
Arguments
patterns.out |
The output from 'tableCounts()' or 'emlinkMARmov()' (run on the full dataset), containing all matching patterns observed in the full sample and the number of times each pattern is observed. |
em.out |
The output from 'emlinkMARmov()': an EM object estimated on a smaller random sample, to be applied to the pattern counts from the larger sample. |
nobs.a |
Total number of observations in dataset A |
nobs.b |
Total number of observations in dataset B |
Value
emlinkRS returns a list with the following components:
zeta.j |
The posterior match probabilities for each unique pattern. |
p.m |
The posterior probability of a pair matching. |
p.u |
The posterior probability of a pair not matching. |
p.gamma.k.m |
The posterior of the matching probability for a specific matching field. |
p.gamma.k.u |
The posterior of the non-matching probability for a specific matching field. |
p.gamma.j.m |
The posterior probability that a pair is in the matched set given a particular agreement pattern. |
p.gamma.j.u |
The posterior probability that a pair is in the unmatched set given a particular agreement pattern. |
patterns.w |
Counts of the agreement patterns observed, along with the Fellegi-Sunter weights. |
iter.converge |
The number of iterations it took the EM algorithm to converge. |
nobs.a |
The number of observations in dataset A. |
nobs.b |
The number of observations in dataset B. |
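As a rough illustration of how the Fellegi-Sunter weights and the posterior zeta probabilities relate to the match rate and the pattern probabilities, here is a minimal base R sketch. It does not call fastLink and is not its implementation: the inputs `patterns`, `m.k`, `u.k` and `lambda`, the helper `pattern_prob()`, and the names `lik.m` and `lik.u` are all hypothetical. It assumes binary agreement fields, no missing values, and conditional independence across fields.

```r
# Hypothetical inputs (not produced by fastLink): two agreement patterns
# over three binary fields (1 = agree, 0 = disagree).
patterns <- rbind(c(1, 1, 1),   # all fields agree
                  c(1, 0, 0))   # only the first field agrees
m.k <- c(0.95, 0.90, 0.85)  # assumed P(field k agrees | match)
u.k <- c(0.10, 0.05, 0.02)  # assumed P(field k agrees | non-match)
lambda <- 0.01              # assumed overall match rate

# Probability of each full pattern, assuming conditional independence
# of the fields given match status (hypothetical helper).
pattern_prob <- function(patterns, p.agree) {
  apply(patterns, 1, function(g) prod(ifelse(g == 1, p.agree, 1 - p.agree)))
}

lik.m <- pattern_prob(patterns, m.k)  # P(pattern | match)
lik.u <- pattern_prob(patterns, u.k)  # P(pattern | non-match)

# Fellegi-Sunter weight: log-likelihood ratio of match vs. non-match
fs.weight <- log(lik.m / lik.u)

# Posterior match probability for each pattern via Bayes' rule
zeta.j <- lambda * lik.m / (lambda * lik.m + (1 - lambda) * lik.u)

print(data.frame(patterns, fs.weight, zeta.j))
```

Because both quantities depend only on the estimated parameters and the pattern itself, they can be computed for patterns that never appeared in the sample used for estimation, which is the situation this function addresses.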
Author(s)
Ted Enamorado <ted.enamorado@gmail.com> and Ben Fifield <benfifield@gmail.com>
Examples
## Not run:
## -------------
## Run on subset
## -------------
dfA.s <- dfA[sample(1:nrow(dfA), 50), ]
dfB.s <- dfB[sample(1:nrow(dfB), 50), ]
## Calculate gammas
g1 <- gammaCKpar(dfA.s$firstname, dfB.s$firstname)
g2 <- gammaCKpar(dfA.s$middlename, dfB.s$middlename)
g3 <- gammaCKpar(dfA.s$lastname, dfB.s$lastname)
g4 <- gammaKpar(dfA.s$birthyear, dfB.s$birthyear)
## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA.s), nobs.b = nrow(dfB.s))
## Run EM
em <- emlinkMARmov(tc, nobs.a = nrow(dfA.s), nobs.b = nrow(dfB.s))
## ------------------
## Apply to full data
## ------------------
## Calculate gammas
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)
## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))
## Apply the EM estimates from the subset to the full-data pattern counts
em.full <- emlinkRS(tc, em, nrow(dfA), nrow(dfB))
## End(Not run)