pldamixture-package {pldamixture} | R Documentation |
Post-Linkage Data Analysis Based on Mixture Modelling
Description
pldamixture implements the "General Framework for Regression with
Mismatched Data" developed by Slawski et al. (2023). The framework uses a
mixture model for pairs of linked records whose two components reflect
distributions conditional on match status, i.e., correct match or mismatch.
Inference is based on composite likelihood and the EM algorithm.
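To make the idea concrete, the following is a minimal sketch of an EM fit for a two-component mixture on simulated data, using base R only. It is not the package's implementation. The simulated data, the constant mismatch probability alpha, the normal approximation to the marginal distribution of y for mismatched pairs, and all variable names are assumptions made for illustration.

```r
# Illustrative sketch only (not the pldamixture implementation).
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# Mimic linkage errors: permute the responses among a random 20% of records.
mis <- sample(n, size = 0.2 * n)
y[mis] <- y[sample(mis)]

# Naive least squares ignores mismatches; it also supplies starting values.
naive <- lm(y ~ x)
beta  <- unname(coef(naive))
sigma <- sd(residuals(naive))
alpha <- 0.5                        # assumed starting mismatch probability

# Assumption: for a mismatched pair, y is independent of x and follows the
# marginal distribution of y, approximated here by a normal density.
mu_y <- mean(y)
sd_y <- sd(y)

for (iter in 1:200) {
  # E-step: posterior probability that each pair is a correct match.
  f_match <- dnorm(y, mean = beta[1] + beta[2] * x, sd = sigma)
  f_mis   <- dnorm(y, mean = mu_y, sd = sd_y)
  w <- (1 - alpha) * f_match / ((1 - alpha) * f_match + alpha * f_mis)

  # M-step: weighted least squares, weighted residual scale, mismatch rate.
  beta  <- unname(coef(lm(y ~ x, weights = w)))
  res   <- y - (beta[1] + beta[2] * x)
  sigma <- sqrt(sum(w * res^2) / sum(w))
  alpha <- 1 - mean(w)
}

# Compare the naive and mixture-based coefficient estimates.
rbind(naive = unname(coef(naive)), mixture = beta)
```

The weights w computed in the E-step act as soft indicators of a correct match, so records that look like mismatches contribute little to the regression fit in the M-step.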
The package provides four user-facing functions:
fit_mixture
print.fitmixture
summary.fitmixture
predict.fitmixture
Note
The references below discuss the implemented framework in more detail.
*Corresponding Author (mslawsk3@gmu.edu)
References
Slawski, M.*, West, B. T., Bukke, P., Diao, G., Wang, Z., & Ben-David, E. (2023).
A General Framework for Regression with Mismatched Data Based on Mixture Modeling.
Under Review. <doi:10.48550/arXiv.2306.00909>
Bukke, P., Ben-David, E., Diao, G., Slawski, M.*, & West, B. T. (2023).
Cox Proportional Hazards Regression Using Linked Data: An Approach Based on Mixture Modelling.
Under Review.
Slawski, M.*, Diao, G., & Ben-David, E. (2021).
A Pseudo-Likelihood Approach to Linear Regression with Partially Shuffled Data.
Journal of Computational and Graphical Statistics, 30(4), 991-1003. <doi:10.1080/10618600.2020.1870482>
Examples
# optional inputs for linear regression of age at death on year of birth,
# using a cubic polynomial specification.
## use commonness of names as predictors of match status
## first and last names were used for linkage
mformula <- ~commf + comml
## hand-linked records are considered "safe" matches
safematches <- lifem$hndlnk == "Hand-Linked At Some Level"
## overall mismatch rate in the data set is assumed to be ~ 0.05
mrate <- 0.05
fit <- fit_mixture(age_at_death ~ poly(unit_yob, 3, raw = TRUE), data = lifem,
                   family = "gaussian", mformula, safematches, mrate)
print(fit)
summary(fit)
predict(fit)