mice.impute.pmm {mice} | R Documentation |
Imputation by predictive mean matching
Description
Imputation by predictive mean matching
Usage
mice.impute.pmm(
y,
ry,
x,
wy = NULL,
donors = 5L,
matchtype = 1L,
exclude = -99999999,
ridge = 1e-05,
use.matcher = FALSE,
...
)
Arguments
y |
Vector to be imputed |
ry |
Logical vector of length |
x |
Numeric design matrix with |
wy |
Logical vector of length |
donors |
The size of the donor pool among which a draw is made.
The default is |
matchtype |
Type of matching distance. The default choice
( |
exclude |
Value or vector of values to exclude from the imputation donor pool in |
ridge |
The ridge penalty used in |
use.matcher |
Logical. Set |
... |
Other named arguments. |
Details
Imputation of y
by predictive mean matching, based on
van Buuren (2012, p. 73). The procedure is as follows:
Calculate the cross-product matrix
S=X_{obs}'X_{obs}
.Calculate
V = (S+{diag}(S)\kappa)^{-1}
, with some small ridge parameter\kappa
.Calculate regression weights
\hat\beta = VX_{obs}'y_{obs}.
Draw
q
independentN(0,1)
variates in vector\dot z_1
.Calculate
V^{1/2}
by Cholesky decomposition.Calculate
\dot\beta = \hat\beta + \dot\sigma\dot z_1 V^{1/2}
.Calculate
\dot\eta(i,j)=|X_{{obs},[i]|}\hat\beta-X_{{mis},[j]}\dot\beta
withi=1,\dots,n_1
andj=1,\dots,n_0
.Construct
n_0
setsZ_j
, each containingd
candidate donors, from Y_obs such that\sum_d\dot\eta(i,j)
is minimum for allj=1,\dots,n_0
. Break ties randomly.Draw one donor
i_j
fromZ_j
randomly forj=1,\dots,n_0
.Calculate imputations
\dot y_j = y_{i_j}
forj=1,\dots,n_0
.
The name predictive mean matching was proposed by Little (1988).
Value
Vector with imputed data, same type as y
, and of length
sum(wy)
Author(s)
Gerko Vink, Stef van Buuren, Karin Groothuis-Oudshoorn
References
Little, R.J.A. (1988), Missing data adjustments in large surveys (with discussion), Journal of Business Economics and Statistics, 6, 287–301.
Morris TP, White IR, Royston P (2015). Tuning multiple imputation by predictive mean matching and local residual draws. BMC Med Res Methodol. ;14:75.
Van Buuren, S. (2018). Flexible Imputation of Missing Data. Second Edition. Chapman & Hall/CRC. Boca Raton, FL.
Van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice
: Multivariate
Imputation by Chained Equations in R
. Journal of Statistical
Software, 45(3), 1-67. doi:10.18637/jss.v045.i03
See Also
Other univariate imputation functions:
mice.impute.cart()
,
mice.impute.lasso.logreg()
,
mice.impute.lasso.norm()
,
mice.impute.lasso.select.logreg()
,
mice.impute.lasso.select.norm()
,
mice.impute.lda()
,
mice.impute.logreg.boot()
,
mice.impute.logreg()
,
mice.impute.mean()
,
mice.impute.midastouch()
,
mice.impute.mnar.logreg()
,
mice.impute.mpmm()
,
mice.impute.norm.boot()
,
mice.impute.norm.nob()
,
mice.impute.norm.predict()
,
mice.impute.norm()
,
mice.impute.polr()
,
mice.impute.polyreg()
,
mice.impute.quadratic()
,
mice.impute.rf()
,
mice.impute.ri()
Examples
# We normally call mice.impute.pmm() from within mice()
# But we may call it directly as follows (not recommended)
set.seed(53177)
xname <- c("age", "hgt", "wgt")
r <- stats::complete.cases(boys[, xname])
x <- boys[r, xname]
y <- boys[r, "tv"]
ry <- !is.na(y)
table(ry)
# percentage of missing data in tv
sum(!ry) / length(ry)
# Impute missing tv data
yimp <- mice.impute.pmm(y, ry, x)
length(yimp)
hist(yimp, xlab = "Imputed missing tv")
# Impute all tv data
yimp <- mice.impute.pmm(y, ry, x, wy = rep(TRUE, length(y)))
length(yimp)
hist(yimp, xlab = "Imputed missing and observed tv")
plot(jitter(y), jitter(yimp),
main = "Predictive mean matching on age, height and weight",
xlab = "Observed tv (n = 224)",
ylab = "Imputed tv (n = 224)"
)
abline(0, 1)
cor(y, yimp, use = "pair")
# Use blots to exclude different values per column
# Create blots object
blots <- make.blots(boys)
# Exclude ml 1 through 5 from tv donor pool
blots$tv$exclude <- c(1:5)
# Exclude 100 random observed heights from tv donor pool
blots$hgt$exclude <- sample(unique(boys$hgt), 100)
imp <- mice(boys, method = "pmm", print = FALSE, blots = blots, seed=123)
blots$hgt$exclude %in% unlist(c(imp$imp$hgt)) # MUST be all FALSE
blots$tv$exclude %in% unlist(c(imp$imp$tv)) # MUST be all FALSE