mice.impute.2l.2stage.pmm {micemd} | R Documentation |
Predictive mean matching imputation for two-level variable
Description
Similarly to mice.impute.2l.stage.norm, this function imputes a univariate two-level continuous variable from a heteroscedastic normal model. The difference is that missing values are replaced by observed values, instead of by the prediction of a linear model with random effects plus parametric noise (as done in mice.impute.2l.stage.norm.mm and mice.impute.2l.stage.norm.reml).
Usage
mice.impute.2l.2stage.pmm(y, ry, x, type,
method_est = "mm",
incluster = FALSE,
kpmm = 5, ...)
Arguments
y |
Incomplete data vector of length |
ry |
Vector of missing data pattern |
x |
Matrix |
type |
Vector of length |
method_est |
Character string giving the version of the estimator to use. Choose |
incluster |
Logical indicating whether imputed values are drawn from within the cluster or from the full dataset. By default, imputed values are drawn from all available clusters |
kpmm |
The size of the donor pool among which a draw is made. The default is 5 |
... |
Other named arguments. |
Details
Imputes a univariate two-level continuous variable from observed values. The imputation method is based on a two-stage estimator: at stage 1, a linear regression model is fitted to each observed cluster; at stage 2, the estimates obtained from each cluster are combined according to a linear random effect model. To combine estimates at stage 2, the parameters of the linear random effect model are estimated either by the method of moments or by restricted maximum likelihood. The variability of the imputation model parameters is propagated according to an asymptotic strategy that requires a large number of clusters.
The sample variability is reflected by using a predictive mean matching approach, meaning that missing values are imputed by a draw from observed values. The pool of k donors (k is set by kpmm) is defined according to the Manhattan distance between the prediction for the observation being imputed and the predictions for the other available observations (matching of type 2). The pool can be restricted to the cluster of the individual being imputed or drawn from all clusters. When values are drawn inside the cluster, the heteroscedasticity assumption is preserved. Otherwise, the sample variability of imputed values is the same for all clusters, which strengthens the homoscedasticity assumption. The selected donor is drawn at random from the pool of k donors.
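The two-stage idea described above can be illustrated with a small base R sketch. This is not the micemd implementation: the helper name two_stage_mm is hypothetical, only a single slope is pooled (rather than all coefficients of the model), the data are assumed complete, and the method of moments is taken here to be the DerSimonian-Laird estimator, which is an assumption for illustration.

```r
# Hypothetical illustration (not micemd code): two-stage estimation of one
# regression slope across clusters with a moment estimator of the
# between-cluster variance (DerSimonian-Laird, assumed for this sketch).
two_stage_mm <- function(y, x, cluster) {
  # Stage 1: fit a separate linear regression within each cluster
  fits <- lapply(split(data.frame(y = y, x = x), cluster),
                 function(d) summary(lm(y ~ x, data = d))$coefficients)
  b <- vapply(fits, function(co) co["x", "Estimate"], numeric(1))
  v <- vapply(fits, function(co) co["x", "Std. Error"]^2, numeric(1))

  # Stage 2: moment estimate of the between-cluster variance tau2
  w <- 1 / v
  b_fixed <- sum(w * b) / sum(w)
  Q <- sum(w * (b - b_fixed)^2)
  tau2 <- max(0, (Q - (length(b) - 1)) / (sum(w) - sum(w^2) / sum(w)))

  # Pooled slope under the random effect model, with its variance
  w_re <- 1 / (v + tau2)
  list(slope = sum(w_re * b) / sum(w_re),
       slope_var = 1 / sum(w_re),
       tau2 = tau2,
       cluster_slopes = b)
}

# Simulated two-level data: 6 clusters with cluster-specific slopes
set.seed(42)
cluster <- rep(1:6, each = 20)
x <- rnorm(120)
slopes <- rnorm(6, mean = 2, sd = 0.5)
y <- 1 + slopes[cluster] * x + rnorm(120)
est <- two_stage_mm(y, x, cluster)
```

The pooled slope is a weighted average of the cluster-specific slopes, so it always lies between the smallest and the largest of them. A larger tau2 makes the weights more equal across clusters.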
Value
Numeric vector of length sum(!ry) with imputations
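To make the donor draw and the shape of the returned vector concrete, here is a minimal base R sketch. The helper pmm_draw is hypothetical and not part of micemd. It assumes the predictions for observed and missing entries are already available and were computed from the same parameter draw. The fallback to all donors when a cluster has no observed value is also an assumption of this sketch.

```r
# Hypothetical helper (not micemd code): one donor value per missing entry,
# chosen by predictive mean matching on precomputed predictions.
pmm_draw <- function(y, ry, pred_obs, pred_mis, cluster,
                     kpmm = 5, incluster = FALSE) {
  y_obs <- y[ry]
  cl_obs <- cluster[ry]
  cl_mis <- cluster[!ry]
  vapply(seq_along(pred_mis), function(i) {
    # Candidate donors: same cluster only, or every observed value
    cand <- if (incluster) which(cl_obs == cl_mis[i]) else seq_along(y_obs)
    # Assumption: fall back to all donors if the cluster has none observed
    if (length(cand) == 0) cand <- seq_along(y_obs)
    # Distance between predictions (absolute difference in one dimension)
    d <- abs(pred_obs[cand] - pred_mis[i])
    # Keep the kpmm closest candidates, then pick one at random
    pool <- cand[order(d)][seq_len(min(kpmm, length(cand)))]
    y_obs[pool[sample.int(length(pool), 1)]]
  }, numeric(1))
}

# Toy data: 3 clusters of 4 observations, last value of each cluster missing
set.seed(1)
cluster <- rep(1:3, each = 4)
x <- seq_len(12)
y <- 2 * x + rep(c(0, 5, -5), each = 4)
ry <- rep(c(TRUE, TRUE, TRUE, FALSE), 3)
pred <- 2 * x  # stand-in predictions, not a fitted two-stage model
imp <- pmm_draw(y, ry, pred[ry], pred[!ry], cluster,
                kpmm = 2, incluster = TRUE)
```

Here imp has length sum(!ry) and every element is an observed value of y. With incluster = TRUE each imputed value comes from the cluster of the entry being imputed; with incluster = FALSE donors from any cluster may be selected.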
Note
This method is experimental.
Author(s)
Vincent Audigier vincent.audigier@cnam.fr
References
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Resche-Rigon, M. and White, I. R. (2016). Multiple imputation by chained equations for systematically and sporadically missing multilevel data. Statistical Methods in Medical Research. To appear. <doi:10.1177/0962280216666564>
Audigier, V., White, I., Jolani, S., Debray, T., Quartagno, M., Carpenter, J., van Buuren, S. and Resche-Rigon, M. (2018). Multiple imputation for multilevel data with continuous and binary variables. Statistical Science. <doi:10.1214/18-STS646>