overimpute {clusterMI} | R Documentation |
Overimputation diagnostic plot
Description
overimpute assesses the fit of the predictive distribution after performing multiple imputation with the imputedata function.
Usage
overimpute(
res.imputedata,
plotvars = NULL,
plotinds = NULL,
nnodes = 2,
path.outfile = NULL,
alpha = 0.1,
mfrow = NULL,
mar = c(5, 4, 4, 2) - 1.9
)
Arguments
res.imputedata |
an output from the imputedata function |
plotvars |
column indices of the variables to overimpute |
plotinds |
row indices of the individuals to overimpute |
nnodes |
an integer indicating the number of nodes for parallel calculation. Default value is 2 |
path.outfile |
a vector of strings indicating the path for redirection of print messages. Default value is NULL, meaning that silent imputation is performed. Otherwise, print messages are saved in the files path.outfile/output.txt. One file per node is generated. |
alpha |
alpha level for prediction intervals |
mfrow |
a vector of the form c(nr, nc) |
mar |
a numerical vector of the form c(bottom, left, top, right) |
Details
This function imputes each observed value from each conditional imputation model obtained from the imputedata function. The "overimputed" values are compared with the observed values by building a confidence interval for each observed value from the quantiles of its overimputed values (see Blackwell et al. (2015) <doi:10.1177/0049124115585360>). Note that confidence intervals built from quantiles require a large number of imputations. If the model fits the data well, then the 90% confidence interval should contain the observed value in 90% of the cases.
The function overimpute takes as input an output of the imputedata function (res.imputedata argument), the indices of the incomplete continuous variables that are plotted (plotvars), the indices of the individuals to overimpute (plotinds; restricting them can be useful for time-consuming imputation methods), the number of CPU cores for parallel computation (nnodes), and the path for exporting the print messages generated during the parallel process (path.outfile).
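The quantile-based interval and the coverage check described above can be illustrated with a minimal base R sketch. It does not call overimpute: the overimputed draws are simulated, and the names n, m, mu, observed and draws are hypothetical. The draws and the observed values are generated from the same distribution, which mimics a well-fitting imputation model.

```r
# Sketch only: simulated draws stand in for overimputed values.
set.seed(1)
n <- 200      # number of overimputed cells (hypothetical)
m <- 1000     # number of overimputed values per cell (hypothetical)
alpha <- 0.1  # 1 - alpha = 90% interval

# One predictive mean per cell; observed values and draws share the same
# distribution, i.e. the "model" fits the data by construction.
mu <- rnorm(n, mean = 10, sd = 2)
observed <- rnorm(n, mean = mu, sd = 1)
# mu is recycled down the columns, so row i holds the m draws for cell i.
draws <- matrix(rnorm(n * m, mean = mu, sd = 1), nrow = n, ncol = m)

# Interval bounds from the empirical quantiles of each row of draws.
lower <- apply(draws, 1, quantile, probs = alpha / 2)
upper <- apply(draws, 1, quantile, probs = 1 - alpha / 2)

# Proportion of observed values that fall inside their interval.
covered <- observed >= lower & observed <= upper
coverage <- mean(covered)
coverage
```

With a well-fitting model the printed coverage should be close to 1 - alpha. With a small m the empirical quantiles are noisy, which is why a large number of imputations is recommended.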
Value
A list of two matrices
res.plot |
7-column matrix that contains (1) the variable which is overimputed, (2) the observed value of the observation, (3) the mean of the overimputations, (4) the lower bound of the confidence interval of the overimputations, (5) the upper bound of the confidence interval of the overimputations, (6) the proportion of the other variables that were missing for that observation in the original data, and (7) the color for graphical representation |
res.values |
a matrix with overimputed values for each cell. The number of columns corresponds to the number of values generated (i.e. the number of imputed datasets) |
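As an illustration of how a matrix laid out like res.plot might be summarised, the sketch below computes the empirical coverage from columns 2, 4 and 5. The matrix res.plot.mock is a hand-made stand-in, its column names are hypothetical, and its values are invented for the example; only the column order follows the description above.

```r
# Mock matrix in the documented column order (names and values are made up).
res.plot.mock <- cbind(
  var       = rep(1, 5),
  obs       = c(12.1, 13.4, 11.8, 14.2, 12.9),
  mean.over = c(12.3, 13.0, 12.6, 13.1, 12.8),
  lower     = c(11.5, 12.2, 11.9, 12.4, 12.0),
  upper     = c(13.1, 13.8, 13.3, 13.8, 13.6),
  prop.miss = c(0.0, 0.2, 0.1, 0.0, 0.3),
  col       = rep(1, 5)
)

# Access by position: (2) observed value, (4) lower bound, (5) upper bound.
obs   <- res.plot.mock[, 2]
lower <- res.plot.mock[, 4]
upper <- res.plot.mock[, 5]

# Share of observed values inside their interval, and the rows outside it.
inside <- obs >= lower & obs <= upper
coverage.mock <- mean(inside)
outside.rows <- which(!inside)
```

Comparing coverage.mock with the nominal level 1 - alpha gives a numerical counterpart to the diagnostic plot; rows listed in outside.rows are the observations whose interval misses the observed value.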
References
Blackwell, M., Honaker, J. and King, G. 2015. A Unified Approach to Measurement Error and Missing Data: Overview and Applications. Sociological Methods and Research, 1-39. <doi:10.1177/0049124115585360>
Examples
data(wine)
require(parallel)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)
nnodes <- 2 # Number of CPU cores used for parallel computation
# Multiple imputation using m = 100 (should be larger in practice)
res.imp.over <- imputedata(data.na = wine.na,
                           nb.clust = nb.clust,
                           m = 100)
# Overimputation
## overimputed variable
plotvars <- "alco"
## selection of 20 complete individuals on variable "alco"
plotinds <- sample(which(!is.na(wine.na[, plotvars])),
                   size = 20)
## overimputation
res.over <- overimpute(res.imp.over,
                       nnodes = nnodes,
                       plotvars = plotvars,
                       plotinds = plotinds)