R: Overimputation diagnostic plot

overimpute {clusterMI}

R Documentation

Overimputation diagnostic plot

Description

overimpute assesses the fit of the predictive distribution after performing multiple imputation with the imputedata function.

Usage

overimpute(
  res.imputedata,
  plotvars = NULL,
  plotinds = NULL,
  nnodes = 2,
  path.outfile = NULL,
  alpha = 0.1,
  mfrow = NULL,
  mar = c(5, 4, 4, 2) - 1.9
)

Arguments

`res.imputedata`	an output from the imputedata function
`plotvars`	column indices (or names, as in the example) of the variables to overimpute
`plotinds`	row index of the individuals overimputed
`nnodes`	an integer indicating the number of nodes for parallel calculation. Default value is 2
`path.outfile`	a vector of strings indicating the path for redirection of print messages. Default value is NULL, meaning that silent imputation is performed. Otherwise, print messages are saved in the files path.outfile/output.txt. One file per node is generated.
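`alpha`	alpha level for the prediction intervals. Default value is 0.1, which gives 90% intervals
`mfrow`	a vector of the form c(nr, nc) giving the number of rows and columns of the plot layout (see par)
`mar`	a numerical vector of the form c(bottom, left, top, right) giving the number of lines of margin on the four sides of each plot (see par)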
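
The mfrow and mar arguments have the same form as the graphical parameters of the same names in par. The sketch below assumes that overimpute forwards them to par (not confirmed by this page) and only illustrates what the values mean, using base graphics and no clusterMI function. The two toy panels are purely illustrative.

```r
# Default margins from the Usage section; for comparison, R's own default
# is c(5, 4, 4, 2) + 0.1, so these margins are tighter on every side.
mar <- c(5, 4, 4, 2) - 1.9
mar

grDevices::pdf(file = NULL)             # null device: nothing is written to disk
old <- par(mfrow = c(1, 2), mar = mar)  # 1 row and 2 columns of panels
plot(1:10, main = "panel 1")            # toy panels, not overimputation plots
plot(10:1, main = "panel 2")
par(old)                                # restore the previous settings
invisible(dev.off())
```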

Details

This function imputes each observed value from each conditional imputation model obtained from the imputedata function. The "overimputed" values are compared with the observed values by building a confidence interval for each observed value from the quantiles of its overimputed values (see Blackwell et al. (2015) <doi:10.1177/0049124115585360>). Note that confidence intervals built from quantiles require a large number of imputations. If the model fits the data well, then the 90% confidence interval should contain the observed value in about 90% of the cases.

The function overimpute takes as input an output of the imputedata function (res.imputedata), the indices of the incomplete continuous variables that are plotted (plotvars), the indices of the individuals to overimpute (plotinds; restricting them can be useful for time-consuming imputation methods), the number of CPU cores for parallel computation (nnodes), and the path for exporting the print messages generated during the parallel process (path.outfile).
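
The quantile-based check described above can be sketched in base R. This is a minimal illustration on simulated values, not the internal code of overimpute: the objects observed and overimputed are hypothetical stand-ins, and the draws are generated from the same distribution as the observed values to mimic a well-fitting model.

```r
set.seed(1)
n     <- 200   # number of overimputed cells (hypothetical)
m     <- 100   # number of overimputed draws per cell
alpha <- 0.1   # gives 90% intervals

# Simulated stand-ins: observed values and m draws per cell from the same
# distribution, i.e. a correctly specified predictive distribution
observed    <- rnorm(n)
overimputed <- matrix(rnorm(n * m), nrow = n, ncol = m)

# Interval for each cell from the empirical quantiles of its draws
lower <- apply(overimputed, 1, quantile, probs = alpha / 2)
upper <- apply(overimputed, 1, quantile, probs = 1 - alpha / 2)

# Proportion of observed values that fall inside their interval
coverage <- mean(observed >= lower & observed <= upper)
coverage
```

With a well-fitting model the coverage should be close to 1 - alpha. With a small m the empirical quantiles are noisy, which is why a large number of imputations is recommended.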

Value

A list of two matrices

res.plot

a 7-column matrix that contains (1) the variable which is overimputed, (2) the observed value of the observation, (3) the mean of the overimputations, (4) the lower bound of the confidence interval of the overimputations, (5) the upper bound of the confidence interval of the overimputations, (6) the proportion of the other variables that were missing for that observation in the original data, and (7) the color for graphical representation

res.values

a matrix with overimputed values for each cell. The number of columns corresponds to the number of values generated (i.e. the number of imputed datasets)
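
As a rough sketch of how res.plot could be summarised, the code below computes the proportion of observed values covered by their interval. It relies on assumptions: columns are addressed by position, following the order listed above, because the column names are not documented here. The matrix is a made-up stand-in for res.over$res.plot and is assumed to be of character type because column 7 holds a color.

```r
# Hypothetical stand-in for res.over$res.plot; all values are invented
res.plot <- cbind(
  c("alco", "alco", "alco", "alco"),    # (1) overimputed variable
  c("13.2", "12.4", "14.1", "11.8"),    # (2) observed value
  c("13.0", "12.9", "13.2", "12.1"),    # (3) mean of the overimputations
  c("12.1", "12.0", "12.3", "11.2"),    # (4) lower bound of the interval
  c("13.9", "13.8", "14.0", "13.0"),    # (5) upper bound of the interval
  c("0", "0.25", "0.5", "0"),           # (6) proportion of other variables missing
  c("blue", "blue", "blue", "blue")     # (7) plotting color (arbitrary here)
)

# Convert the numeric columns, selected by position (an assumption)
observed <- as.numeric(res.plot[, 2])
lower    <- as.numeric(res.plot[, 4])
upper    <- as.numeric(res.plot[, 5])

# Which observed values fall inside their interval, and in what proportion
covered <- observed >= lower & observed <= upper
mean(covered)
```

On real output, this proportion can be compared with 1 - alpha, the nominal level of the intervals.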

References

Blackwell, M., Honaker, J. and King, G. 2015. A Unified Approach to Measurement Error and Missing Data: Overview and Applications. Sociological Methods and Research, 1-39. <doi:10.1177/0049124115585360>

Examples

data(wine)

require(parallel)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)

nnodes <- 2 # Number of CPU cores used for parallel computation

# Multiple imputation using m = 100 (should be larger in practice)

res.imp.over <- imputedata(data.na = wine.na,
                           nb.clust = nb.clust,
                           m = 100)
# Overimputation

## overimputed variable
plotvars <- "alco" 

## selection of 20 complete individuals on variable "alco"
plotinds <- sample(which(!is.na(wine.na[, plotvars])),
                    size = 20)
## overimputation                   
res.over <- overimpute(res.imp.over,
                       nnodes = nnodes,
                       plotvars = plotvars,
                       plotinds = plotinds)

[Package clusterMI version 1.2.1 Index]