| 04mi {mi} | R Documentation |
Multiple Imputation
Description
The mi function cannot be run in isolation. It is the most important step of a multi-step process to perform multiple imputation. The data must be specified as a missing_data.frame before mi is used to impute missing values for one or more missing_variables. An iterative algorithm is used where each missing_variable is modeled (using fit_model) as a function of all the other missing_variables and their missingness patterns. This documentation outlines the technical uses of the mi function. For a more general discussion of how to use mi for multiple imputation, see mi-package.
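The iterative idea described above can be illustrated with a toy sketch in base R. This is not the algorithm implemented in mi: it uses a plain linear model for every variable, leaves out the missingness-pattern predictors, and all object names (dat, miss, fit) are made up for the illustration.

```r
# Toy sketch of iterative (chained) imputation; NOT the mi implementation.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)
dat$x[sample(n, 30)] <- NA
dat$y[sample(n, 30)] <- NA

# Remember which cells were originally missing
miss <- is.na(dat)

# Initialize each missing cell with a random draw from the observed values
for (v in names(dat)) {
  obs <- dat[[v]][!miss[, v]]
  dat[[v]][miss[, v]] <- sample(obs, sum(miss[, v]), replace = TRUE)
}

# Cycle through the variables, modeling each one given all the others
for (iter in 1:10) {
  for (v in names(dat)) {
    others <- setdiff(names(dat), v)
    # Fit on rows where v was observed (predictors may hold imputed values)
    fit <- lm(reformulate(others, response = v),
              data = dat[!miss[, v], , drop = FALSE])
    # Replace the originally missing values of v with noisy predictions
    pred <- predict(fit, newdata = dat[miss[, v], , drop = FALSE])
    dat[[v]][miss[, v]] <- pred + rnorm(length(pred), sd = sigma(fit))
  }
}
```

Only the originally missing cells are ever overwritten; the observed values stay fixed throughout the iterations.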
Usage
mi(y, model, ...)
## Hidden arguments:
## n.iter = 30, n.chains = 4, max.minutes = Inf, seed = NA, verbose = TRUE,
## save_models = FALSE, parallel = .Platform$OS.type != "windows"
Arguments
y |
Typically an object that inherits from the |
model |
Missing when |
... |
Further arguments, the most important of which are
|
Details
It is important to distinguish the two mi methods that are most relevant to users from the many mi methods that are less relevant. The primary mi method is that where y inherits from the missing_data.frame-class and model is omitted. This method performs the imputation according to the additional arguments described under ... above and returns an object of class "mi". Executing two or more independent chains is important for monitoring the convergence of each chain; see Rhats.
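A minimal sketch of the primary method follows, assuming the CHAIN data that ships with the package and the default conversion to a missing_data.frame. The argument values are only illustrative, and the run can take a while.

```r
library(mi)

data(CHAIN, package = "mi")
mdf <- missing_data.frame(CHAIN)  # may warn about missingness patterns

# Run several independent chains so that convergence can be monitored
imputations <- mi(mdf, n.iter = 30, n.chains = 4, max.minutes = 20)

# Convergence diagnostics across the chains
rh <- Rhats(imputations)
print(rh)
```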
The second important mi method is that where y is an object of class "mi" and model is omitted. It continues a previous run of the iterative imputation algorithm, which is useful if the chains have not converged within the number of iterations or the time specified. All the arguments described under ... above remain applicable, except for n.chains and save_RAM because these are established by the previous run that is being continued.
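Continuing a run might look like the following sketch. The short initial run and the number of extra iterations are arbitrary choices made to keep the example quick.

```r
library(mi)

data(CHAIN, package = "mi")
mdf <- missing_data.frame(CHAIN)

# A deliberately short initial run
imputations <- mi(mdf, n.iter = 5, n.chains = 2, verbose = FALSE)

# Pass the "mi" object back to mi() to run more iterations;
# the number of chains is carried over from the previous run
imputations <- mi(imputations, n.iter = 5, verbose = FALSE)
```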
The numerous remaining methods are of less importance to users. One mi method is called when y = "parallel" and model is omitted. This method merely sets up the parallel backend so that the chains can be executed in parallel on the local machine. We use the mclapply function in the parallel package to implement parallel processing on non-Windows machines, and we use the snow package to implement parallel processing on Windows machines; we refer users to the documentation for these packages for more detail about parallel processing. Parallel processing is used by default on machines with multiple processors, but sequential processing can be requested instead by specifying parallel = FALSE. If the user is not using a multicore computer, sequential processing is used instead of parallel processing.
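As a small sketch, sequential processing can be requested through the hidden parallel argument listed under Usage. The iteration and chain counts here are arbitrary.

```r
library(mi)

data(CHAIN, package = "mi")
mdf <- missing_data.frame(CHAIN)

# Run the chains one after another instead of in parallel
imputations <- mi(mdf, n.iter = 5, n.chains = 2,
                  parallel = FALSE, verbose = FALSE)
```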
The first two mi methods described above in turn call a mi method where y inherits from the missing_data.frame-class and model is that which is returned by one of the fit_model-methods. These methods impute values for the originally missing values of a missing_variable given a fitted model, according to the imputation_method slot of the missing_variable in question. Advanced users could define new subclasses of the missing_variable-class, in which case it may be necessary to write such a mi method for the new class. It will almost certainly be necessary to add to the fit_model-methods. The existing mi and fit_model-methods should provide a template for doing so.
Value
If model is missing and n.chains is positive, then the mi method will return an object of class "mi", which has the following slots:
- call: the call to mi
- data: a list of missing_data.frames, one for each chain
- total_iters: an integer vector that records how many iterations have been performed
There are a few methods for such an object, such as show, summary,
dimnames, nrow, ncol, etc.
If mi is called on a missing_data.frame with model missing and a nonpositive
n.chains, then the missing_data.frame will be returned after allocating storage.
If model is not missing, then the mi method will impute missing values for the y
argument and return it.
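The slots of the returned object can be inspected with the @ operator. The sketch below assumes a short run with two chains; the printed values depend on the run.

```r
library(mi)

data(CHAIN, package = "mi")
mdf <- missing_data.frame(CHAIN)
imputations <- mi(mdf, n.iter = 5, n.chains = 2, verbose = FALSE)

imputations@call               # the call that produced the object
n_data <- length(imputations@data)  # one missing_data.frame per chain
imputations@total_iters        # iterations performed so far

show(imputations)
```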
Author(s)
Ben Goodrich and Jonathan Kropko, for this version, based on earlier versions written by Yu-Sung Su, Masanao Yajima, Maria Grazia Pittau, Jennifer Hill, and Andrew Gelman.
See Also
Examples
# STEP 0: Get data
data(CHAIN, package = "mi")
# STEP 1: Convert to a missing_data.frame
mdf <- missing_data.frame(CHAIN) # warnings about missingness patterns
show(mdf)
# STEP 2: change things
mdf <- change(mdf, y = "log_virus", what = "transformation", to = "identity")
# STEP 3: look deeper
summary(mdf)
# STEP 4: impute
## Not run:
imputations <- mi(mdf)
## End(Not run)