mice.par {micemd} | R Documentation |
Parallel calculations for Multivariate Imputation by Chained Equations
Description
Parallel calculations for Multivariate Imputation by Chained Equations using the R package parallel.
Usage
mice.par(don.na, m = 5, method = NULL, predictorMatrix, where = NULL,
visitSequence = NULL, blots = NULL, post = NULL, blocks, formulas,
defaultMethod = c("pmm", "logreg", "polyreg", "polr"), maxit = 5,
seed = NA, data.init = NULL, nnodes = 5, path.outfile = NULL, ...)
Arguments
don.na |
A data frame or a matrix containing the incomplete data. Missing
values are coded as |
m |
Number of multiple imputations. The default is |
method |
Can be either a single string, or a vector of strings with
length |
predictorMatrix |
A square matrix of size |
where |
A data frame or matrix with logicals of the same dimensions
as |
visitSequence |
A vector of integers of arbitrary length, specifying the
column indices of the visiting sequence. The visiting sequence is the column
order that is used to impute the data during one pass through the data. A
column may be visited more than once. All incomplete columns that are used as
predictors should be visited, or else the function will stop with an error.
The default sequence |
blots |
A named |
post |
A vector of strings with length |
blocks |
List of vectors with variable names per block. List elements
may be named to identify blocks. Variables within a block are
imputed by a multivariate imputation method
(see |
formulas |
A named list of formulas, or expressions that
can be converted into formulas by |
defaultMethod |
A vector of four strings containing the default
imputation methods for numerical columns, factor columns with 2 levels,
unordered factor columns with more than two levels, and ordered factor columns
with more than two levels, respectively. If nothing is specified, the following defaults will be used:
|
maxit |
A scalar giving the number of iterations. The default is 5. |
seed |
An integer that is used as argument by the |
data.init |
A data frame of the same size and type as |
nnodes |
A scalar indicating the number of nodes for parallel calculation. Default value is 5. |
path.outfile |
A vector of strings indicating the path for redirection of print messages. Default value is NULL, meaning that silent imputation is performed. Otherwise, print messages are saved in the files path.outfile/output.txt. One file per node is generated. |
... |
Named arguments that are passed down to the elementary imputation functions. |
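To illustrate how the arguments above combine, here is a minimal sketch that runs mice.par on the nhanes data with a fixed seed, a small cluster, and print messages redirected to a temporary directory. The specific values (m = 4, nnodes = 2, seed = 123) and the name log_dir are arbitrary choices for illustration, not recommendations from the package.

```r
library(mice)    # provides the nhanes data and the mids class
library(micemd)  # provides mice.par

data(nhanes, package = "mice")

# Hypothetical location for the per-node log files (illustration only)
log_dir <- tempdir()

imp <- mice.par(nhanes,
                m = 4,                   # number of imputed tables
                maxit = 5,               # iterations per imputed table
                seed = 123,              # makes the run reproducible
                nnodes = 2,              # size of the parallel cluster
                path.outfile = log_dir)  # where print messages are written

class(imp)
```

Setting nnodes no higher than the number of available cores is usually sensible, since each node is a separate R process.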
Details
Generates m imputed tables in parallel: m seeds are generated, and multiple imputation by chained equations is then run in parallel from each seed. The output has the same structure as the output of the mice function of the mice package.
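The pattern described above (one seed per imputed table, one run per seed, runs spread over cluster nodes) can be sketched with the parallel package alone. This is not the micemd implementation: impute_once is a hypothetical toy function that fills missing values by resampling observed ones, and the toy data frame is made up for illustration.

```r
library(parallel)  # ships with base R

# Toy stand-in for one chained-equations run (NOT the micemd algorithm):
# each missing value is replaced by a draw from the observed values
# of the same column.
impute_once <- function(seed, data) {
  set.seed(seed)
  for (j in seq_along(data)) {
    miss <- is.na(data[[j]])
    if (any(miss)) {
      obs <- data[[j]][!miss]
      data[[j]][miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
    }
  }
  data
}

# Small incomplete data set, made up for illustration
toy <- data.frame(x = c(1, NA, 3, 4, NA),
                  y = c(NA, 2.5, 3.1, NA, 5.0))

m <- 3
set.seed(2024)
seeds <- sample.int(.Machine$integer.max, m)  # one seed per imputed table

cl <- makeCluster(2)                          # two worker processes
imputed <- parLapply(cl, seeds, impute_once, data = toy)
stopCluster(cl)

length(imputed)  # one completed table per seed
```

Because each run depends only on its own seed, the m runs are independent and can be distributed over nodes in any order.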
Value
Returns an S3 object of class mids (multiply imputed data set).
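Since the returned object has class mids, the usual mice tools apply to it. The sketch below uses mice() as a stand-in for mice.par() to avoid starting a cluster, on the assumption that both return objects of the same class; the model bmi ~ hyp + chl is taken from the nhanes example in this page.

```r
library(mice)

data(nhanes, package = "mice")

# Stand-in for mice.par(nhanes): mice() also returns a mids object
imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)

class(imp)                             # "mids"
first <- complete(imp, 1)              # first completed data set
fit <- with(imp, lm(bmi ~ hyp + chl))  # fit the model on each imputed table
pooled <- pool(fit)                    # combine the estimates (Rubin's rules)
summary(pooled)
```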
Author(s)
Vincent Audigier vincent.audigier@cnam.fr
References
Van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. https://www.jstatsoft.org/article/view/v045i03 <doi:10.18637/jss.v045.i03>
Van Buuren, S. (2012). Flexible Imputation of Missing Data. Boca Raton, FL: Chapman & Hall/CRC Press.
Van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn C.G.M., Rubin, D.B. (2006) Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76, 12, 1049–1064. <doi:10.1080/10629360600810434>
Van Buuren, S. (2007) Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research, 16, 3, 219–242. <doi:10.1177/0962280206074463>
Van Buuren, S., Boshuizen, H.C., Knook, D.L. (1999) Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681–694. <doi:10.1002/(SICI)1097-0258(19990330)18:6<681::AID-SIM71>3.0.CO;2-R>
Brand, J.P.L. (1999) Development, implementation and evaluation of multiple imputation strategies for the statistical analysis of incomplete data sets. Dissertation. Rotterdam: Erasmus University.
See Also
Examples
##############
# nhanes (single-level data)
##############
data(nhanes, package = "mice")
# imp <- mice.par(nhanes)
# fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))
# summary(pool(fit))
##############
# CHEM97Na (two-level data with 1681 observations and 5 variables)
##############
data(CHEM97Na)
ind.clust <- 1  # index of the cluster variable
# initialise the predictorMatrix argument
predictor.matrix <- mice(CHEM97Na, m = 1, maxit = 0)$predictorMatrix
predictor.matrix[ind.clust, ind.clust] <- 0
predictor.matrix[-ind.clust, ind.clust] <- -2  # -2 flags the cluster variable
predictor.matrix[predictor.matrix == 1] <- 2   # 2 marks a predictor with a random effect
# initialise the method argument
method <- find.defaultMethod(CHEM97Na, ind.clust)
# multiple imputation by chained equations (parallel calculation) [1 minute]
# (the imputation process can be followed by opening the output.txt files in the working directory)
# res.mice <- mice.par(CHEM97Na,
#                      predictorMatrix = predictor.matrix,
#                      method = method,
#                      path.outfile = getwd())
# multiple imputation by chained equations (without parallel calculation) [4.8 minutes]
# res.mice <- mice(CHEM97Na,
#                  predictorMatrix = predictor.matrix,
#                  method = method)
############
# IPDNa (two-level data with 11685 observations and 10 variables)
############
data(IPDNa)
ind.clust <- 1  # index of the cluster variable
# initialise the predictorMatrix argument
predictor.matrix <- mice(IPDNa, m = 1, maxit = 0)$predictorMatrix
predictor.matrix[ind.clust, ind.clust] <- 0
predictor.matrix[-ind.clust, ind.clust] <- -2  # -2 flags the cluster variable
predictor.matrix[predictor.matrix == 1] <- 2   # 2 marks a predictor with a random effect
# initialise the method argument
method <- find.defaultMethod(IPDNa, ind.clust)
# multiple imputation by chained equations (parallel calculation)
# res.mice <- mice.par(IPDNa,
#                      predictorMatrix = predictor.matrix,
#                      method = method,
#                      path.outfile = getwd())