| mixImp {mlmi} | R Documentation |
Imputation for a mixture of continuous and categorical variables using the general location model.
Description
This function performs multiple imputation under a general location model
as described by Schafer (1997), using the mix package. Imputation can
either be performed using posterior draws (pd=TRUE) or conditional on the maximum likelihood
estimate of the model parameters (pd=FALSE), referred to as maximum likelihood
multiple imputation by von Hippel and Bartlett (2021).
Usage
mixImp(
obsData,
nCat,
M = 10,
pd = FALSE,
marginsType = 1,
margins = NULL,
designType = 1,
design = NULL,
steps = 100,
rseed
)
Arguments
obsData |
The data frame to be imputed. The categorical variables must be
in the first |
nCat |
The number of categorical variables in |
M |
Number of imputations to generate. |
pd |
Specify whether to use posterior draws ( |
marginsType |
An integer specifying what type of log-linear model to use for the
categorical variables. |
margins |
If |
designType |
An integer specifying how the continuous variables' means should depend
on the categorical variables. |
design |
If |
steps |
If |
rseed |
The value to set the |
Details
See the descriptions for marginsType, margins, designType, design and the documentation
in ecm.mix for details about how to specify the model.
Imputed datasets can be analysed using withinBetween, scoreBased,
or, for example, the bootImpute package.
Value
A list of imputed datasets, or if M=1, just the imputed data frame.
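Because the return type depends on M, code that loops over imputations may want to normalise it first. The following is a minimal base R sketch of one way to do that. The helper asImputationList is hypothetical and not part of mlmi, and the toy data frames only stand in for real mixImp output.

```r
# Hypothetical helper (not part of mlmi): always return a list of data frames,
# whether the imputation returned a single data frame (M=1) or a list (M>1).
asImputationList <- function(imps) {
  if (is.data.frame(imps)) list(imps) else imps
}

# Toy stand-ins for imputed output, used only to illustrate the helper
single <- data.frame(x1 = c(1, 2), y = c(0.3, 1.7))
several <- list(single, single)

# Both calls now yield a list that can be passed to lapply() or sapply()
length(asImputationList(single))
length(asImputationList(several))
```

The check uses is.data.frame rather than is.list because a data frame is itself a list in R.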
References
Schafer J.L. (1997). Analysis of incomplete multivariate data. Chapman & Hall, Boca Raton, Florida, USA.
von Hippel P.T. and Bartlett J.W. (2021). Maximum likelihood multiple imputation: faster, more efficient imputation without posterior draws. Statistical Science, 36(3), 400-420. doi:10.1214/20-STS793.
Examples
#simulate a partially observed dataset with a mixture of categorical and continuous variables
set.seed(1234)
n <- 100
#for simplicity we simulate completely independent categorical variables
x1 <- ceiling(3*runif(n))
x2 <- ceiling(2*runif(n))
x3 <- ceiling(2*runif(n))
y <- 1+0.5*(x1==2)+1.5*(x1==3)+x2+x3+rnorm(n)
temp <- data.frame(x1=x1,x2=x2,x3=x3,y=y)
#make some data missing in all variables
for (i in 1:4) {
temp[(runif(n)<0.25),i] <- NA
}
#impute conditional on MLE, assuming two-way associations in the log-linear model
#and main effects of categorical variables on continuous one (the default)
imps <- mixImp(temp, nCat=3, M=10, pd=FALSE, rseed=4423)
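The example above imputes conditional on the MLE. As a sketch of the posterior-draw alternative described in the Description, the code below re-creates the same simulated data, reports the proportion of missing values per column, and then calls mixImp with pd=TRUE. The names missProp and impsPd are illustrative, steps is left at its default, and the imputation is only attempted if the mlmi package is installed.

```r
# Re-create the simulated data from the example above
set.seed(1234)
n <- 100
x1 <- ceiling(3*runif(n))
x2 <- ceiling(2*runif(n))
x3 <- ceiling(2*runif(n))
y <- 1+0.5*(x1==2)+1.5*(x1==3)+x2+x3+rnorm(n)
temp <- data.frame(x1=x1,x2=x2,x3=x3,y=y)
for (i in 1:4) {
  temp[(runif(n)<0.25),i] <- NA
}

# Proportion of missing values in each column before imputing
missProp <- colMeans(is.na(temp))
print(missProp)

# Posterior-draw imputation; skipped when mlmi is not available
if (requireNamespace("mlmi", quietly = TRUE)) {
  impsPd <- mlmi::mixImp(temp, nCat=3, M=10, pd=TRUE, rseed=4423)
  # Number of imputed datasets returned
  print(length(impsPd))
  # Count of missing values remaining in each imputed dataset
  print(sapply(impsPd, function(d) sum(is.na(d))))
}
```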