mixImp {mlmi} | R Documentation |
Imputation for a mixture of continuous and categorical variables using the general location model.
Description
This function performs multiple imputation under a general location model
as described by Schafer (1997), using the mix package. Imputation can
either be performed using posterior draws (pd=TRUE) or conditional on the
maximum likelihood estimate of the model parameters (pd=FALSE), referred to
as maximum likelihood multiple imputation by von Hippel and Bartlett (2021).
Usage
mixImp(
  obsData,
  nCat,
  M = 10,
  pd = FALSE,
  marginsType = 1,
  margins = NULL,
  designType = 1,
  design = NULL,
  steps = 100,
  rseed
)
Arguments
obsData |
The data frame to be imputed. The categorical variables must be
in the first nCat columns. |
nCat |
The number of categorical variables in obsData. |
M |
Number of imputations to generate. |
pd |
Specify whether to use posterior draws (TRUE) or not (FALSE). |
marginsType |
An integer specifying what type of log-linear model to use for the
categorical variables. |
margins |
If |
designType |
An integer specifying how the continuous variables' means should depend
on the categorical variables. |
design |
If |
steps |
If |
rseed |
The value to set the |
Details
See the descriptions for marginsType, margins, designType, design and the
documentation in ecm.mix for details about how to specify the model.
Imputed datasets can be analysed using withinBetween, scoreBased, or for
example the bootImpute package.
Value
A list of imputed datasets, or if M=1, just the imputed data frame.
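Because the return type differs between M=1 and M>1, downstream code that loops over imputations may want to normalise the result first. Note that a data frame is itself a list of columns, so length() applied to an M=1 result counts columns, not imputations. The following is a minimal sketch using base R only; asImpList is a hypothetical helper, not part of mlmi, and the small data frames are stand-ins for real mixImp output.

```r
# Hypothetical helper (not part of mlmi): always return a list of data frames,
# whether the input is a single imputed data frame or a list of them.
asImpList <- function(imps) {
  if (is.data.frame(imps)) list(imps) else imps
}

# Stand-ins for mixImp output, made up purely for illustration
single  <- data.frame(x1 = c(1, 2), y = c(0.3, 1.2))  # shape returned when M=1
several <- list(single, single)                       # shape returned when M=2

length(single)              # number of columns, not number of imputations
length(asImpList(single))   # number of imputations after wrapping
length(asImpList(several))  # a list input is passed through unchanged
```

With this wrapper, lapply(asImpList(imps), ...) can be written once for both cases.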
References
Schafer J.L. (1997). Analysis of incomplete multivariate data. Chapman & Hall, Boca Raton, Florida, USA.
von Hippel P.T. and Bartlett J.W. (2021). Maximum likelihood multiple imputation: faster, more efficient imputation without posterior draws. Statistical Science 36(3), 400-420. doi:10.1214/20-STS793.
Examples
#simulate a partially observed dataset with a mixture of categorical and continuous variables
set.seed(1234)
n <- 100
#for simplicity we simulate completely independent categorical variables
x1 <- ceiling(3*runif(n))
x2 <- ceiling(2*runif(n))
x3 <- ceiling(2*runif(n))
y <- 1+0.5*(x1==2)+1.5*(x1==3)+x2+x3+rnorm(n)
temp <- data.frame(x1=x1,x2=x2,x3=x3,y=y)
#make some data missing in all variables
for (i in 1:4) {
  temp[(runif(n)<0.25),i] <- NA
}
#impute conditional on MLE, assuming two-way associations in the log-linear model
#and main effects of categorical variables on continuous one (the default)
imps <- mixImp(temp, nCat=3, M=10, pd=FALSE, rseed=4423)
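The example above imputes conditional on the MLE. As a complement, the sketch below repeats the simulation, imputes with posterior draws (pd=TRUE) and passes the result to withinBetween, which the Details section names as one way to analyse the imputations. It assumes that withinBetween accepts the list returned by mixImp together with an analysis function returning a list with elements est and var; that calling convention and the analysis model in analysisFun are assumptions for illustration, so check the withinBetween help page before relying on them. The mlmi calls are guarded so that the block still runs where the package is not installed.

```r
# Simulate the same partially observed dataset as in the example above
set.seed(1234)
n <- 100
x1 <- ceiling(3 * runif(n))
x2 <- ceiling(2 * runif(n))
x3 <- ceiling(2 * runif(n))
y <- 1 + 0.5 * (x1 == 2) + 1.5 * (x1 == 3) + x2 + x3 + rnorm(n)
temp <- data.frame(x1 = x1, x2 = x2, x3 = x3, y = y)
for (i in 1:4) {
  temp[runif(n) < 0.25, i] <- NA
}

# Illustrative analysis function (an assumption, not taken from the mixImp page):
# fit a linear model and return estimates and their covariance matrix
analysisFun <- function(inputData) {
  mod <- lm(y ~ factor(x1) + x2 + x3, data = inputData)
  list(est = coef(mod), var = vcov(mod))
}

if (requireNamespace("mlmi", quietly = TRUE)) {
  # Impute using posterior draws instead of conditioning on the MLE
  impsPD <- mlmi::mixImp(temp, nCat = 3, M = 10, pd = TRUE,
                         steps = 100, rseed = 4423)
  # Pool the analysis across imputations (calling convention assumed, see lead-in)
  pooled <- mlmi::withinBetween(impsPD, analysisFun)
  print(pooled)
}
```

The same analysisFun could be applied to the pd=FALSE imputations from the example above.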