panImpute {mitml} | R Documentation |
Impute multilevel missing data using pan
Description
Performs multiple imputation of multilevel data using the pan package (Schafer & Yucel, 2002).
Supports imputation of continuous multilevel data with missing values at level 1.
See 'Details' for further information.
Usage
panImpute(data, type, formula, n.burn = 5000, n.iter = 100, m = 10, group = NULL,
prior = NULL, seed = NULL, save.pred = FALSE, keep.chains = c("full", "diagonal"),
silent = FALSE)
Arguments
data |
A data frame containing the incomplete data, the auxiliary variables, the cluster indicator variable, and any other variables that should be included in the imputed datasets. |
type |
An integer vector specifying the role of each variable in the imputation model (see 'Details'). |
formula |
A formula specifying the role of each variable in the imputation model. The basic model is constructed by |
n.burn |
The number of burn-in iterations before any imputations are drawn. Default is 5,000. |
n.iter |
The number of iterations between imputations. Default is 100. |
m |
The number of imputed data sets to generate. Default is 10. |
group |
(optional) A character string denoting the name of an additional grouping variable to be used with the |
prior |
(optional) A list with components |
seed |
(optional) An integer value initializing |
save.pred |
(optional) Logical flag indicating if variables derived using |
keep.chains |
(optional) A character string denoting which chains of the MCMC algorithm to store. Can be |
silent |
(optional) Logical flag indicating if console output should be suppressed. Default is to |
Details
This function serves as an interface to the pan package and supports imputation of continuous multilevel data at level 1 (Schafer & Yucel, 2002).
The imputation model can be specified using either the type or the formula argument.
The type interface is designed to provide quick-and-easy imputations using pan.
The type argument must be an integer vector denoting the role of each variable in the imputation model:
1: target variables containing missing data
2: predictors with fixed effect on all targets (completely observed)
3: predictors with random effect on all targets (completely observed)
-1: grouping variable within which the imputation is run separately
-2: cluster indicator variable
0: variables not featured in the model
At least one target variable and the cluster indicator must be specified.
The intercept is automatically included as both a fixed and a random effect.
If a variable of type -1 is found, then separate imputations are performed within each level of that variable.
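The role codes above can be assigned by column name rather than by position, which is less error-prone when the data set has many columns. The following is a minimal sketch in base R; the data frame and its column names (school, region, y, x) are hypothetical and only serve to illustrate the coding.

```r
# Toy data with hypothetical column names (not taken from the package).
dat <- data.frame(
  school = rep(1:3, each = 4),
  region = rep(c("A", "B", "A"), each = 4),
  y      = c(2.1, NA, 3.4, 2.8, NA, 1.9, 2.2, 3.0, 2.7, NA, 3.1, 2.5),
  x      = c(10, 12, 9, 11, 8, 13, 10, 12, 9, 11, 10, 14)
)

# Start with every variable excluded from the model (0),
# then assign roles by name.
type <- setNames(integer(ncol(dat)), colnames(dat))
type["school"] <- -2L  # cluster indicator variable
type["y"]      <- 1L   # target variable containing missing data
type["x"]      <- 3L   # completely observed predictor with a random effect

# Check the requirements stated above before calling panImpute():
# exactly one cluster indicator and at least one target variable.
stopifnot(sum(type == -2L) == 1L, any(type == 1L))
# Predictors (types 2 and 3) must be completely observed.
stopifnot(!anyNA(dat[, type %in% c(2L, 3L), drop = FALSE]))

print(type)
```

The resulting vector has one entry per column of the data and can be passed as the type argument.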
The formula argument is intended as a more flexible and feature-rich interface to pan.
Specifying the formula argument is similar to specifying other formulae in R.
Given below is a list of operators that panImpute understands:
~: separates the target (left-hand) and predictor (right-hand) side of the model
+: adds target or predictor variables to the model
*: adds an interaction term of two or more predictors
|: denotes cluster-specific random effects and specifies the cluster indicator (e.g., 1|ID)
I(): defines functions to be interpreted by model.matrix
Predictors are allowed to have fixed effects, random effects, or both on all target variables.
The intercept is automatically included as both a fixed and a random effect, but it can be suppressed if needed (see 'Examples').
Note that, when specifying random effects other than the intercept, these will not be automatically added as fixed effects and must be included explicitly.
Any predictors defined by I() will be used for imputation but not included in the data set unless save.pred = TRUE.
In order to run separate imputations for each level of an additional grouping variable, the group argument can be used.
The name of the grouping variable must be given as a character string (i.e., in quotation marks).
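As a brief illustration of the two points above, the following sketch specifies a random slope that is also included explicitly as a fixed effect, and passes the grouping variable as a character string. The call to panImpute is guarded so that the sketch also runs where the mitml package is not installed; the seed value is arbitrary, and the number of iterations is kept low for illustration only.

```r
# Random slope for 'ReadAchiev': the predictor appears in the random part
# and, explicitly, as a fixed effect on the right-hand side.
fml <- ReadDis ~ ReadAchiev + (1 + ReadAchiev | ID)

# For comparison: here 'ReadAchiev' has a random effect only, because it
# is not repeated outside the parentheses.
fml_random_only <- ReadDis ~ (1 + ReadAchiev | ID)

# Variables referenced by the model (base R; no imputation is run here).
print(all.vars(fml))

# Guarded call: runs only if 'mitml' is available.
if (requireNamespace("mitml", quietly = TRUE)) {
  data(studentratings, package = "mitml")
  imp <- mitml::panImpute(studentratings, formula = fml, group = "FedState",
                          n.burn = 1000, n.iter = 100, m = 5,
                          seed = 1234, silent = TRUE)
}
```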
The default prior distributions for the covariance matrices in panImpute are "least informative" inverse-Wishart priors with minimum positive degrees of freedom (largest dispersion) and the identity matrix for scale.
The prior argument can be used to specify alternative prior distributions.
These must be supplied as a list containing the following components:
a: degrees of freedom for the covariance matrix of residuals
Binv: scale matrix for the covariance matrix of residuals
c: degrees of freedom for the covariance matrix of random effects
Dinv: scale matrix for the covariance matrix of random effects
A sensible choice for a diffuse non-default prior is to set the degrees of freedom to the lowest value possible, and the scale matrices according to a prior guess of the corresponding covariance matrices (see Schafer & Yucel, 2002).
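A prior list of this form can be assembled in base R as sketched below. The sketch assumes the dimensions used by pan: with r target variables and q random effects per target, the residual scale matrix is r x r and the random-effects scale matrix is (r*q) x (r*q), with the lowest degrees of freedom equal to the respective matrix dimension. The prior guesses are hypothetical values, and they are plugged in directly as scale matrices; whether a guess should additionally be scaled by the degrees of freedom depends on the parameterization described in Schafer & Yucel (2002).

```r
r <- 2  # number of target variables (assumption for this sketch)
q <- 2  # random effects per target: intercept and one slope (assumption)

# Hypothetical prior guesses for the covariance matrices.
Sigma_guess <- diag(c(0.5, 0.8))  # residuals, r x r
Psi_guess   <- diag(0.1, r * q)   # random effects, (r*q) x (r*q)

prior <- list(
  a    = r,            # degrees of freedom, residual covariance matrix
  Binv = Sigma_guess,  # scale matrix, residual covariance matrix
  c    = r * q,        # degrees of freedom, random-effects covariance matrix
  Dinv = Psi_guess     # scale matrix, random-effects covariance matrix
)

str(prior)

# The list would then be passed on, e.g.:
# imp <- panImpute(data, formula = fml, prior = prior)
```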
In imputation models with many parameters, the number of chains in the MCMC algorithm being stored can be reduced with the keep.chains argument.
If set to "full" (the default), all chains are saved.
If set to "diagonal", only chains pertaining to fixed effects and the diagonal entries of the covariance matrices are saved.
This setting influences the storage mode of parameters (e.g., dimensions and indices of arrays) and should be used with caution.
Value
An object of class mitml, containing the following components:
data |
The original (incomplete) data set, sorted according to the cluster variable and (if given) the grouping variable, with several attributes describing the original row order ( |
replacement.mat |
A matrix containing the multiple replacements (i.e., imputations) for each missing value. The replacement matrix contains one row for each missing value and one column for each imputed data set. |
index.mat |
A matrix containing the row and column index for each missing value. The index matrix is used to link the missing values in the data set with their corresponding rows in the replacement matrix. |
call |
The matched function call. |
model |
A list containing the names of the cluster variable, the target variables, and the predictor variables with fixed and random effects, respectively. |
random.L1 |
A character string denoting the handling of random residual covariance matrices (not used here; see |
prior |
The prior parameters used in the imputation model. |
iter |
A list containing the number of burn-in iterations, the number of iterations between imputations, and the number of imputed data sets. |
par.burnin |
A multi-dimensional array containing the parameters of the imputation model from the burn-in phase. |
par.imputation |
A multi-dimensional array containing the parameters of the imputation model from the imputation phase. |
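The way index.mat links missing values to rows of replacement.mat can be illustrated with a small base R sketch. The data, the replacement values, and the helper complete_one are hypothetical and are not part of the package; in practice, mitmlComplete should be used to extract the imputed data sets.

```r
# Toy incomplete data (hypothetical values).
dat <- data.frame(y = c(1.2, NA, 0.7, NA), x = c(3, 1, NA, 2))

# One row per missing value: its row and column index in 'dat'.
index.mat <- which(is.na(dat), arr.ind = TRUE)

# One row per missing value, one column per imputed data set (here m = 2),
# filled with made-up replacement values.
replacement.mat <- cbind(c(0.9, 1.1, 2.5), c(1.0, 0.8, 2.1))

# Hypothetical helper: fill in the i-th set of replacements.
complete_one <- function(dat, index.mat, replacement.mat, i) {
  out <- dat
  for (k in seq_len(nrow(index.mat))) {
    out[index.mat[k, 1], index.mat[k, 2]] <- replacement.mat[k, i]
  }
  out
}

c1 <- complete_one(dat, index.mat, replacement.mat, 1)
c2 <- complete_one(dat, index.mat, replacement.mat, 2)
print(c1)
```

Storing the replacements separately from the data means that the incomplete data set is kept only once, regardless of the number of imputations.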
Note
For objects of class mitml, methods for the generic functions print, summary, and plot are available to inspect the fitted imputation model.
mitmlComplete is used for extracting the imputed data sets.
Author(s)
Simon Grund, Alexander Robitzsch, Oliver Luedtke
References
Schafer, J. L., and Yucel, R. M. (2002). Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics, 11, 437-457.
See Also
jomoImpute, mitmlComplete, summary.mitml, plot.mitml
Examples
# NOTE: The number of iterations in these examples is much lower than it
# should be! This is done in order to comply with CRAN policies, and more
# iterations are recommended for applications in practice!
library(mitml)
data(studentratings)
# *** ................................
# the 'type' interface
#
# * Example 1.1: 'ReadDis' and 'SES', predicted by 'ReadAchiev' and
# 'CognAbility', with random slope for 'ReadAchiev'
type <- c(-2, 0, 0, 0, 0, 0, 3, 1, 2, 0)
names(type) <- colnames(studentratings)
type
imp <- panImpute(studentratings, type = type, n.burn = 1000, n.iter = 100, m = 5)
# * Example 1.2: 'ReadDis' and 'SES' groupwise for 'FedState',
# and predicted by 'ReadAchiev'
type <- c(-2, -1, 0, 0, 0, 0, 2, 1, 0, 0)
names(type) <- colnames(studentratings)
type
imp <- panImpute(studentratings, type = type, n.burn = 1000, n.iter = 100, m = 5)
# *** ................................
# the 'formula' interface
#
# * Example 2.1: imputation of 'ReadDis', predicted by 'ReadAchiev'
# (random intercept)
fml <- ReadDis ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)
# ... the intercept can be suppressed using '0' or '-1' (here for fixed intercept)
fml <- ReadDis ~ 0 + ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)
# * Example 2.2: imputation of 'ReadDis', predicted by 'ReadAchiev'
# (random slope)
fml <- ReadDis ~ ReadAchiev + (1+ReadAchiev|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)
# * Example 2.3: imputation of 'ReadDis', predicted by 'ReadAchiev',
# groupwise for 'FedState'
fml <- ReadDis ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula = fml, group = "FedState", n.burn = 1000,
                 n.iter = 100, m = 5)
# * Example 2.4: imputation of 'ReadDis', predicted by 'ReadAchiev'
# including the cluster mean of 'ReadAchiev' as an additional predictor
fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5)
# ... using 'save.pred' to save the calculated cluster means in the data set
fml <- ReadDis ~ ReadAchiev + I(clusterMeans(ReadAchiev, ID)) + (1|ID)
imp <- panImpute(studentratings, formula = fml, n.burn = 1000, n.iter = 100, m = 5,
                 save.pred = TRUE)
head(mitmlComplete(imp, print = 1))