imputedata {clusterMI} | R Documentation |
Multiple imputation methods for cluster analysis
Description
imputedata returns a list of imputed datasets obtained with imputation methods dedicated to individuals clustered in (unknown) groups
Usage
imputedata(
data.na,
method = "JM-GL",
nb.clust = NULL,
m = 20,
maxit = 50,
Lstart = 100,
L = 20,
method.mice = NULL,
predictmat = NULL,
verbose = TRUE,
seed = 1234,
bootstrap = FALSE
)
Arguments
data.na |
an incomplete dataframe |
method |
a single string specifying the imputation method, one of "FCS-homo", "FCS-hetero", "JM-DP" or "JM-GL". By default method = "JM-GL". See the Details section |
nb.clust |
number of clusters |
m |
number of imputed datasets. By default, m = 20. |
maxit |
number of iterations for FCS methods (only used if method = "FCS-homo" or method = "FCS-hetero") |
Lstart |
number of iterations for the burn-in period (only used if method = "JM-DP" or method = "JM-GL") |
L |
number of iterations skipped between successive retained imputed datasets after the burn-in period (only used if method = "JM-DP" or method = "JM-GL") |
method.mice |
a vector of strings (or a single string) giving the imputation method for each variable (only used if method = "FCS-homo" or method = "FCS-hetero"). The default is "pmm" (predictive mean matching) for FCS-homo and "mice.impute.2l.jomo" for FCS-hetero |
predictmat |
predictor matrix used for FCS imputation (only used if method = "FCS-homo" or method = "FCS-hetero") |
verbose |
a boolean. If TRUE, a message is printed at each iteration. Use verbose = FALSE for silent imputation |
seed |
a positive integer initializing the random generator |
bootstrap |
a boolean. Use bootstrap = TRUE for proper imputation with FCS methods (note that Mclust sometimes fails with duplicated points) |
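The roles of Lstart and L for the JM methods can be pictured with a generic burn-in and thinning schedule. The sketch below is plain R and does not call clusterMI. It assumes, without confirmation from this page, that the first dataset is kept L iterations after the burn-in ends and that each later one follows L iterations after the previous one; the names kept.iter and total.iter are illustrative only.

```r
# Illustrative burn-in / thinning schedule (an assumption, not the package internals)
Lstart <- 100  # burn-in iterations (default value of Lstart)
L <- 20        # iterations skipped between retained datasets (default value of L)
m <- 20        # number of imputed datasets (default value of m)

# Hypothetical iteration indices at which an imputed dataset would be kept
kept.iter <- Lstart + L * seq_len(m)

# Total number of iterations under this assumed schedule
total.iter <- max(kept.iter)
```

Under this assumed schedule, increasing L spaces the retained datasets further apart, while increasing Lstart only delays the first one.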
Details
The imputedata function offers various multiple imputation methods dedicated to clustered individuals.
In particular, two fully conditional imputation methods are proposed (FCS-homo and FCS-hetero), which essentially differ in the assumption made about the covariance in each cluster (constant or not, respectively).
The imputation requires a pre-specified number of clusters (nb.clust). See choosenbclust if this number is unknown.
The imputedata function alternates clustering and imputation given the partition of individuals.
Once the clustering is performed, the function calls the mice function from the mice R package to perform imputation.
The mice package proposes various methods for imputation, which can be specified through the method.mice argument.
Note that two joint modelling methods are also available: JM-GL, from the R package mix, and JM-DP, from the R package DPImputeCont (https://github.com/hang-j-kim/DPImputeCont).
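As a minimal sketch of selecting an FCS method, the call below uses only arguments listed in the Usage section. The small values of m and maxit are chosen here for speed and are not recommendations; the object name res.fcs is illustrative. The code only runs the imputation when clusterMI is installed.

```r
# Sketch only: m and maxit are deliberately small to keep the run short
res.fcs <- NULL
if (requireNamespace("clusterMI", quietly = TRUE)) {
  library(clusterMI)
  data(wine, package = "clusterMI")
  set.seed(123456)
  wine.na <- wine
  wine.na$cult <- NULL        # drop the cult column, as in the Examples section
  wine.na <- prodna(wine.na)  # add missing values
  res.fcs <- imputedata(
    data.na = wine.na,
    method = "FCS-homo",      # FCS method assuming a constant covariance across clusters
    nb.clust = 3,             # pre-specified number of clusters
    m = 2,                    # number of imputed datasets
    maxit = 5,                # number of FCS iterations
    method.mice = "pmm",      # predictive mean matching for every variable
    verbose = FALSE
  )
}
```

In practice, choosemaxit (listed under See Also) is the natural place to look when deciding how many iterations are needed.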
Value
a list of 3 objects
res.imp |
a list of the m imputed datasets |
res.conv |
for FCS methods, an array giving the between (and within) inertia of each imputed variable at each iteration and for each imputed dataset. For JM methods, a matrix giving the between inertia for each variable and each imputed dataset. |
call |
the matching call |
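The returned list can be inspected as sketched below. The component names res.imp, res.conv and call come from the description above; the object names res and first.imp, and the small value m = 2, are choices made here for illustration. The code only runs the imputation when clusterMI is installed.

```r
# Sketch only: inspect the three components of the returned list
res <- NULL
if (requireNamespace("clusterMI", quietly = TRUE)) {
  library(clusterMI)
  data(wine, package = "clusterMI")
  set.seed(123456)
  wine.na <- wine
  wine.na$cult <- NULL        # drop the cult column, as in the Examples section
  wine.na <- prodna(wine.na)  # add missing values
  # default method ("JM-GL"), with a small m to keep the run short
  res <- imputedata(data.na = wine.na, nb.clust = 3, m = 2, verbose = FALSE)

  print(names(res))           # components of the returned list
  print(length(res$res.imp))  # number of imputed datasets
  first.imp <- res$res.imp[[1]]
  print(anyNA(first.imp))     # check whether any missing value remains
  print(dim(res$res.conv))    # shape of the convergence diagnostics
  print(res$call)             # the matching call
}
```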
References
Kim, H. J., Reiter, J. P., Wang, Q., Cox, L. H. and Karr, A. F. (2014), Multiple imputation of missing or faulty values under linear constraints, Journal of Business & Economic Statistics, 32, 375-386 <doi:10.1080/07350015.2014.885435>
Schafer, J. L. (1997) Analysis of Incomplete Multivariate Data. Chapman & Hall, Chapter 9.
Audigier, V., Niang, N., & Resche-Rigon, M. (2021). Clustering with missing data: which imputation model for which cluster analysis method? arXiv preprint <arXiv:2106.04424>.
See Also
mice
choosenbclust
choosemaxit
varselbest
imp.mix
Examples
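library(clusterMI)
data(wine)
set.seed(123456)
wine.na <- wine
wine.na$cult <- NULL # remove the cult column before imputation
wine.na <- prodna(wine.na) # add missing values
nb.clust <- 3 # number of clusters
m <- 20 # number of imputed data sets
# multiple imputation with the default method ("JM-GL")
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)
# summary of each imputed data set
lapply(res.imp$res.imp, summary)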