R: Multiple imputation methods for cluster analysis

imputedata {clusterMI}

R Documentation

Multiple imputation methods for cluster analysis

Description

imputedata returns a list of imputed datasets, using imputation methods designed for individuals clustered in (unknown) groups.

Usage

imputedata(
  data.na,
  method = "JM-GL",
  nb.clust = NULL,
  m = 20,
  maxit = 50,
  Lstart = 100,
  L = 20,
  method.mice = NULL,
  predictmat = NULL,
  verbose = TRUE,
  seed = 1234,
  bootstrap = FALSE
)

Arguments

`data.na`	an incomplete dataframe
`method`	a single string specifying the imputation method, one of "FCS-homo", "FCS-hetero", "JM-DP" or "JM-GL". By default, method = "JM-GL". See the Details section.
`nb.clust`	number of clusters
`m`	number of imputed datasets. By default, m = 20.
`maxit`	number of iterations for FCS methods (only used if method = "FCS-homo" or method = "FCS-hetero")
`Lstart`	number of iterations for the burn-in period (only used if method = "JM-DP" or method = "JM-GL")
`L`	number of iterations skipped between two retained imputed data sets after the burn-in period (only used if method = "JM-DP" or method = "JM-GL")
`method.mice`	a vector of strings (or a single string) giving the imputation method for each variable (only used if method = "FCS-homo" or method = "FCS-hetero"). The default is "pmm" (predictive mean matching) for FCS-homo and "mice.impute.2l.jomo" for FCS-hetero
`predictmat`	predictor matrix used for FCS imputation (only used if method = "FCS-homo" or method = "FCS-hetero")
`verbose`	a boolean. If TRUE, a message is printed at each iteration. Use verbose = FALSE for silent imputation
`seed`	a positive integer initializing the random generator
`bootstrap`	a boolean. Use bootstrap = TRUE for proper imputation with FCS methods (Mclust sometimes fails with multiple points)
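The interplay of Lstart, L and m for the JM methods can be illustrated with a little arithmetic. The sketch below assumes that one imputed data set is kept every L iterations once the burn-in is over, i.e. at iterations Lstart + L, Lstart + 2L, and so on. The exact indexing used inside the package is not stated here, so this is one plausible reading only, and the name `keep` is hypothetical.

```r
# Default values from the Usage section
Lstart <- 100  # burn-in iterations
L <- 20        # iterations skipped between two retained draws
m <- 20        # number of imputed data sets

# Assumption: the k-th imputed data set is taken at iteration Lstart + k * L
keep <- Lstart + L * seq_len(m)

head(keep, 3)   # first retained iterations under this assumption
max(keep)       # total number of iterations of the sampler
```

Under this reading, increasing L spaces the retained draws further apart, at the cost of a longer run.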

Details

The imputedata function offers several multiple imputation methods dedicated to clustered individuals. In particular, two fully conditional imputation methods are proposed (FCS-homo and FCS-hetero), which essentially differ in their assumption about the covariance in each cluster (constant across clusters or not, respectively). The imputation requires a pre-specified number of clusters (nb.clust). See choosenbclust if this number is unknown. The imputedata function alternates between clustering and imputation given the partition of individuals. Once the clustering has been performed, the function calls the mice function from the mice R package to perform the imputation. The mice package offers various imputation methods, which can be selected through the method.mice argument. Two joint modelling methods are also available: JM-GL from the R package mix and JM-DP from the R package DPImputeCont https://github.com/hang-j-kim/DPImputeCont
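The idea of alternating clustering and imputation given the partition can be sketched in base R. The toy loop below is not the algorithm used by imputedata: it swaps in kmeans for the clustering step and cluster-wise means for the imputation step (a single deterministic fill, not a multiple imputation), on simulated data. All object names (`x`, `miss`, `x.imp`, `part`) are hypothetical.

```r
# Toy illustration only: kmeans and cluster-mean filling are stand-ins,
# not the methods called by imputedata.
set.seed(1)

# Simulated data: two well-separated groups, two variables
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2)
)
colnames(x) <- c("v1", "v2")

# Remove about 15% of the cells completely at random
miss <- matrix(runif(length(x)) < 0.15, nrow = nrow(x))
x.na <- x
x.na[miss] <- NA

# Initialisation: fill each missing cell with its column mean
x.imp <- x.na
for (j in seq_len(ncol(x.imp))) {
  x.imp[miss[, j], j] <- mean(x.na[, j], na.rm = TRUE)
}

for (it in 1:10) {
  # Step 1: cluster the currently completed data
  part <- kmeans(x.imp, centers = 2, nstart = 5)$cluster
  # Step 2: re-impute missing cells from the observed values of their cluster
  for (j in seq_len(ncol(x.imp))) {
    for (k in unique(part)) {
      obs <- x.na[part == k & !miss[, j], j]
      idx <- which(part == k & miss[, j])
      if (length(idx) > 0 && length(obs) > 0) {
        x.imp[idx, j] <- mean(obs)
      }
    }
  }
}

table(part)    # sizes of the two clusters in the final partition
anyNA(x.imp)   # checks that every missing cell has been filled
```

Observed cells are never modified in this loop; only the cells flagged in `miss` are updated at each pass.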

Value

a list of 3 objects

`res.imp`	a list of the imputed datasets
`res.conv`	for FCS methods, an array giving the between (and within) inertia of each imputed variable at each iteration and for each imputed dataset. For JM methods, a matrix giving the between inertia for each variable and each imputed dataset.
`call`	the matching call
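Because res.imp is a plain list of completed datasets, it can be processed with the usual list tools. The sketch below uses a mock list of simulated data frames as a stand-in for the output of imputedata, so that it runs without the package. The names `mock.res`, `all.complete`, `col.means` and `pooled` are hypothetical, and the values are random numbers, not imputation results.

```r
# Mock stand-in for the returned object: m completed data frames
set.seed(42)
m <- 5
mock.res <- list(
  res.imp = lapply(seq_len(m), function(i) {
    data.frame(v1 = rnorm(30), v2 = rnorm(30))
  })
)

# Number of imputed datasets
length(mock.res$res.imp)

# Check that no dataset still contains missing values
all.complete <- all(!vapply(mock.res$res.imp, anyNA, logical(1)))

# Column means per imputed dataset (one column per dataset),
# then their average across the m datasets
col.means <- sapply(mock.res$res.imp, colMeans)
pooled <- rowMeans(col.means)
pooled
```

Averaging an estimate over the m datasets gives the usual multiple imputation point estimate; the spread of the per-dataset values reflects the uncertainty due to the missing data.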

References

Kim, H. J., Reiter, J. P., Wang, Q., Cox, L. H. and Karr, A. F. (2014), Multiple imputation of missing or faulty values under linear constraints, Journal of Business & Economic Statistics, 32, 375-386 <doi:10.1080/07350015.2014.885435>

Schafer, J. L. (1997) Analysis of Incomplete Multivariate Data. Chapman & Hall, Chapter 9.

Audigier, V., Niang, N., & Resche-Rigon, M. (2021). Clustering with missing data: which imputation model for which cluster analysis method? arXiv preprint <arXiv:2106.04424>.

Examples

library(clusterMI)

data(wine)
set.seed(123456)
wine.na <- wine
wine.na$cult <- NULL # drop the cultivar variable
wine.na <- prodna(wine.na) # add missing values
nb.clust <- 3 # number of clusters
m <- 20 # number of imputed data sets
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)
lapply(res.imp$res.imp, summary) # summary of each imputed data set

[Package clusterMI version 1.2.1 Index]