choosem {clusterMI} | R Documentation |
Graphical investigation for the number of datasets generated by multiple imputation
Description
For an object generated by the function clusterMI, the choosem function browses the sequence of contributory partitions and computes the consensus partition at each step. The Rand index between successive consensus partitions is then plotted.
Usage
choosem(output, graph = TRUE, nnodes = 1)
Arguments
output |
an output from the clusterMI function |
graph |
a boolean indicating whether a graph is plotted |
nnodes |
number of CPU cores for parallel computing. By default, nnodes = 1 |
Details
The number of imputed datasets (m) should be sufficiently large to improve the partition accuracy.
The choosem function can be used to check whether this number is suitable.
This function computes the consensus partition by considering only the first k imputed datasets, for k = 1, ..., m.
In this way, a sequence of m consensus partitions is obtained.
Then, the Rand index between successive partitions is computed and reported in a graph.
The Rand index measures the proximity between two partitions.
If the Rand index between the last consensus partitions of the sequence reaches its maximum value (1),
then the last imputed dataset does not modify the consensus partition.
Consequently, the number of imputed datasets can be considered sufficiently large.
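To make the Rand index concrete, the following base R sketch computes it from the contingency table of two partitions and then applies it to successive elements of a sequence of partitions. The helper rand_index and the toy partitions are hypothetical illustrations, not part of clusterMI, and the package may compute the index differently.

```r
# Hypothetical helper (not part of clusterMI): Rand index between two
# partitions given as label vectors of equal length (at least 2 items).
rand_index <- function(p1, p2) {
  stopifnot(length(p1) == length(p2), length(p1) >= 2)
  n_pairs <- choose(length(p1), 2)          # total number of pairs
  tab <- table(p1, p2)                      # contingency table
  same_both <- sum(choose(tab, 2))          # pairs together in both
  same_p1 <- sum(choose(rowSums(tab), 2))   # pairs together in p1
  same_p2 <- sum(choose(colSums(tab), 2))   # pairs together in p2
  # agreements = pairs together in both + pairs separated in both
  (n_pairs + 2 * same_both - same_p1 - same_p2) / n_pairs
}

# Toy sequence of partitions (made-up labels, for illustration only)
parts <- list(
  c(1, 1, 2, 2, 3, 3),
  c(1, 1, 2, 3, 3, 3),
  c(1, 1, 2, 2, 3, 3),
  c(2, 2, 1, 1, 3, 3)   # same grouping as the previous one, relabelled
)

# Rand index between each partition and the next one in the sequence
successive <- mapply(rand_index, parts[-length(parts)], parts[-1])
print(successive)
```

The index depends only on which observations are grouped together, so relabelling the clusters leaves it unchanged. This is why the last value in the toy sequence equals 1.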
Value
A list of two objects
part |
|
rand |
a |
References
Audigier, V. and Niang, N., Clustering with missing data: which equivalent for Rubin's rules? Advances in Data Analysis and Classification <doi:10.1007/s11634-022-00519-1>, 2022.
Examples
library(clusterMI)
data(wine)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)
# imputation
m <- 5 # number of imputed datasets; should be larger in practice
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)
# pooling
nnodes <- 2 # number of CPU cores for parallel computing
res.pool <- clusterMI(res.imp, instability = FALSE, nnodes = nnodes)
res.choosem <- choosem(res.pool)
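The stopping rule described in Details can be checked numerically as well as on the graph. The sketch below assumes that the Rand indices are available as a plain numeric vector, one value per pair of successive consensus partitions. Whether the rand element returned by choosem has exactly this shape is an assumption, since this page does not describe it fully. The helper is_stable and the values in rand_seq are hypothetical and for illustration only.

```r
# Hypothetical helper (not part of clusterMI): TRUE if the last `last`
# Rand indices of the sequence all equal 1, up to a small tolerance.
is_stable <- function(rand_seq, last = 1, tol = 1e-8) {
  stopifnot(is.numeric(rand_seq), length(rand_seq) >= last)
  tail_vals <- tail(rand_seq, last)
  all(abs(tail_vals - 1) <= tol)
}

# Made-up sequence of Rand indices between successive consensus partitions
rand_seq <- c(0.82, 0.91, 0.97, 1, 1)

stable_last <- is_stable(rand_seq)               # only the final value
stable_last_two <- is_stable(rand_seq, last = 2) # the final two values
not_yet <- is_stable(rand_seq[1:3])              # truncated sequence
print(c(stable_last, stable_last_two, not_yet))
```

Requiring several trailing values to equal 1 (last greater than 1) is a more conservative choice than checking the final value alone. If the check fails, the natural remedy is to increase m and rerun the imputation and pooling steps.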