choosem {clusterMI} R Documentation

Graphical investigation for the number of datasets generated by multiple imputation

Description

For an object generated by the clusterMI function, choosem goes through the sequence of contributory partitions and computes the consensus partition at each step. The Rand index between successive consensus partitions is then plotted.

Usage

choosem(output, graph = TRUE, nnodes = 1)

Arguments

output

an output from the clusterMI function

graph

a logical value indicating whether the graph is plotted. By default graph = TRUE.

nnodes

number of CPU cores for parallel computing. By default nnodes = 1.

Details

The number of imputed datasets (m) should be large enough for the consensus partition to be accurate. The choosem function can be used to check whether the chosen number is suitable. It computes the consensus partition using only the first p imputed datasets, for p = 1, ..., m, which yields a sequence of m consensus partitions. The Rand index between successive partitions is then computed and reported in a graph. The Rand index measures the proximity between two partitions and equals 1 when they are identical. If the Rand index between the last two consensus partitions of the sequence reaches its maximum value (1), the last imputed dataset did not modify the consensus partition. The number of imputed datasets can then be considered sufficiently large.
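To make the proximity measure concrete, the following base R sketch computes a Rand index as the proportion of pairs of observations on which two partitions agree. The helper name rand_index and the toy label vectors are illustrative assumptions; this is not the code used internally by choosem.

```r
# Rand index between two partitions given as label vectors of equal length.
# It is the proportion of pairs of observations on which both partitions agree:
# either both put the pair in the same cluster, or both separate it.
# NOTE: rand_index is a hypothetical helper, not a clusterMI function.
rand_index <- function(p1, p2) {
  stopifnot(length(p1) == length(p2), length(p1) >= 2)
  same1 <- outer(p1, p1, "==")   # co-membership matrix for partition 1
  same2 <- outer(p2, p2, "==")   # co-membership matrix for partition 2
  pairs <- upper.tri(same1)      # count each unordered pair once
  mean(same1[pairs] == same2[pairs])
}

a <- c(1, 1, 2, 2, 3, 3)
b <- c(2, 2, 3, 3, 1, 1)  # same partition as a, with labels permuted
d <- c(1, 1, 1, 2, 2, 2)  # a genuinely different partition

rand_index(a, b)  # 1: identical partitions up to relabelling
rand_index(a, d)  # below 1: the partitions disagree on some pairs
```

Because the index depends only on co-membership of pairs, relabelling the clusters does not change it, which is why it is suited to comparing successive consensus partitions.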

Value

A list with two components

part

a matrix with m columns, where column p contains the consensus partition obtained using only the first p imputed datasets

rand

a vector of length m-1 giving the Rand index between successive consensus partitions
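The relationship between the two components can be illustrated on toy data. In the sketch below, the matrix part is a hand-made stand-in for the part component (it is not produced by choosem), and pair_agreement is a hypothetical helper that computes the proportion of pairs of observations on which two partitions agree.

```r
# Toy stand-in for the `part` component: 6 observations, m = 4 columns.
# Column p plays the role of the consensus partition from the first p datasets.
part <- cbind(c(1, 1, 2, 2, 3, 3),
              c(1, 1, 2, 3, 3, 3),
              c(1, 1, 2, 2, 3, 3),
              c(1, 1, 2, 2, 3, 3))

# Proportion of pairs of observations on which two partitions agree
# (hypothetical helper, not part of clusterMI)
pair_agreement <- function(p1, p2) {
  pairs <- upper.tri(diag(length(p1)))
  mean(outer(p1, p1, "==")[pairs] == outer(p2, p2, "==")[pairs])
}

m <- ncol(part)
rand <- vapply(seq_len(m - 1),
               function(p) pair_agreement(part[, p], part[, p + 1]),
               numeric(1))

length(rand) == m - 1               # TRUE: one value per pair of successive columns
isTRUE(all.equal(rand[m - 1], 1))   # TRUE here: the last column repeats the previous one
```

In this toy sequence the final value equals 1, which corresponds to the situation where adding the last imputed dataset leaves the consensus partition unchanged.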

References

Audigier, V. and Niang, N. (2022). Clustering with missing data: which equivalent for Rubin's rules? Advances in Data Analysis and Classification. <doi:10.1007/s11634-022-00519-1>

See Also

clusterMI, imputedata

Examples

data(wine)

set.seed(123456)
ref <- wine$cult
nb.clust <- 3
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)

#imputation
m <- 5 # number of imputed data sets. Should be larger in practice
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)

#pooling
nnodes <- 2 # number of CPU cores for parallel computing
res.pool <- clusterMI(res.imp, instability = FALSE, nnodes = nnodes)

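#check whether m is large enough
res.choosem <- choosem(res.pool)
res.choosem$rand # Rand index between successive consensus partitions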

[Package clusterMI version 1.2.1 Index]