fastnmf {clusterMI}

R Documentation

Consensus clustering using non-negative matrix factorization

Description

From a list of partitions, fastnmf pools the partitions into a single consensus partition, as proposed in Li, Ding and Jordan (2007) <doi:10.1109/ICDM.2007.98>.

Usage

fastnmf(
  listpart,
  nb.clust,
  threshold = 10^(-5),
  printflag = TRUE,
  nstart = 100,
  early_stop_iter = 10,
  initializer = "random",
  batch_size = NULL,
  iter.max = 50
)

Arguments

`listpart`	a list of partitions
`nb.clust`	an integer specifying the number of clusters
`threshold`	a real number specifying the convergence threshold at which the NMF algorithm is stopped. Default value is 10^(-5)
`printflag`	a boolean. If TRUE, the NMF algorithm prints messages to the console. Default value is TRUE
`nstart`	the number of random sets of initial centers chosen for kmeans initialization. Default value is 100
`early_stop_iter`	the number of additional iterations performed after the best within-cluster sum of squared errors has been found. Default value is 10. See the MiniBatchKmeans help page.
`initializer`	the initialization method, one of optimal_init, quantile_init, kmeans++ and random. Default value is "random". See the MiniBatchKmeans help page.
`batch_size`	the size of the mini batches for Mini Batch Kmeans clustering. Default value is NULL, in which case standard kmeans is used.
`iter.max`	the maximum number of iterations allowed for kmeans. Default value is 50

Details

fastnmf performs consensus clustering using non-negative matrix factorization (NMF), following Li, Ding and Jordan (2007) <doi:10.1109/ICDM.2007.98>. The set of partitions to aggregate must be given as a list in which each element is a numeric vector of class labels. The number of classes can vary from one partition to another. The number of classes of the consensus partition is set by the nb.clust argument. The NMF algorithm is iterative and requires an initial partition, which is obtained by kmeans clustering on the average of the connectivity matrices. If batch_size is NULL, kmeans clustering is performed using nstart initial values and at most iter.max iterations. Otherwise, Mini Batch Kmeans is used; it can be faster than kmeans when the number of individuals is large.
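To make the initialization step concrete, here is a minimal base-R sketch, assuming each partition is a vector of class labels of the same length. The helpers average_connectivity() and initial_partition() are hypothetical and not exported by clusterMI; the sketch only mirrors the batch_size = NULL case with stats::kmeans() and is not the package's internal code.

```r
# Hypothetical helper (not part of clusterMI): average of connectivity matrices.
# Entry (i, j) of one connectivity matrix is 1 when individuals i and j
# belong to the same class of that partition, and 0 otherwise.
average_connectivity <- function(listpart) {
  n <- length(listpart[[1]])
  stopifnot(all(vapply(listpart, length, integer(1)) == n))
  conn <- lapply(listpart, function(p) outer(p, p, "==") * 1)
  Reduce(`+`, conn) / length(listpart)
}

# Hypothetical helper: initial partition by kmeans on the rows of Mtilde,
# mirroring the batch_size = NULL case (nstart starts, at most iter.max iterations).
initial_partition <- function(Mtilde, nb.clust, nstart = 100, iter.max = 50) {
  stats::kmeans(Mtilde, centers = nb.clust,
                nstart = nstart, iter.max = iter.max)$cluster
}

# Toy example: three partitions of six individuals;
# the number of classes differs between partitions.
listpart <- list(c(1, 1, 2, 2, 3, 3),
                 c(1, 1, 1, 2, 2, 2),
                 c(2, 2, 1, 1, 3, 3))
Mtilde <- average_connectivity(listpart)
set.seed(1)
init <- initial_partition(Mtilde, nb.clust = 3)
```

In this toy example, entry (1, 3) of Mtilde equals 1/3 because individuals 1 and 3 share a class in only one of the three partitions.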

Value

a list of 5 objects

`Htilde`	A fuzzy disjunctive table
`S`	A positive matrix
`Mtilde`	The average of connectivity matrices
`crit`	A vector with the optimized criterion at each iteration
`cluster`	the consensus partition in nb.clust classes
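The returned objects can be related through the tri-factorization Mtilde ~ Htilde S t(Htilde). The sketch below assumes the square-root multiplicative updates used for orthogonal nonnegative tri-factorizations; nmf_consensus_sketch() is hypothetical (not a clusterMI function), and the initial offset of 0.2, the eps guard, and the stopping rule are illustrative assumptions that may differ from what fastnmf actually does. The criterion is taken to be the squared Frobenius distance between Mtilde and its approximation, and each individual is assigned to the column of Htilde with the largest value.

```r
# Hypothetical sketch (not clusterMI's code): orthogonal symmetric NMF
# Mtilde ~ H S t(H), with H >= 0 (n x k) and S >= 0 (k x k).
nmf_consensus_sketch <- function(Mtilde, init, nb.clust, threshold = 1e-5,
                                 maxit = 500, eps = 1e-9) {
  n <- nrow(Mtilde)
  # Fuzzy disjunctive table built from the initial partition; the 0.2 offset
  # (an assumption) lets multiplicative updates move entries away from zero.
  H <- matrix(0.2, n, nb.clust)
  H[cbind(seq_len(n), init)] <- 1
  S <- diag(nb.clust)
  crit <- numeric(0)
  for (it in seq_len(maxit)) {
    # Update H: H <- H * sqrt((M H S) / (H t(H) M H S))
    MHS <- Mtilde %*% H %*% S
    H <- H * sqrt(MHS / (H %*% t(H) %*% MHS + eps))
    # Update S: S <- S * sqrt((t(H) M H) / (t(H) H S t(H) H))
    HtH <- crossprod(H)
    S <- S * sqrt(crossprod(H, Mtilde %*% H) / (HtH %*% S %*% HtH + eps))
    # Criterion: squared Frobenius norm of the residual
    crit <- c(crit, sum((Mtilde - H %*% S %*% t(H))^2))
    # Assumed stopping rule: criterion changes by less than threshold
    if (it > 1 && abs(crit[it - 1] - crit[it]) < threshold) break
  }
  list(Htilde = H, S = S, crit = crit,
       cluster = max.col(H, ties.method = "first"))
}

# Toy average connectivity matrix from three partitions of six individuals
listpart <- list(c(1, 1, 2, 2, 3, 3),
                 c(1, 1, 1, 2, 2, 2),
                 c(2, 2, 1, 1, 3, 3))
Mtilde <- Reduce(`+`, lapply(listpart, function(p) outer(p, p, "==") * 1)) /
  length(listpart)
res <- nmf_consensus_sketch(Mtilde, init = c(1, 1, 2, 2, 3, 3), nb.clust = 3)
```

Because the updates are multiplicative and every factor is nonnegative, Htilde and S stay nonnegative throughout, which is what makes Htilde interpretable as a fuzzy disjunctive table.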

References

T. Li, C. Ding, and M. I. Jordan (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, ICDM'07, pages 577-582, USA. IEEE Computer Society. <doi:10.1109/ICDM.2007.98>

Examples

library(clusterMI)
data(wine)
library(clustrd)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
m <- 3 # number of imputed data sets. Should be larger in practice
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)

# imputation
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)

# analysis using reduced kmeans

## apply the cluspca function on each imputed data set
res.ana.rkm <- lapply(res.imp$res.imp,
                      FUN = cluspca,
                      nclus = nb.clust,
                      ndim = 2,
                      method = "RKM")
## extract the set of partitions (as a list)
res.ana.rkm <- lapply(res.ana.rkm, "[[", "cluster")

# pooling by NMF
res.pool.rkm <- fastnmf(res.ana.rkm, nb.clust = nb.clust)$cluster

[Package clusterMI version 1.2.1 Index]