fastnmf {clusterMI}    R Documentation

Consensus clustering using non-negative matrix factorization

Description

Given a list of partitions, fastnmf pools them into a single consensus partition, as proposed in Li and Ding (2007) <doi:10.1109/ICDM.2007.98>.

Usage

fastnmf(
  listpart,
  nb.clust,
  threshold = 10^(-5),
  printflag = TRUE,
  nstart = 100,
  early_stop_iter = 10,
  initializer = "random",
  batch_size = NULL,
  iter.max = 50
)

Arguments

listpart

a list of partitions

nb.clust

an integer specifying the number of clusters

threshold

a real number specifying the threshold at which the NMF algorithm is stopped. Default value is 10^(-5)

printflag

a boolean. If TRUE, fastnmf prints messages to the console. Default value is TRUE

nstart

how many random sets should be chosen for kmeans initialization. Default value is 100

early_stop_iter

the number of additional iterations to run after the best within-cluster sum of squared errors has been computed. Default value is 10. See MiniBatchKmeans help page.

initializer

the method of initialization. One of optimal_init, quantile_init, kmeans++ and random. Default value is "random". See MiniBatchKmeans help page.

batch_size

the size of the mini batches for kmeans clustering. Default value is NULL.

iter.max

the maximum number of iterations allowed for kmeans. Default value is 50

Details

fastnmf performs consensus clustering using non-negative matrix factorization following Li and Ding (2007) <doi:10.1109/ICDM.2007.98>. The partitions to aggregate must be given as a list in which each element is a numeric vector. The number of classes may vary from one partition to another. The number of classes for the consensus partition is set with the nb.clust argument. The NMF algorithm is iterative and requires an initial partition, which is obtained by kmeans clustering on the average of the connectivity matrices. If batch_size is NULL, kmeans clustering is performed using nstart initial values and iter.max iterations. Otherwise, Mini Batch Kmeans is used, which can be faster than kmeans when the number of individuals is large.
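The input format and the initialization step described above can be illustrated with a minimal base R sketch. It is not the package's internal code: the toy partitions, the helper connectivity() and the choice to run kmeans on the rows of the averaged matrix are assumptions made for illustration only.

```r
# Toy input: three partitions of six individuals (hypothetical labels).
# The number of classes differs across partitions (2, 3 and 2).
listpart <- list(
  c(1, 1, 1, 2, 2, 2),
  c(1, 1, 2, 2, 3, 3),
  c(1, 1, 1, 1, 2, 2)
)

# Hypothetical helper: connectivity matrix of one partition.
# Entry (i, j) is 1 when individuals i and j share a cluster, 0 otherwise.
connectivity <- function(part) {
  outer(part, part, FUN = "==") * 1
}

# Element-wise average of the connectivity matrices over all partitions.
Mtilde <- Reduce(`+`, lapply(listpart, connectivity)) / length(listpart)

# Assumption: plain kmeans on the rows of the averaged matrix
# provides a starting partition (here with 2 clusters).
set.seed(1)
init <- kmeans(Mtilde, centers = 2, nstart = 10, iter.max = 50)$cluster
```

Each entry of Mtilde is the proportion of partitions in which two individuals are grouped together, so individuals 1 and 2, which always share a cluster in the toy data, get the value 1.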

Value

a list of 5 objects

Htilde

A fuzzy disjunctive table

S

A positive matrix

Mtilde

The average of connectivity matrices

crit

A vector with the optimized criterion at each iteration

cluster

the consensus partition in nb.clust classes
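As an illustration of how a fuzzy disjunctive table relates to a hard partition, the sketch below assigns each individual to the column with the largest value. The matrix is a hypothetical toy example, and whether fastnmf derives cluster from Htilde in exactly this way is an assumption, not something stated on this page.

```r
# Hypothetical fuzzy disjunctive table: 4 individuals (rows), 2 clusters (columns).
Htilde_toy <- matrix(c(0.9, 0.1,
                       0.7, 0.3,
                       0.2, 0.8,
                       0.4, 0.6),
                     ncol = 2, byrow = TRUE)

# Assign each individual to the column with the largest membership value.
hard <- apply(Htilde_toy, 1, which.max)
hard
# 1 1 2 2
```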

References

T. Li, C. Ding, and M. I. Jordan (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, ICDM'07, pages 577-582, USA. IEEE Computer Society. <doi:10.1109/ICDM.2007.98>

See Also

kmeans, MiniBatchKmeans

Examples

library(clusterMI) # provides wine, prodna, imputedata and fastnmf
library(clustrd)   # provides cluspca
data(wine)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
m <- 3 # number of imputed data sets. Should be larger in practice
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)

# imputation
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)

# analysis using reduced kmeans

## apply the cluspca function on each imputed data set
res.ana.rkm <- lapply(res.imp$res.imp,
                      FUN = cluspca,
                      nclus = nb.clust,
                      ndim = 2,
                      method = "RKM")
## extract the set of partitions (under "list" format)
res.ana.rkm <- lapply(res.ana.rkm, "[[", "cluster")

# pooling by NMF (the consensus partition is stored in the "cluster" element)
res.pool.rkm <- fastnmf(res.ana.rkm, nb.clust = nb.clust)$cluster
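
The example keeps the reference labels in ref without using them. One possible way to compare a pooled partition with reference labels is a cross-tabulation, since cluster numbers are arbitrary and cannot be matched directly. The sketch below uses hypothetical toy vectors (ref_toy, pooled_toy) rather than the wine results so that it runs on its own; the agreement measure is an illustrative choice, not part of the package.

```r
# Hypothetical reference labels and pooled partition for six individuals.
ref_toy    <- c("a", "a", "b", "b", "c", "c")
pooled_toy <- c(2, 2, 1, 1, 3, 1)

# Rows: reference classes; columns: consensus clusters.
tab <- table(ref_toy, pooled_toy)

# Share of individuals that fall in the most frequent cluster
# of their reference class.
agreement <- sum(apply(tab, 1, max)) / sum(tab)
```

With the objects created in the example, the analogous call would be table(ref, res.pool.rkm).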


[Package clusterMI version 1.2.1 Index]