chooseKusingAUC {coca} | R Documentation |
Choose number of clusters based on AUC
Description
This function allows to choose the number of clusters in a dataset based on the area under the curve of the empirical distribution function of a consensus matrix, calculated for different (consecutive) cluster numbers, as explained in the article by Monti et al. (2003), Section 3.3.1.
Usage
chooseKusingAUC(areaUnderTheCurve, savePNG = FALSE, fileName = "deltaAUC.png")
Arguments
areaUnderTheCurve |
Vector of length maxK-1 containing the area under the curve of the empirical distribution function of the consensus matrices obtained with K varying from 2 to maxK. |
savePNG |
Boolean. If TRUE, a plot of the area under the curve for each value of K is saved as a png file. The file is saved in a subdirectory of the working directory, called "delta-auc". Default is FALSE. |
fileName |
If |
Value
This function returns a list containing:
deltaAUC |
a vector of length maxK-1 where element i is the area under the curve for K = i+1 minus the area under the curve for K = i (for i = 2 this is simply the area under the curve for K = i) |
K |
the lowest among the values of K that are chosen by the algorithm. |
Author(s)
Alessandra Cabassi alessandra.cabassi@mrc-bsu.cam.ac.uk
References
Monti, S., Tamayo, P., Mesirov, J. and Golub, T., 2003. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine learning, 52(1-2), pp.91-118.
Examples
# Assuming that we want to choose among any value of K (number of clusters)
# between 2 and 10 and that the area under the curve is as follows:
areaUnderTheCurve <- c(0.05, 0.15, 0.4, 0.5, 0.55, 0.56, 0.57, 0.58, 0.59)
# The optimal value of K can be chosen with:
K <- chooseKusingAUC(areaUnderTheCurve)$K