R: Cluster-Of-Clusters Analysis

coca {coca}

R Documentation

Cluster-Of-Clusters Analysis

Description

This function performs Cluster-Of-Clusters Analysis on a binary matrix of clusters, where each row corresponds to a data point, each column corresponds to a cluster from one of the clusterings of the data, and the element in position (i,j) is equal to 1 if data point i belongs to cluster j, 0 otherwise.
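
As an illustration of the input format described above, the following minimal sketch builds a toy matrix of clusters by hand in base R. The label vectors and the helper `toIndicators` are hypothetical and not part of the package; in practice the matrix is usually produced by `buildMOC`, as in the Examples section.

```r
# Two hypothetical clusterings of the same six data points
labelsA <- c(1, 1, 2, 2, 3, 3)   # clustering A has 3 clusters
labelsB <- c(1, 2, 1, 2, 1, 2)   # clustering B has 2 clusters

# Hypothetical helper: one indicator column per cluster, so that
# entry (i, j) is 1 if data point i belongs to cluster j, 0 otherwise
toIndicators <- function(labels) {
  clusters <- sort(unique(labels))
  outer(labels, clusters, "==") * 1
}

# Bind the indicator columns of all clusterings side by side
mocToy <- cbind(toIndicators(labelsA), toIndicators(labelsB))
colnames(mocToy) <- c("A1", "A2", "A3", "B1", "B2")

dim(mocToy)      # 6 rows (data points), 5 columns (clusters)
rowSums(mocToy)  # each point is in exactly one cluster per clustering
```

Here C = 5 because the two clusterings contribute 3 and 2 clusters respectively.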

Usage

coca(
  moc,
  K = NULL,
  maxK = 6,
  B = 1000,
  pItem = 0.8,
  hclustMethod = "average",
  choiceKmethod = "silhouette",
  ccClMethod = "kmeans",
  ccDistHC = "euclidean",
  maxIterKM = 1000,
  savePNG = FALSE,
  fileName = "coca",
  verbose = FALSE,
  widestGap = FALSE,
  dunns = FALSE,
  dunn2s = FALSE,
  returnAllMatrices = FALSE
)

Arguments

`moc`	N x C binary matrix of clusters, where N is the number of data points and C is the total number of clusters considered.
`K`	Number of clusters.
`maxK`	Maximum number of clusters considered for the final clustering if K is not known. Default is 6.
`B`	Number of iterations of the Consensus Clustering step.
`pItem`	Proportion of items sampled at each iteration of the Consensus Clustering step.
`hclustMethod`	Agglomeration method to be used by the hclust function to perform hierarchical clustering on the consensus matrix. Can be "single", "complete", "average", etc. For more details please see ?stats::hclust.
`choiceKmethod`	Method used to choose the number of clusters if K is NULL. Can be either "AUC" (area under the curve, work in progress) or "silhouette". Default is "silhouette".
`ccClMethod`	Clustering method to be used by the Consensus Clustering algorithm (CC). Can be either "kmeans" for k-means clustering or "hclust" for hierarchical clustering. Default is "kmeans".
`ccDistHC`	Distance to be used by the hierarchical clustering algorithm inside CC. Can be "pearson" (for 1 - Pearson correlation), "spearman" (for 1 - Spearman correlation), or any of the distances provided in stats::dist() (i.e. "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"). Default is "euclidean".
`maxIterKM`	Maximum number of iterations for the k-means clustering algorithm. Default is 1000.
`savePNG`	Boolean. Save plots as PNG files. Default is FALSE.
`fileName`	If `savePNG` is TRUE, this is the string containing (the first part of) the name of the output files. Can be used to specify the folder path too. Default is "coca". The ".png" extension is automatically added to this string.
`verbose`	Boolean.
`widestGap`	Boolean. If TRUE, also compute the widest gap index to choose the best number of clusters. Default is FALSE.
`dunns`	Boolean. If TRUE, also compute Dunn's index to choose the best number of clusters. Default is FALSE.
`dunn2s`	Boolean. If TRUE, also compute the alternative Dunn's index to choose the best number of clusters. Default is FALSE.
`returnAllMatrices`	Boolean. If TRUE, return consensus matrices for all considered values of K. Default is FALSE.
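
To make the roles of `B`, `pItem` and `maxIterKM` more concrete, here is a minimal base-R sketch of a resampling-based consensus clustering loop with k-means. It is an illustration under stated assumptions, not the package's implementation: the toy data `x` and the bookkeeping matrices `together` and `sampled` are hypothetical.

```r
set.seed(1)

# Hypothetical toy data: 20 items in two well-separated groups
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2))

N <- nrow(x)
B <- 50          # number of resampling iterations
pItem <- 0.8     # proportion of items sampled at each iteration
K <- 2           # number of clusters
maxIterKM <- 1000

together <- matrix(0, N, N)  # times items i and j were clustered together
sampled  <- matrix(0, N, N)  # times items i and j were sampled together

for (b in seq_len(B)) {
  # Subsample a proportion pItem of the items without replacement
  idx <- sample(N, size = floor(pItem * N))
  cl <- kmeans(x[idx, , drop = FALSE], centers = K,
               iter.max = maxIterKM)$cluster
  sampled[idx, idx]  <- sampled[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}

# Proportion of co-clustering among the iterations where both were sampled
consensus <- together / pmax(sampled, 1)
```

Dividing by the number of times each pair was sampled together, rather than by `B`, is one common convention for consensus matrices; whether the package normalises in exactly this way is an assumption here.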

Value

This function returns a list containing:

`consensusMatrix`	a symmetric matrix where the element in position (i,j) corresponds to the proportion of times that items i and j have been clustered together.
`clusterLabels`	the final cluster labels.
`K`	the final number of clusters. If provided by the user, this is the same as the input. Otherwise, this is the number of clusters selected via the requested method (see argument `choiceKmethod`).
`consensusMatrices`	if returnAllMatrices = TRUE, this array is also returned, containing the consensus matrices obtained for each of the numbers of clusters considered by the algorithm.
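
The following sketch shows one way a consensus matrix can be turned into cluster labels with hierarchical clustering, which is the role of the `hclustMethod` argument. The 4 x 4 matrix `cm` holds hypothetical values, and using 1 - consensus as the dissimilarity is an assumption made for illustration rather than a statement about the package internals.

```r
# Hypothetical consensus matrix: items 1-2 and items 3-4 tend to co-cluster
cm <- matrix(c(1.0, 0.9, 0.1, 0.0,
               0.9, 1.0, 0.2, 0.1,
               0.1, 0.2, 1.0, 0.8,
               0.0, 0.1, 0.8, 1.0), nrow = 4, byrow = TRUE)

# Treat 1 - consensus as a dissimilarity, build the dendrogram with the
# chosen agglomeration method, and cut it into K = 2 groups
hc <- hclust(as.dist(1 - cm), method = "average")
labels <- cutree(hc, k = 2)
labels  # items 1 and 2 share one label, items 3 and 4 share the other
```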

Author(s)

Alessandra Cabassi alessandra.cabassi@mrc-bsu.cam.ac.uk

References

The Cancer Genome Atlas, 2012. Comprehensive molecular portraits of human breast tumours. Nature, 487(7407), pp.61–70.

Cabassi, A. and Kirk, P. D. W. (2019). Multiple kernel learning for integrative consensus clustering of 'omic datasets. arXiv preprint. arXiv:1904.07701.

Examples

# Load data
data <- list()
data[[1]] <- as.matrix(read.csv(system.file("extdata", "dataset1.csv",
                                package = "coca"), row.names = 1))
data[[2]] <- as.matrix(read.csv(system.file("extdata", "dataset2.csv",
                                package = "coca"), row.names = 1))
data[[3]] <- as.matrix(read.csv(system.file("extdata", "dataset3.csv",
                                package = "coca"), row.names = 1))

# Build matrix of clusters
outputBuildMOC <- buildMOC(data, M = 3, K = 5, distances = "cor")

# Extract matrix of clusters
moc <- outputBuildMOC$moc

# Do Cluster-Of-Clusters Analysis
outputCOCA <- coca(moc, K = 5)

# Extract cluster labels
clusterLabels <- outputCOCA$clusterLabels

[Package coca version 1.1.0 Index]