cluscomp {clusterCons} R Documentation

Perform consensus clustering with the option of using multiple algorithms and parameters and merging

Description

Calculates an NxN consensus matrix for each clustering experiment performed, where N is the number of rows in x and each entry has a value between 0 (the pair of rows was never observed in the same cluster) and 1 (always observed in the same cluster).
When running with more than one algorithm, or with the same algorithm under multiple conditions, a consensus matrix is generated for each. These can optionally be merged by cluster number into a mergematrix by setting merge=1.

Usage

cluscomp(
  x,
  diss = FALSE,
  algorithms = list("kmeans"),
  alparams = list(),
  alweights = list(),
  clmin = 2,
  clmax = 10,
  prop = 0.8,
  reps = 50,
  merge = 0
)

Arguments

x

data.frame of numerical data with conditions as the column names and unique ids as the row names. All variables must be numeric. Missing values (NAs) are not allowed. Optionally, you can pass a distance matrix directly, in which case the distance matrix must be a data.frame whose row names and column names match each other (as the distance matrix is a pair-wise distance calculation).

diss

set to TRUE if you are providing a distance matrix, default is FALSE

algorithms

list of algorithm names which can be drawn from 'agnes','diana','pam','kmeans' or 'hclust'. The user can also write a simple wrapper for any other clustering method (see details)

alparams

list of algorithm parameter lists, each using the same specification as the individual algorithm called (see details)

alweights

list of integer weights for each algorithm (only used when merging consensus results between algorithms)

clmin

integer for the smallest cluster number to consider

clmax

integer for the largest cluster number to consider

prop

numeric for the proportion of rows to sample in each iteration. Must be between 0 and 1.

reps

integer for the number of iterations to perform per clustering

merge

an integer indicating whether you also want the merged matrices (1) or just the consensus ones (0), accepts only 1 or 0.
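The inputs described above can be prepared in base R along the following lines. This is only a sketch: the toy data and the object names (toy, toy_dist) are made up for illustration, and pairing one parameter list and one weight with each algorithm by position is an assumption rather than something this page states. The parameter names themselves (iter.max, metric) are those of stats::kmeans and cluster::pam. The cluscomp calls are left as comments because they require clusterCons to be installed.

```r
# Toy data: unique ids as row names, conditions as column names, all numeric.
set.seed(42)
toy <- data.frame(c1 = rnorm(6), c2 = rnorm(6), c3 = rnorm(6),
                  row.names = paste0("gene", 1:6))

# A pair-wise distance matrix converted to a data.frame, so that the
# row names and column names match each other.
toy_dist <- as.data.frame(as.matrix(dist(toy)))

# One parameter list and one weight per algorithm, in the same order as
# 'algorithms' (assumed layout, see the note above).
algorithms <- list("kmeans", "pam")
alparams   <- list(list(iter.max = 50), list(metric = "manhattan"))
alweights  <- list(1, 2)

# Hypothetical calls (not run here):
# cmr <- cluscomp(toy, algorithms = algorithms, alparams = alparams,
#                 alweights = alweights, clmin = 2, clmax = 3, merge = 1)
# cmd <- cluscomp(toy_dist, diss = TRUE, algorithms = list("pam"),
#                 clmin = 2, clmax = 3)
```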

Details

cluscomp is an implementation of a consensus clustering methodology first proposed by Monti et al. (2003) in which the connectivity between any two members of a data matrix is tested by resampling statistics. The principle is that by only sampling a random proportion of rows in the data matrix and performing many clustering experiments we can capture information about the robustness of the clusters identified by the full unsampled clustering result.

For each re-sampling experiment, a square matrix of zeros is created whose rows and columns both match the unique ids of the rows of the data matrix. This matrix is called the connectivity matrix. A second, identically sized matrix is created to count the number of times that any pair of row ids is sampled together in any one re-sampled clustering. This matrix is called the identity matrix. For each iteration within the experiment, the rows sampled are recorded in the identity matrix and the co-occurrence of each pair in the same cluster is then recorded in the connectivity matrix. These values are incremented for each iteration until, finally, a consensus matrix is generated by dividing the connectivity matrix by the identity matrix.
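The bookkeeping described above can be sketched in base R as follows. This is an illustrative reimplementation using stats::kmeans, not the package's internal code. The function name consensus_sketch, the toy data, and the choice to set never-sampled pairs to 0 are all assumptions made for the example.

```r
# Minimal sketch of one re-sampling experiment for a fixed cluster number k.
consensus_sketch <- function(x, k, prop = 0.8, reps = 50) {
  ids <- rownames(x)
  n <- length(ids)
  connectivity <- matrix(0, n, n, dimnames = list(ids, ids))
  identity <- matrix(0, n, n, dimnames = list(ids, ids))
  for (i in seq_len(reps)) {
    # sample a random proportion of the rows and cluster only those
    sampled <- sample(ids, size = floor(prop * n))
    cl <- kmeans(x[sampled, , drop = FALSE], centers = k)$cluster
    # identity matrix: every pair of sampled rows was drawn together
    identity[sampled, sampled] <- identity[sampled, sampled] + 1
    # connectivity matrix: pairs that landed in the same cluster
    same <- outer(cl, cl, "==") * 1
    connectivity[sampled, sampled] <- connectivity[sampled, sampled] + same
  }
  cm <- connectivity / identity
  cm[identity == 0] <- 0  # assumption: pairs never sampled together get 0
  cm
}

# Toy data with two well separated groups of ten rows each.
set.seed(1)
toy <- data.frame(a = c(rnorm(10, 0), rnorm(10, 5)),
                  b = c(rnorm(10, 0), rnorm(10, 5)),
                  row.names = paste0("id", 1:20))
cm <- consensus_sketch(toy, k = 2, reps = 20)
```

Each entry of cm is the fraction of iterations in which a pair of rows was placed in the same cluster, out of the iterations in which both rows were sampled.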

The consensus matrix is the raw output from cluscomp, implemented as an object of class consmatrix. If the user has asked for merged matrices in addition to the consensus matrices, then for each cluster number k an object of class mergematrix is also returned in the list. A mergematrix is identical to a consmatrix except that the 'cm' slot holds the merged matrix (a weighted average of all the consensus matrices with the same cluster number) and there is no reference matrix slot, as there is no reference clustering for the merge. To assess membership robustness for a merge, the user should instead call the memrob function with the merge matrix and a reference matrix taken from one of the cluster number matched consmatrix objects from which the merge was generated. This provides a way to quantify the difference between single and multi-algorithm resampling schemes.
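The weighted average behind a merged matrix can be illustrated with plain matrices. This is a sketch under stated assumptions: merge_sketch is a hypothetical helper, not a clusterCons function, and normalising by the sum of the weights is one natural reading of "weighted average" that this page does not spell out.

```r
# Weighted average of consensus matrices that share the same cluster number.
merge_sketch <- function(cms, weights = rep(1, length(cms))) {
  stopifnot(length(cms) == length(weights))
  weighted <- Map(function(m, w) m * w, cms, weights)
  Reduce(`+`, weighted) / sum(weights)
}

ids <- c("g1", "g2")
cm_a <- matrix(c(1, 0.2, 0.2, 1), 2, dimnames = list(ids, ids))  # e.g. kmeans, k = 2
cm_b <- matrix(c(1, 0.8, 0.8, 1), 2, dimnames = list(ids, ids))  # e.g. pam, k = 2

# Second algorithm weighted three times as heavily as the first.
merged <- merge_sketch(list(cm_a, cm_b), weights = c(1, 3))
merged["g1", "g2"]  # (1 * 0.2 + 3 * 0.8) / 4 = 0.65
```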

Value

a list of objects of class consmatrix and (if merge specified) mergematrix. See consmatrix and mergematrix for details.

Author(s)

Dr. T. Ian Simpson ian.simpson@ed.ac.uk

References

Merged consensus clustering to assess and improve class discovery with microarray data. Simpson TI, Armstrong JD and Jarman AP. BMC Bioinformatics 2010, 11:590.

Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Monti, S., Tamayo, P., Mesirov, J. and Golub, T. Machine Learning, 52, July 2003.

See Also

cluster, clrob, memrob

Examples

#load test data
data(sim_profile);

#perform a group of re-sampling clustering experiments accepting default parameters 
#for the clustering algorithms
cmr <- cluscomp(
 sim_profile,
 algorithms=list('kmeans','pam'),
 merge=1,
 clmin=2,
 clmax=5,
 reps=5
)

#display resulting matrices contained in the consensus result list
summary(cmr);

#display the cluster robustness for the pam k=4 consensus matrix
clrob(cmr$e2_pam_k4);

#plot a heatmap of the consensus matrix, note you access the cluster matrix object 
#through the cm slot
#heatmap(cmr$e2_pam_k4@cm);

#display the membership robustness for pam k=4 cluster 1
memrob(cmr$e2_pam_k4)$cluster1;

#merged consensus example
#data(testcmr);

#calculate the membership robustness for the merge matrix when cluster number k=4,
#in reference to the kmeans scaffold. (see memrob for more details). 
#mr <- memrob(testcmr$merge_k4,testcmr$e1_kmeans_k4@rm);

#show the membership robustness for cluster 1
#mr$cluster1;

[Package clusterCons version 1.2 Index]