cluscomp {clusterCons} | R Documentation |
Perform consensus clustering with the option of using multiple algorithms and parameters and merging
Description
Calculates an NxN consensus matrix for each clustering experiment performed, where each entry has a value between 0 (the pair is never observed in the same cluster) and 1 (the pair is always observed in the same cluster).
When running with more than one algorithm, or with the same algorithm under multiple conditions, a consensus matrix is generated for each.
These can optionally be merged by cluster number into a mergematrix by setting merge=1.
Usage
cluscomp(
  x,
  diss = FALSE,
  algorithms = list("kmeans"),
  alparams = list(),
  alweights = list(),
  clmin = 2,
  clmax = 10,
  prop = 0.8,
  reps = 50,
  merge = 0
)
Arguments
x |
data.frame of numerical data with conditions as the column names and unique ids as the row names. All variables must be numeric. Missing values (NAs) are not allowed. Optionally you can pass a distance matrix directly, in which case the distance matrix must be a data.frame and its row and column names must match each other (as the distance matrix is a pair-wise distance calculation). |
diss |
set to TRUE if you are providing a distance matrix, default is FALSE |
algorithms |
list of algorithm names which can be drawn from 'agnes','diana','pam','kmeans' or 'hclust'. The user can also write a simple wrapper for any other clustering method (see details) |
alparams |
list of algorithm parameter lists using the same specification as for the individual algorithm called (see details) |
alweights |
list of integer weights for each algorithm (only used when merging consensus results between algorithms) |
clmin |
integer for the smallest cluster number to consider |
clmax |
integer for the largest cluster number to consider |
prop |
numeric for the proportion of rows to sample during the process. Must be between 0 and 1 |
reps |
integer for the number of iterations to perform per clustering |
merge |
an integer indicating whether you also want the merged matrices (1) or just the consensus ones (0), accepts only 1 or 0. |
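As a minimal sketch of the distance-matrix input described for x and diss, the following base R code builds a square data.frame whose row and column names match. The toy data and ids are made up for illustration, and the commented cluscomp call (including the choice of 'pam') is an assumption about a suitable invocation rather than a tested one.

```r
# Toy data: the ids (g1..g4) and conditions (c1, c2) are invented for illustration.
x <- data.frame(
  c1 = c(1, 2, 10, 11),
  c2 = c(1, 3, 9, 12),
  row.names = c("g1", "g2", "g3", "g4")
)

# dist() returns a 'dist' object; convert it to a full square matrix and
# then to a data.frame so that row and column names are both the ids.
d <- as.data.frame(as.matrix(dist(x)))

# Row and column names must match each other.
stopifnot(identical(rownames(d), colnames(d)))

# Assumed usage (not run here): pass the data.frame with diss = TRUE.
# cmr <- cluscomp(d, diss = TRUE, algorithms = list("pam"))
```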
Details
cluscomp is an implementation of a consensus clustering methodology first proposed by Monti et al. (2003) in which the connectivity between any two members of a data matrix is tested by resampling statistics. The principle is that by only sampling a random proportion of rows in the data matrix and performing many clustering experiments we can capture information about the robustness of the clusters identified by the full unsampled clustering result.
For each re-sampling experiment a square matrix of zeros is created whose row and column names both match the unique ids of the rows of the data matrix. This matrix is called the connectivity matrix. A second, identically sized matrix is created to count the number of times that any pair of row ids is sampled together in any one re-sampled clustering. This matrix is called the identity matrix. For each iteration within the experiment the rows sampled are recorded in the identity matrix, and the co-occurrence of each pair in the same cluster is recorded in the connectivity matrix. These values are incremented for each iteration until finally a consensus matrix is generated by dividing the connectivity matrix by the identity matrix.
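The bookkeeping described above can be sketched in base R. This is an illustration only, not the clusterCons implementation: consensus_sketch is a hypothetical helper, it uses stats::kmeans for every iteration, and setting never-sampled pairs to 0 is an assumption made here to avoid dividing by zero.

```r
# Hypothetical helper (not part of clusterCons): builds one consensus matrix
# for a fixed cluster number k using base R's kmeans().
consensus_sketch <- function(x, k, prop = 0.8, reps = 50) {
  ids <- rownames(x)
  n <- length(ids)
  # connectivity matrix: times a pair was placed in the same cluster
  connectivity_mat <- matrix(0, n, n, dimnames = list(ids, ids))
  # identity matrix: times a pair was sampled together
  identity_mat <- matrix(0, n, n, dimnames = list(ids, ids))
  for (i in seq_len(reps)) {
    # sample a proportion of the rows without replacement
    sampled <- sample(ids, size = floor(prop * n))
    cl <- kmeans(x[sampled, , drop = FALSE], centers = k)$cluster
    identity_mat[sampled, sampled] <- identity_mat[sampled, sampled] + 1
    # TRUE where two sampled rows received the same cluster label
    same <- outer(cl, cl, "==")
    connectivity_mat[sampled, sampled] <- connectivity_mat[sampled, sampled] + same
  }
  cm <- connectivity_mat / identity_mat
  # Assumption: pairs never sampled together (0/0) are reported as 0.
  cm[identity_mat == 0] <- 0
  cm
}

# Toy data with made-up ids, for illustration only.
set.seed(1)
x <- matrix(rnorm(40), nrow = 20, ncol = 2,
            dimnames = list(paste0("id", 1:20), c("a", "b")))
cm <- consensus_sketch(x, k = 2, prop = 0.8, reps = 10)
```

Each entry of cm is the fraction of shared samplings in which the two rows were clustered together, so it lies between 0 and 1.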
The consensus matrix is the raw output from cluscomp, implemented as a class consmatrix. If the user has asked for merged matrices in addition to the consensus matrices, then for each clustering with the same k (cluster number) an object of class mergematrix is also returned in the list. It is identical to a consmatrix, except that the 'cm' slot is occupied by the merged matrix (a weighted average of the consensus matrices that share that cluster number) and there is no reference matrix slot (as there is no reference clustering for the merge). The user should instead call the memrob function with the merge matrix and a reference matrix from one of the cluster-number-matched consmatrix objects from which the merge was generated. This provides a way to quantify the difference between single and multi-algorithm resampling schemes.
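The weighted average behind a merged matrix can be illustrated with plain matrices. This is a sketch under stated assumptions: merge_sketch is a hypothetical helper, and normalising by the sum of the weights is one natural reading of "weighted average", not a confirmed description of how clusterCons applies alweights.

```r
# Hypothetical helper (not the clusterCons implementation): weighted average
# of consensus matrices that share the same cluster number k.
merge_sketch <- function(cms, weights = rep(1, length(cms))) {
  stopifnot(length(cms) > 0, length(cms) == length(weights))
  # scale each matrix by its weight, sum them, then normalise
  weighted <- Map(function(m, w) m * w, cms, weights)
  Reduce(`+`, weighted) / sum(weights)
}

# Two toy 2x2 consensus matrices, e.g. from two algorithms at the same k.
a <- matrix(c(1, 0.2, 0.2, 1), nrow = 2)
b <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)

# The second matrix counts three times as much as the first.
merged <- merge_sketch(list(a, b), weights = c(1, 3))
```

Because the result is a convex combination of the inputs, its entries stay between 0 and 1 whenever the input consensus matrices do.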
Value
a list of objects of class consmatrix and (if merge is specified) mergematrix. See consmatrix and mergematrix for details.
Author(s)
Dr. T. Ian Simpson ian.simpson@ed.ac.uk
References
Merged consensus clustering to assess and improve class discovery with microarray data. Simpson TI, Armstrong JD and Jarman AP. BMC Bioinformatics 2010, 11:590.
Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Monti, S., Tamayo, P., Mesirov, J. and Golub, T. Machine Learning, 52, July 2003.
See Also
Examples
#load test data
data(sim_profile);
#perform a group of re-sampling clustering experiments accepting default parameters
#for the clustering algorithms
cmr <- cluscomp(
  sim_profile,
  algorithms=list('kmeans','pam'),
  merge=1,
  clmin=2,
  clmax=5,
  reps=5
)
#display resulting matrices contained in the consensus result list
summary(cmr);
#display the cluster robustness for the pam k=4 consensus matrix
clrob(cmr$e2_pam_k4);
#plot a heatmap of the consensus matrix, note you access the cluster matrix object
#through the cm slot
#heatmap(cmr$e2_pam_k4@cm);
#display the membership robustness for pam k=4 cluster 1
memrob(cmr$e2_pam_k4)$cluster1;
#merged consensus example
#data(testcmr);
#calculate the membership robustness for the merge matrix when cluster number k=4,
#in reference to the kmeans scaffold. (see memrob for more details).
#mr <- memrob(testcmr$merge_k4,testcmr$e1_kmeans_k4@rm);
#show the membership robustness for cluster 1
#mr$cluster1;