RFCluster {RFclust} | R Documentation |
A wrapper for Random Forest Consensus Clustering
Description
This takes a list of matrices of different data types , features in rows, samples in columns, and performs random forest clustering (one-dimensional). When multiple data types are available this is one way of modelling the data together.
Usage
RFCluster(Data, ClustAlg = "pam", MaxK, nTrees = 1000,
exportFigures = "pdf", ClustReps = 500, ProjectName = "RFCluster",
verbose = TRUE, ...)
Arguments
Data |
Named list, contains matrices with samples in columns, features in rows. The names of the list should represent the platform or the feature type, such as expression, or CN, or clin; as long as it is distinct. |
ClustAlg |
Algorithm for consensus clustering |
MaxK |
Maximum number of clusters you are searching for |
nTrees |
How many trees are we using in the random forest to generate a proximity matrix? |
ProjectName |
Name of the project, to annotate plots and other output |
ClustReps |
Number of replicates for consensus clustering |
verbose |
Should output be verbose? |
exportFigures |
Format of the results file for figures et cetera to be exported to |
... |
Other optional arguments, passed onto ConsensusClusterPlus; see that package's documentation for a full set. |
Value
Standard output for ConsensusClusterPlus runs.
Author(s)
Ankur Chakravarthy, PhD
References
Monti, S., Tamayo, P., Mesirov, J. et al. Machine Learning (2003) 52: 91. https://doi.org/10.1023/A:1023949509487
Tao Shi & Steve Horvath (2006) Unsupervised Learning With Random Forest Predictors, Journal of Computational and Graphical Statistics, 15:1, 118-138, DOI: 10.1198/106186006X94072
Examples
library(RFclust)
#Get GBM example data from the iCluster package, repackaged to maintain CRAN compatibility
data(gbm)
#Transpose so columns are samples and features are rows
gbm.t <- lapply(gbm, t)
#Make sure the sample names are the same across the matrices for the different
#samples - the code breaks otherwise
colnames(gbm.t[[2]]) <- colnames(gbm.t[[3]]) <- colnames(gbm.t[[1]])
#Run function on that dataset - these methods are computationally intensive
#so automatic testing during build has been disabled (takes > 5s).
#Users may test the software by running the code separately as the example is reproducible
Test.cluster <- RFCluster(Data = gbm.t, ClustAlg = "pam", MaxK = 5,
nTrees = 10, ProjectName = "RFCluster_Test", ClustReps = 50 , writeTable = FALSE, plot = NULL)
unlink("RFCluster_Test",recursive = TRUE)