R: yaConsensus computes a number of hierarchical clusterings by...

yaConsensus {yaConsensus}

R Documentation

yaConsensus computes a number of hierarchical clusterings by sampling either samples or features.

Description

This function generates a list of "hclust" objects, obtained by repeated sampling, for downstream analysis.

Usage

yaConsensus(ddata, runs = 1000, epsilon = 0.65, is_by_sample = TRUE, 
            distMethod = "euclidean", hcMethod = "ward.D2", prefix = NULL)

Arguments

`ddata`	either a data matrix (samples in rows, and features in columns), or a "dist" object.
`runs`	an integer value for the number of samplings.
`epsilon`	a real value indicating the sampling rate.
`is_by_sample`	a logical value indicating if the sampling is by samples (TRUE) or features (FALSE).
`distMethod`	a character string indicating the kind of distance for the inner clustering. It can be any of the methods accepted by the `dist` function.
`hcMethod`	a character string indicating the linkage method of the inner clustering. It can be any of the methods accepted by the `hclust` function.
`prefix`	a string specifying the file-name prefix used to store the results in a .RData file.
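
The role of `epsilon` and `is_by_sample` can be illustrated with a minimal base R sketch of a single resampling run. This is not the package's implementation: `one_run` is a hypothetical helper, and the rule `round(epsilon * n)` for the subset size is an assumption about how the sampling rate is applied.

```r
# Hypothetical helper, not part of yaConsensus: one resampling run in base R.
one_run <- function(x, epsilon = 0.65, by_sample = TRUE,
                    dist_method = "euclidean", hc_method = "ward.D2") {
  if (by_sample) {
    # Keep a fraction epsilon of the rows (samples); the rounding is assumed.
    keep <- sort(sample(nrow(x), size = round(epsilon * nrow(x))))
    sub <- x[keep, , drop = FALSE]
  } else {
    # Keep every sample, but only a fraction epsilon of the columns (features).
    keep <- sort(sample(ncol(x), size = round(epsilon * ncol(x))))
    sub <- x[, keep, drop = FALSE]
  }
  # Inner clustering on the subset.
  hclust(dist(sub, method = dist_method), method = hc_method)
}

set.seed(1)
x <- matrix(rnorm(20 * 100), nrow = 20,
            dimnames = list(paste0("S", 1:20), NULL))
hc_rows <- one_run(x, by_sample = TRUE)   # tree over a subset of the samples
hc_cols <- one_run(x, by_sample = FALSE)  # tree over all samples, fewer features
```

In this sketch, sampling by samples yields trees that cover different subsets of the rows, while sampling by features always yields trees over the full set of samples.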

Details

This function can run sequentially or in parallel. To run in parallel, a cluster of CPUs must first be registered according to the doParallel protocol.

To get the consensus clustering, the output of the function has to be processed with the plot() function. The consensus dissimilarity follows from the algorithm of Monti et al. (2003). The consensus clustering is obtained from a hierarchical procedure (hclust) with "complete" linkage (the outer hc method).
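
As a rough illustration of the consensus step, the following base R sketch builds a Monti-style consensus matrix from resampled clusterings and then applies the outer "complete" linkage. It is only a sketch under stated assumptions, not the code used by plot(): the variable names, the toy data, the choice of cutting each inner tree into `G` groups, and the treatment of pairs that are never sampled together are all assumptions made for the example.

```r
# Hypothetical sketch of a Monti et al. (2003) style consensus in base R.
set.seed(42)
n <- 30
x <- matrix(rnorm(n * 50), nrow = n)
x[1:10, ] <- x[1:10, ] + 3            # shift the first 10 samples (toy group)
rownames(x) <- paste0("S", seq_len(n))

G <- 2; runs <- 50; epsilon <- 0.65
together  <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
cosampled <- together

for (r in seq_len(runs)) {
  keep <- sample(n, size = round(epsilon * n))       # sample the samples
  hc <- hclust(dist(x[keep, ]), method = "ward.D2")  # inner clustering
  cl <- cutree(hc, k = G)                            # labels in 'keep' order
  same <- outer(cl, cl, "==")                        # TRUE if co-clustered
  together[keep, keep]  <- together[keep, keep] + same
  cosampled[keep, keep] <- cosampled[keep, keep] + 1
}

# Fraction of co-sampled runs in which each pair shares a cluster.
# Pairs never sampled together get consensus 0 (an assumption of this sketch).
consensus <- together / pmax(cosampled, 1)
diag(consensus) <- 1

# Outer clustering on the consensus dissimilarity, with "complete" linkage.
consensus_hc <- hclust(as.dist(1 - consensus), method = "complete")
groups <- cutree(consensus_hc, k = G)
```

Each entry of `consensus` lies between 0 and 1, so `1 - consensus` can be used directly as a dissimilarity for the outer hierarchical clustering.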

Value

A named list with the following slots:

`distMethod`	matches the input
`hcMethod`	matches the input
`lables`	a string list with the names of the samples
`bySample`	matches 'is_by_sample' input parameter
`epsilon`	matches the input
`subsetDimension`	actual dimension of the subsets
`runs`	matches the input
`hclust`	a list of 'hclust' objects
`elapsed_time`	time (in seconds) required
`ncores`	the number of cores used

Note

The plot function in the example invisibly returns a result with details and statistics of the experiment.

Author(s)

Stefano M. Pagnotta

References

Monti et al. (2003) - Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data - Machine Learning 52(1-2):91-118 <DOI: 10.1023/A:1023949509487>

Examples

## Generate data and annotation
n <- 50; m <- 3000
ddata <- matrix(rnorm(n * m), ncol = m)  
ddata[1:20, ] <- ddata[1:20, ] + 0.2
row.names(ddata) <- c(paste0("A", 1:20), paste0("B", 1:30))
ddist <- dist(ddata)

annotation <- data.frame(row.names = rownames(ddata), clust = substr(rownames(ddata), 1, 1))
annotation.colorCode <- c("red", "blue")
names(annotation.colorCode) <- c("A", "B")

####### run in sequential mode
####### sampling the samples ....
(aConsensus <- yaConsensus(ddist))
plot(aConsensus, G = 2)

ans <- plot(aConsensus, G = 2, 
            annotation = annotation, 
            annotation.colorCode = annotation.colorCode)
summary(ans)
summary(ans, given = "clust")

####### sampling the features ....
(aConsensus <- yaConsensus(ddata, runs = 20, epsilon = 0.2, is_by_sample = FALSE))
ans <- plot(aConsensus, G = 2, 
            annotation = annotation, 
            annotation.colorCode = annotation.colorCode,
            matching_clustering = "clust")

summary(ans, given = "clust")


####### run in parallel mode
## uncomment to run

# library(doParallel)
# cpu_cluster <- makeCluster(3)
# registerDoParallel(cpu_cluster)

# (aConsensus <- yaConsensus(ddist))
# plot(aConsensus, G = 2)

# stopCluster(cpu_cluster)

[Package yaConsensus version 1.0 Index]