R: Cluster similarity matrix

u_cluster_similarity {eclust}

R Documentation

Cluster similarity matrix

Description

Returns the cluster membership of each predictor. This function is called internally by the s_generate_data and s_generate_data_mars functions. It is also used by the r_clust function for real data analysis.

Usage

u_cluster_similarity(x, expr, exprTest, distanceMethod,
  clustMethod = c("hclust", "protoclust"), cutMethod = c("dynamic", "gap",
  "fixed"), nClusters, method = c("complete", "average", "ward.D2", "single",
  "ward.D", "mcquitty", "median", "centroid"), K.max = 10, B = 50, nPC,
  minimum_cluster_size = 50)

Arguments

`x`	similarity matrix. Must have non-NULL dimnames, i.e., the rows and columns should be labelled, e.g. "Gene1, Gene2, ..."
`expr`	gene expression data (training set). Rows are people, columns are genes
`exprTest`	gene expression test set. If using real data and you don't have enough samples for a test set, supply the same data that was passed to the `expr` argument
`distanceMethod`	one of "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", to be passed to the `dist` function. If missing, this function takes 1-x as the dissimilarity measure. This functionality is for diffCorr, diffTOM, and fisherScore matrices, which need to be converted to a distance-type matrix.
`clustMethod`	Cluster the data using hierarchical clustering or prototype clustering. Defaults to `clustMethod="hclust"`. The other option is `protoclust`; the protoclust package must be installed before using this option
`cutMethod`	the method used to cut the dendrogram. `'dynamic'` uses the `cutreeDynamic` function from the dynamicTreeCut package. `'gap'` is Tibshirani's gap statistic `clusGap` using the `'Tibs2001SEmax'` rule. `'fixed'` is a fixed number specified by the `nClusters` argument
`nClusters`	number of clusters. Only used if `cutMethod='fixed'`
`method`	the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
`K.max`	the maximum number of clusters to consider, must be at least two. Only used if `cutMethod='gap'`
`B`	integer, number of Monte Carlo (“bootstrap”) samples. Only used if `cutMethod='gap'`
`nPC`	number of principal components. Can be 1 or 2.
`minimum_cluster_size`	The minimum cluster size. Only applicable if `cutMethod='dynamic'`. This argument is passed to the `cutreeDynamic` function. Default is 50.
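
The two ways of obtaining a dissimilarity described under `distanceMethod`, followed by a fixed cut of the dendrogram, can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal code: the toy data, the gene names, and the choice of two clusters are all hypothetical, and `cutree` stands in for the `'fixed'` cut method.

```r
# Toy data (hypothetical): 20 samples in rows, 6 genes in columns.
set.seed(1)
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(NULL, paste0("Gene", 1:6)))
# Make Gene1 to Gene3 strongly correlated so that one clear cluster exists.
expr[, 2] <- expr[, 1] + rnorm(20, sd = 0.1)
expr[, 3] <- expr[, 1] + rnorm(20, sd = 0.1)

x <- cor(expr)  # similarity matrix with labelled rows and columns

# distanceMethod missing: use 1 - x as the dissimilarity.
d_sim <- as.dist(1 - x)

# distanceMethod supplied: distances between the rows of x via dist().
d_euc <- dist(x, method = "euclidean")

# Agglomerate with average linkage, then cut into a fixed number of clusters.
hc <- hclust(d_sim, method = "average")
membership <- cutree(hc, k = 2)
membership  # named integer vector: one cluster label per gene
```

Because `x` carries dimnames, the resulting membership vector is labelled by gene, which is why the `x` argument requires non-NULL dimnames.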

Value

a list of length 2:

clusters

a p x 3 data.frame or data.table which gives the cluster membership of each gene, where p is the number of genes. The first column is the gene name, the second column is the cluster number (numeric), and the third column is the cluster membership as a character vector of color names (these match up exactly with the cluster numbers)

pcInfo

a list of length 9:

eigengenes: a list of the eigengenes i.e. the 1st (and 2nd if nPC=2) principal component of each module
averageExpr: a data.frame of the average expression for each module for the training set
averageExprTest: a data.frame of the average expression for each module for the test set
varExplained: percentage of variance explained by each 1st (and 2nd if nPC=2) principal component of each module
validColors: cluster membership of each gene
PC: a data.frame of the 1st (and 2nd if nPC=2) PC for each module for the training set
PCTest: a data.frame of the 1st (and 2nd if nPC=2) PC for each module for the test set
prcompObj: the prcomp object
nclusters: a numeric value for the total number of clusters
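
To make the per-module summaries listed above more concrete, the following base R sketch computes a first principal component, its share of variance, and the average expression for each module, on both a training and a test set. It is an assumption-laden illustration rather than the package's implementation: the data, the `module` vector, and the helper `summarise_module` are hypothetical, the variables are centered and scaled (which the package may or may not do), and the variance explained is reported as a proportion rather than a percentage.

```r
# Hypothetical training and test sets: samples in rows, genes in columns.
set.seed(2)
genes <- paste0("Gene", 1:4)
train <- matrix(rnorm(30 * 4), nrow = 30, dimnames = list(NULL, genes))
test  <- matrix(rnorm(10 * 4), nrow = 10, dimnames = list(NULL, genes))

# Hypothetical cluster membership: two modules of two genes each.
module <- c(Gene1 = 1, Gene2 = 1, Gene3 = 2, Gene4 = 2)
nPC <- 1  # number of principal components to keep per module

# Hypothetical helper: summaries for a single module m.
summarise_module <- function(m) {
  g  <- names(module)[module == m]
  pc <- prcomp(train[, g, drop = FALSE], center = TRUE, scale. = TRUE)
  keep <- seq_len(nPC)
  list(
    PC = pc$x[, keep, drop = FALSE],                      # training-set scores
    PCTest = predict(pc, newdata = test[, g, drop = FALSE])[, keep, drop = FALSE],
    varExplained = (pc$sdev^2 / sum(pc$sdev^2))[keep],    # proportion of variance
    averageExpr = rowMeans(train[, g, drop = FALSE]),     # per-sample module mean
    averageExprTest = rowMeans(test[, g, drop = FALSE])
  )
}

res <- lapply(sort(unique(module)), summarise_module)
str(res[[1]])
```

Projecting the test set with `predict` reuses the rotation, centering, and scaling learned on the training set, so the test-set components are not fitted to the test data.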

Examples

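library(eclust)  # provides u_cluster_similarity and the simdata dataset
data("simdata")

# Drop the first two columns and keep the rest as the expression matrix
X <- simdata[, c(-1, -2)]

# Randomly select 100 rows for training; the remaining rows form the test set
train_index <- sample(1:nrow(simdata), 100)

cluster_results <- u_cluster_similarity(x = cor(X),
                                        expr = X[train_index,],
                                        exprTest = X[-train_index,],
                                        distanceMethod = "euclidean",
                                        clustMethod = "hclust",
                                        cutMethod = "dynamic",
                                        method = "average", nPC = 2,
                                        minimum_cluster_size = 75)

# Number of genes per module (data.table syntax; assumes a `module` column)
cluster_results$clusters[, table(module)]
# Names of the components of pcInfo
names(cluster_results$pcInfo)
# Total number of clusters found
cluster_results$pcInfo$nclusters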

[Package eclust version 0.1.0 Index]