u_cluster_similarity {eclust} R Documentation

## Cluster similarity matrix

### Description

Returns the cluster membership of each predictor, together with principal component summaries of each cluster. This function is called internally by the s_generate_data and s_generate_data_mars functions. It is also used by the r_clust function for real data analysis.
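The core steps (similarity to dissimilarity, hierarchical clustering, cutting the tree) can be sketched with base R alone. This is a minimal sketch under stated assumptions, not eclust's implementation: the toy data are hypothetical, the dissimilarity is 1 - x (the behaviour described for a missing distanceMethod), and the tree is cut at a fixed number of clusters, as cutMethod = "fixed" would do.

```r
# Minimal sketch using only the stats package; NOT eclust's own code.
set.seed(1)
# hypothetical training data: 100 samples (rows) x 6 genes (columns)
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(NULL, paste0("Gene", 1:6)))
# make Gene1-3 share one latent signal and Gene4-6 share another
base1 <- rnorm(100)
base2 <- rnorm(100)
expr[, 1:3] <- expr[, 1:3] * 0.3 + base1
expr[, 4:6] <- expr[, 4:6] * 0.3 + base2

sim <- cor(expr)                        # labelled p x p similarity matrix
d <- as.dist(1 - sim)                   # 1 - x as dissimilarity (distanceMethod missing)
tree <- hclust(d, method = "average")   # agglomeration method, cf. the `method` argument
membership <- cutree(tree, k = 2)       # fixed number of clusters, cf. cutMethod = "fixed"
print(membership)
```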

### Usage

u_cluster_similarity(x, expr, exprTest, distanceMethod,
clustMethod = c("hclust", "protoclust"), cutMethod = c("dynamic", "gap",
"fixed"), nClusters, method = c("complete", "average", "ward.D2", "single",
"ward.D", "mcquitty", "median", "centroid"), K.max = 10, B = 50, nPC,
minimum_cluster_size = 50)

### Arguments

x

similarity matrix. Must have non-NULL dimnames, i.e., the rows and columns should be labelled, e.g., "Gene1, Gene2, ...".

expr

gene expression data (training set). Rows are people, columns are genes.

exprTest

gene expression test set. If you are using real data and do not have enough samples for a test set, supply the same data that was passed to the expr argument.

distanceMethod

one of "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", passed to the dist function. If missing, this function takes 1 - x as the dissimilarity measure. This is intended for diffCorr, diffTOM and fisherScore matrices, which need to be converted to a distance-type matrix.

clustMethod

cluster the data using hierarchical clustering ("hclust") or prototype clustering ("protoclust"). Defaults to clustMethod = "hclust". The protoclust package must be installed before using clustMethod = "protoclust".

cutMethod

the method used to cut the dendrogram. "dynamic" uses the dynamic tree cut method from the dynamicTreeCut package (cutreeDynamic); "gap" uses Tibshirani's gap statistic (clusGap) with the "Tibs2001SEmax" rule; "fixed" uses the number of clusters specified by the nClusters argument.

nClusters

number of clusters. Only used if cutMethod = "fixed".

method

the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

K.max

the maximum number of clusters to consider; must be at least two. Only used if cutMethod = "gap".

B

integer, number of Monte Carlo (“bootstrap”) samples. Only used if cutMethod = "gap".

nPC

number of principal components. Can be 1 or 2.

minimum_cluster_size

the minimum cluster size. Only applicable if cutMethod = "dynamic". This argument is passed to the cutreeDynamic function. Default is 50.
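For cutMethod = "gap", choosing the number of clusters with clusGap and the "Tibs2001SEmax" rule can be sketched with the cluster package (shipped with R). This is an illustrative sketch, not eclust's code: the toy data, the wrapper name hclust_cut and the choice to cluster rows of a plain data matrix are assumptions.

```r
# Minimal sketch of gap-statistic selection; NOT eclust's own code.
library(cluster)  # provides clusGap() and maxSE()

set.seed(2)
# hypothetical data: 30 "genes" (rows) in 3 well-separated groups, 5 features each
genes <- rbind(matrix(rnorm(50, mean = 0), ncol = 5),
               matrix(rnorm(50, mean = 5), ncol = 5),
               matrix(rnorm(50, mean = 10), ncol = 5))

# hypothetical wrapper: clusGap() needs function(x, k) returning list(cluster = ...)
hclust_cut <- function(x, k) {
  list(cluster = cutree(hclust(dist(x), method = "average"), k = k))
}

# K.max and B play the same roles as the arguments documented above
gap <- clusGap(genes, FUNcluster = hclust_cut, K.max = 10, B = 50)
k_hat <- maxSE(f = gap$Tab[, "gap"], SE.f = gap$Tab[, "SE.sim"],
               method = "Tibs2001SEmax")
print(k_hat)
```

The "Tibs2001SEmax" rule picks the smallest k whose gap is within one standard error of the gap at k + 1, which favours fewer clusters when the improvement is marginal.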

### Value

a list of length 2:

clusters

a p x 3 data.frame or data.table that gives the cluster membership of each gene, where p is the number of genes. The first column is the gene name, the second column is the cluster number (numeric), and the third column is the cluster membership as a character vector of color names (these match one-to-one with the cluster numbers).
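The layout of clusters can be illustrated with a small hand-built data.frame. This is hypothetical: only the module column name appears in the examples on this page; the gene and cluster column names and the color palette are assumptions.

```r
# Hypothetical illustration of the p x 3 `clusters` layout; not eclust output.
membership <- c(Gene1 = 1, Gene2 = 1, Gene3 = 2, Gene4 = 2, Gene5 = 3)
module_colors <- c("turquoise", "blue", "brown")  # assumed color per cluster number

clusters <- data.frame(
  gene    = names(membership),          # gene name
  cluster = as.numeric(membership),     # cluster number
  module  = module_colors[membership],  # color name matching the cluster number
  stringsAsFactors = FALSE
)
print(clusters)
table(clusters$module)  # number of genes in each module
```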

pcInfo

a list of length 9:

eigengenes

a list of the eigengenes, i.e., the first (and second, if nPC = 2) principal component of each module

averageExpr

a data.frame of the average expression for each module for the training set

averageExprTest

a data.frame of the average expression for each module for the test set

varExplained

percentage of variance explained by the first (and second, if nPC = 2) principal component of each module

validColors

cluster membership of each gene

PC

a data.frame of the 1st (and 2nd if nPC=2) PC for each module for the training set

PCTest

a data.frame of the 1st (and 2nd if nPC=2) PC for each module for the test set

prcompObj

the prcomp object

nclusters

a numeric value for the total number of clusters
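The per-module quantities in pcInfo (eigengenes, varExplained, test-set PCs, average expression) can be sketched with stats::prcomp. This is a minimal sketch, not eclust's code: the data and module labels are hypothetical, and centering and scaling before PCA is an assumption.

```r
# Minimal per-module PCA sketch; NOT eclust's own code.
set.seed(3)
expr <- matrix(rnorm(50 * 6), nrow = 50,
               dimnames = list(NULL, paste0("Gene", 1:6)))      # training set
exprTest <- matrix(rnorm(20 * 6), nrow = 20,
                   dimnames = list(NULL, paste0("Gene", 1:6)))  # test set
membership <- c(1, 1, 1, 2, 2, 2)  # hypothetical module label for each gene
nPC <- 2

pc_info <- lapply(split(colnames(expr), membership), function(genes) {
  # PCA on the genes of one module, fitted on the training set only
  pca <- prcomp(expr[, genes], center = TRUE, scale. = TRUE)
  var_prop <- pca$sdev^2 / sum(pca$sdev^2)
  list(
    eigengenes   = pca$x[, seq_len(nPC), drop = FALSE],  # first nPC PCs (training)
    varExplained = 100 * var_prop[seq_len(nPC)],         # percent of variance
    # project the test set onto the training-set loadings
    PCTest       = predict(pca, newdata = exprTest[, genes])[, seq_len(nPC), drop = FALSE],
    averageExpr  = rowMeans(expr[, genes])               # module average (training)
  )
})
str(pc_info[["1"]]$varExplained)
```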

### Examples

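library(eclust)
data("simdata")
# drop the first two columns so that X holds only gene expression values
X <- simdata[, c(-1, -2)]
set.seed(12345)
# 100 randomly chosen subjects for training, the rest for testing
train_index <- sample(seq_len(nrow(simdata)), 100)

cluster_results <- u_cluster_similarity(x = cor(X),
  expr = X[train_index, ],
  exprTest = X[-train_index, ],
  distanceMethod = "euclidean",
  clustMethod = "hclust",
  cutMethod = "dynamic",
  method = "average", nPC = 2,
  minimum_cluster_size = 75)

# number of genes in each module
cluster_results$clusters[, table(module)]
names(cluster_results$pcInfo)
cluster_results$pcInfo$nclusters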

[Package eclust version 0.1.0 Index]