u_cluster_similarity {eclust} | R Documentation |
Cluster similarity matrix
Description
Return cluster membership of each predictor. This function is called
internally by the s_generate_data and s_generate_data_mars functions.
It is also used by the r_clust function for real data analysis.
Usage
u_cluster_similarity(x, expr, exprTest, distanceMethod,
clustMethod = c("hclust", "protoclust"), cutMethod = c("dynamic", "gap",
"fixed"), nClusters, method = c("complete", "average", "ward.D2", "single",
"ward.D", "mcquitty", "median", "centroid"), K.max = 10, B = 50, nPC,
minimum_cluster_size = 50)
Arguments
x |
similarity matrix. Must have non-NULL dimnames, i.e., the rows and columns should be labelled, e.g. "Gene1, Gene2, ..." |
expr |
gene expression data (training set). Rows are people, columns are genes |
exprTest |
gene expression test set. If you are using real data and don't
have enough samples for a test set, supply the same data supplied
to the |
distanceMethod |
one of "euclidean", "maximum", "manhattan", "canberra",
"binary", "minkowski" to be passed to |
clustMethod |
Cluster the data using hierarchical clustering or
prototype clustering. Defaults |
cutMethod |
what method to use to cut the dendrogram. |
nClusters |
number of clusters. Only used if |
method |
the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). |
K.max |
the maximum number of clusters to consider, must be at least
two. Only used if |
B |
integer, number of Monte Carlo (“bootstrap”) samples. Only used if
|
nPC |
number of principal components. Can be 1 or 2. |
minimum_cluster_size |
The minimum cluster size. Only applicable if
|
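As a rough illustration of how distanceMethod, method and nClusters relate to one another, the sketch below uses only base R (dist, hclust and cutree from the stats package) on a toy matrix. It is not the eclust implementation: treating the rows of the similarity matrix as points for dist, and cutting at a fixed number of clusters, are assumptions made for the example, and the toy data and object names are hypothetical.

```r
# Illustrative sketch with base R only; not the eclust internals.
set.seed(1)

# Toy expression matrix: 20 samples (rows) by 6 genes (columns)
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(NULL, paste0("Gene", 1:6)))

# Similarity matrix with non-NULL dimnames, as the x argument requires
x <- cor(expr)

# Assumption: rows of the similarity matrix are compared with the
# chosen distance method (the role played by distanceMethod)
d <- dist(x, method = "euclidean")

# Agglomerate with the chosen linkage (the role played by method)
tree <- hclust(d, method = "average")

# Cut the dendrogram into a fixed number of clusters (the role of nClusters)
membership <- cutree(tree, k = 2)

# One row per gene: gene name and numeric cluster label
clusters <- data.frame(gene = names(membership),
                       cluster = unname(membership),
                       stringsAsFactors = FALSE)
```

The resulting clusters data.frame has one row per gene, loosely mirroring the first two columns of the clusters element described under Value.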
Value
a list of length 2:
- clusters
a p x 3 data.frame or data.table which gives the cluster membership of each gene, where p is the number of genes. The first column is the gene name, the second column is the cluster number (numeric) and the third column is the cluster membership as a character vector of color names (these match up exactly with the cluster number)
- pcInfo
a list of length 9:
- eigengenes
a list of the eigengenes i.e. the 1st (and 2nd if nPC=2) principal component of each module
- averageExpr
a data.frame of the average expression for each module for the training set
- averageExprTest
a data.frame of the average expression for each module for the test set
- varExplained
percentage of variance explained by each 1st (and 2nd if nPC=2) principal component of each module
- validColors
cluster membership of each gene
- PC
a data.frame of the 1st (and 2nd if nPC=2) PC for each module for the training set
- PCTest
a data.frame of the 1st (and 2nd if nPC=2) PC for each module for the test set
- prcompObj
the prcomp object
- nclusters
a numeric value for the total number of clusters
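To make the per-module summaries above more concrete, the following sketch computes the first nPC principal components, the percentage of variance they explain, and the average expression for each cluster, using prcomp from base R. It is a minimal illustration under stated assumptions, not the package's code: the toy data, the hypothetical membership vector, the helper summarise_cluster, and the choice to centre and scale before the PCA are all assumptions.

```r
# Illustrative sketch with base R only; not the eclust internals.
set.seed(1)

# Toy expression matrix: 30 samples (rows) by 6 genes (columns)
expr <- matrix(rnorm(30 * 6), nrow = 30,
               dimnames = list(NULL, paste0("Gene", 1:6)))

# Hypothetical cluster assignment for the six genes
membership <- c(Gene1 = 1, Gene2 = 1, Gene3 = 1,
                Gene4 = 2, Gene5 = 2, Gene6 = 2)
nPC <- 2

# Hypothetical helper: summarise the genes belonging to cluster k
summarise_cluster <- function(k) {
  genes <- names(membership)[membership == k]
  block <- expr[, genes, drop = FALSE]
  # Assumption: centre and scale the genes before the PCA
  pca <- prcomp(block, center = TRUE, scale. = TRUE)
  list(
    # Scores on the first nPC components, one row per sample
    PC = pca$x[, seq_len(nPC), drop = FALSE],
    # Percentage of variance explained by each retained component
    varExplained = (100 * pca$sdev^2 / sum(pca$sdev^2))[seq_len(nPC)],
    # Mean expression across the cluster's genes, one value per sample
    averageExpr = rowMeans(block)
  )
}

cluster_ids <- sort(unique(membership))
per_cluster <- lapply(cluster_ids, summarise_cluster)
names(per_cluster) <- paste0("cluster", cluster_ids)
```

Each element of per_cluster then holds quantities analogous to the PC, varExplained and averageExpr entries of pcInfo, computed for one cluster at a time.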
Examples
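library(eclust)

# Load the simulated data shipped with the package
data("simdata")

# Drop the first two columns; the remaining columns are the expression data
X <- simdata[, c(-1, -2)]

# Randomly select 100 rows for the training set
train_index <- sample(1:nrow(simdata), 100)

# Cluster on the correlation matrix; rows not sampled form the test set
cluster_results <- u_cluster_similarity(x = cor(X),
                                        expr = X[train_index, ],
                                        exprTest = X[-train_index, ],
                                        distanceMethod = "euclidean",
                                        clustMethod = "hclust",
                                        cutMethod = "dynamic",
                                        method = "average", nPC = 2,
                                        minimum_cluster_size = 75)

# Number of genes per module (data.table syntax)
cluster_results$clusters[, table(module)]

# Components of pcInfo and the total number of clusters
names(cluster_results$pcInfo)
cluster_results$pcInfo$nclusters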