R: Calculate optimal number of clusters.

Hybrid {PHclust}

R Documentation

Calculate optimal number of clusters.

Description

This function estimates the optimal number of clusters for a given dataset.

Usage

Hybrid(data, absolute = FALSE, Kstart = NULL, Treatment)

Arguments

`data`	Data matrix with dimension N*P indicating N features and P samples.
`absolute`	Logical. Whether to use absolute (TRUE) or relative (FALSE) abundance of features to determine clusters.
`Kstart`	Positive integer. The number of clusters at which the hybrid merging algorithm starts. It should be large enough that Kstart exceeds the optimal number of clusters. If NULL (the default), max(50, sqrt(N)) is used.
`Treatment`	Vector of length P, indicating replicates of different treatment groups. For example, Treatment = c(1,1,2,2,3,3) indicates 3 treatment groups, each with 2 replicates.
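
As a rough sketch of how these arguments relate to the shape of the data, the base-R snippet below builds a Treatment vector, checks its length against the number of samples, and computes the documented default for Kstart. The matrix `toy` and its Poisson counts are illustrative assumptions and are not part of the package. No PHclust function is called.

```r
# Hypothetical toy data: 100 features (rows) by 16 samples (columns).
set.seed(1)
toy <- matrix(rpois(100 * 16, lambda = 10), nrow = 100, ncol = 16)

# Four treatment groups with four replicates each, listed in column order.
treatment <- rep(1:4, each = 4)

# Treatment needs one entry per sample, i.e. per column of the data matrix.
stopifnot(length(treatment) == ncol(toy))

# Documented default for Kstart: max(50, sqrt(N)), with N the number of features.
kstart_default <- max(50, sqrt(nrow(toy)))
kstart_default
```

With N = 100 features, sqrt(N) is 10, so the default evaluates to 50. The square-root term only takes over once N exceeds 2500.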

Value

A positive integer indicating the optimal number of clusters.

Examples

######## Run the following code in order:
##
## The sample data set has 100 features and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))

[Package PHclust version 0.1.0 Index]