Hybrid {PHclust}    R Documentation

Calculate optimal number of clusters.

Description

This function estimates the optimal number of clusters for a given dataset.

Usage

Hybrid(data, absolute = FALSE, Kstart = NULL, Treatment)

Arguments

data

Data matrix of dimension N x P, with N features (rows) and P samples (columns).

absolute

Logical. Whether to use the absolute (TRUE) or relative (FALSE) abundance of features to determine clusters.

Kstart

Positive integer. The number of clusters at which the hybrid merging algorithm starts. Should be large enough that Kstart exceeds the optimal number of clusters. Defaults to max(50, sqrt(N)) when left as NULL.

Treatment

Vector of length P giving the treatment group of each sample, so that replicates share a label. For example, Treatment = c(1,1,2,2,3,3) indicates 3 treatment groups, each with 2 replicates.
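
The argument descriptions above can be illustrated with a minimal base R sketch. It builds a Treatment vector with rep(), checks that it has one label per sample, and evaluates the documented default for Kstart. The matrix counts and the variable Kstart_default are hypothetical names used only for illustration; the sketch does not call Hybrid(), and it does not assume anything about how the package rounds sqrt(N) internally.

```r
# Hypothetical count matrix (illustration only, not package data):
# N = 100 features in rows, P = 6 samples in columns.
set.seed(1)
N <- 100
P <- 6
counts <- matrix(rpois(N * P, lambda = 10), nrow = N, ncol = P)

# Three treatment groups with two replicates each, in column order.
# Equivalent to c(1, 1, 2, 2, 3, 3).
Treatment <- rep(1:3, each = 2)

# The label vector needs exactly one entry per sample (column).
stopifnot(length(Treatment) == ncol(counts))

# Value of the documented default max(50, sqrt(N)) for this N.
Kstart_default <- max(50, sqrt(nrow(counts)))
```

With a matrix and label vector of this shape, a call such as Hybrid(counts, Treatment = Treatment) would fall back to the default Kstart.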

Value

A positive integer indicating the optimal number of clusters.

Examples

######## Run the following code in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))

[Package PHclust version 0.1.0 Index]