R: Poisson hurdle clustering

PHcluster {PHclust}

R Documentation

Poisson hurdle clustering

Description

This function gives the clustering result based on a Poisson hurdle model.

Usage

PHcluster(
  data,
  Treatment,
  nK,
  method = c("EM", "SA"),
  absolute = FALSE,
  cool = 0.9,
  nstart = 1
)

Arguments

`data`	Data matrix with dimension N*P indicating N features and P samples. The cluster analysis is done feature-wised.
`Treatment`	Vector of length P. Indicating replicates of different treatment groups. For example, Treatment = c(1,1,2,2,3,3) indicates 3 treatment groups, each with 2 replicates.
`nK`	Positive integer. Number of clusters.
`method`	Method for the algorithm. Can choose between "EM" as Expectation Maximization or "SA" as Simulated Annealing.
`absolute`	Logical. Whether we should use absolute (TRUE) or relative (False) abundance of features to determine clusters.
`cool`	Real number between (0, 1). Cooling rate for the "SA" algorithm. Uses 0.9 by default.
`nstart`	Positive integer. Number of starts for the entire algorithm. Note that as nstart increases the computational time also grows linearly. Uses 1 by default.

Value

cluster: Vector of length N consisting of integers from 1 to nK. Indicating final clustering result. For evaluating the clustering result please check NMI for Normalized Mutual Information.
prob: N*nK matrix. The (i, j)th element representing the probability that observation i belongs to cluster j.
log_l: Scaler. The Poisson hurdle log-likelihood of the final clustering result.
alpha: Vector of length N. The geometric mean abundance level for each feature, across all treatment groups.
Normalizer: vector of length P. The normalizing constant of sequencing depth for each sample.

Examples

######## Run the following codes in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))

[Package PHclust version 0.1.0 Index]