PHcluster {PHclust}R Documentation

Poisson hurdle clustering

Description

This function gives the clustering result based on a Poisson hurdle model.

Usage

PHcluster(
  data,
  Treatment,
  nK,
  method = c("EM", "SA"),
  absolute = FALSE,
  cool = 0.9,
  nstart = 1
)

Arguments

data

Data matrix with dimension N*P indicating N features and P samples. The cluster analysis is done feature-wised.

Treatment

Vector of length P. Indicating replicates of different treatment groups. For example, Treatment = c(1,1,2,2,3,3) indicates 3 treatment groups, each with 2 replicates.

nK

Positive integer. Number of clusters.

method

Method for the algorithm. Can choose between "EM" as Expectation Maximization or "SA" as Simulated Annealing.

absolute

Logical. Whether we should use absolute (TRUE) or relative (False) abundance of features to determine clusters.

cool

Real number between (0, 1). Cooling rate for the "SA" algorithm. Uses 0.9 by default.

nstart

Positive integer. Number of starts for the entire algorithm. Note that as nstart increases the computational time also grows linearly. Uses 1 by default.

Value

cluster

Vector of length N consisting of integers from 1 to nK. Indicating final clustering result. For evaluating the clustering result please check NMI for Normalized Mutual Information.

prob

N*nK matrix. The (i, j)th element representing the probability that observation i belongs to cluster j.

log_l

Scaler. The Poisson hurdle log-likelihood of the final clustering result.

alpha

Vector of length N. The geometric mean abundance level for each feature, across all treatment groups.

Normalizer

vector of length P. The normalizing constant of sequencing depth for each sample.

Examples

######## Run the following codes in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))

[Package PHclust version 0.1.0 Index]