PHcluster {PHclust} | R Documentation |
Poisson hurdle clustering
Description
This function gives the clustering result based on a Poisson hurdle model.
Usage
PHcluster(
data,
Treatment,
nK,
method = c("EM", "SA"),
absolute = FALSE,
cool = 0.9,
nstart = 1
)
Arguments
data |
Data matrix with dimension N*P indicating N features and P samples. The cluster analysis is done feature-wised. |
Treatment |
Vector of length P. Indicating replicates of different treatment groups. For example, Treatment = c(1,1,2,2,3,3) indicates 3 treatment groups, each with 2 replicates. |
nK |
Positive integer. Number of clusters. |
method |
Method for the algorithm. Can choose between "EM" as Expectation Maximization or "SA" as Simulated Annealing. |
absolute |
Logical. Whether we should use absolute (TRUE) or relative (False) abundance of features to determine clusters. |
cool |
Real number between (0, 1). Cooling rate for the "SA" algorithm. Uses 0.9 by default. |
nstart |
Positive integer. Number of starts for the entire algorithm. Note that as nstart increases the computational time also grows linearly. Uses 1 by default. |
Value
- cluster
Vector of length N consisting of integers from 1 to nK. Indicating final clustering result. For evaluating the clustering result please check NMI for Normalized Mutual Information.
- prob
N*nK matrix. The (i, j)th element representing the probability that observation i belongs to cluster j.
- log_l
Scaler. The Poisson hurdle log-likelihood of the final clustering result.
- alpha
Vector of length N. The geometric mean abundance level for each feature, across all treatment groups.
- Normalizer
vector of length P. The normalizing constant of sequencing depth for each sample.
Examples
######## Run the following codes in order:
##
## This is a sample data set which has 100 features, and 4 treatment groups with 4 replicates each.
data('sample_data')
head(sample_data)
set.seed(1)
##
## Finding the optimal number of clusters
K <- Hybrid(sample_data, Kstart = 4, Treatment = rep(c(1,2,3,4), each = 4))
##
## Clustering result from EM algorithm
result <- PHcluster(sample_data, rep(c(1,2,3,4), each = 4), K, method = 'EM', nstart = 1)
print(result$cluster)
##
## Plot the feature abundance level for each cluster
plot_abundance(result, sample_data, Treatment = rep(c(1,2,3,4), each = 4))