hmm.clust {DBHC} | R Documentation |
DBHC Algorithm
Description
Implementation of the DBHC algorithm, an HMM clustering algorithm that finds a mixture of discrete-output HMMs. The algorithm uses heuristics based on BIC to search for the optimal number of hidden states in each HMM and the optimal number of clusters.
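As a rough illustration of the BIC heuristic mentioned above, the sketch below scores a model from its log-likelihood and parameter count with the textbook formula BIC = -2 log L + k log n. The helper names `count.above` and `bic.score` and all numbers are hypothetical. How DBHC itself counts parameters and observations is not stated on this page and may differ.

```r
# Hypothetical helper: count the probabilities in a matrix that exceed eps.
# (An assumption about what counting parameters with a threshold means.)
count.above <- function(probs, eps = 0.001) {
  sum(probs > eps)
}

# Textbook BIC: lower values indicate a better trade-off between fit and size.
bic.score <- function(log.lik, n.params, n.obs) {
  -2 * log.lik + n.params * log(n.obs)
}

# Toy 2-state transition matrix (made-up numbers, rows sum to 1).
transitions <- matrix(c(0.9995, 0.0005,
                        0.2500, 0.7500), nrow = 2, byrow = TRUE)
k <- count.above(transitions)   # the 0.0005 entry falls below eps

# Two made-up candidate models fitted to the same 2500 observations.
small <- bic.score(log.lik = -1200, n.params = k, n.obs = 2500)
large <- bic.score(log.lik = -1195, n.params = k + 6, n.obs = 2500)
small < large   # here the small gain in fit does not pay for 6 extra parameters
```

A search guided by such a score would keep growing a model, or adding clusters, only while the score keeps improving.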
Usage
hmm.clust(
  sequences,
  id = NULL,
  smoothing = 1e-04,
  eps = 0.001,
  init.size = 2,
  alphabet = NULL,
  K.max = NULL,
  log_space = FALSE,
  print = FALSE,
  seed.size = 3
)
Arguments
sequences |
An stslist object of sequences with discrete observations. |
id |
A vector with ids that identify the sequences in sequences. |
smoothing |
Smoothing parameter for absolute discounting in
|
eps |
A threshold epsilon for counting parameters in
|
init.size |
The number of HMM states in an initial HMM. |
alphabet |
The alphabet of output labels, if not provided alphabet is
taken from |
K.max |
Maximum number of clusters. If not provided, the algorithm searches for the optimal number itself. |
log_space |
Logical, parameter provided to
|
print |
Logical, whether to print intermediate steps or not. |
seed.size |
Seed size, the number of sequences to be selected for a seed. |
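The `smoothing` argument refers to absolute discounting. A minimal sketch of that general technique on a single probability vector follows. `discount` is a hypothetical helper written for illustration, not the package's own routine, and the package's exact scheme may differ.

```r
# Hypothetical helper: absolute discounting on one probability vector.
# Each positive entry gives up a fixed amount; the freed mass is shared
# equally among the zero entries, so the vector still sums to 1.
# Assumes every positive entry is larger than `smoothing`.
discount <- function(p, smoothing = 1e-04) {
  zero <- p == 0
  if (!any(zero) || all(zero)) return(p)   # nothing to redistribute
  p[!zero] <- p[!zero] - smoothing
  p[zero] <- smoothing * sum(!zero) / sum(zero)
  p
}

emissions <- c(H = 1, T = 0)   # made-up row of an emission matrix
smoothed <- discount(emissions)
smoothed
```

Smoothing of this kind keeps a sequence containing a previously unseen symbol from receiving zero likelihood under the model.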
Value
A list with components:
sequences
An stslist object of sequences with discrete observations.
id
A vector with ids that identify the sequences in sequences.
cluster
A vector with found cluster memberships for the sequences.
partition
A list object with the partition, a mixture of HMMs. Each element in the list is an hmm object.
memberships
A matrix with cluster memberships for each sequence.
n.clusters
Numerical, the found number of clusters.
sizes
A vector with the number of HMM states for each cluster model.
bic
A vector with the BICs for each cluster model.
Examples
## Simulated data
library(seqHMM)
output.labels <- c("H", "T")
# HMM 1
states.1 <- c("A", "B", "C")
transitions.1 <- matrix(c(0.8,0.1,0.1,0.1,0.8,0.1,0.1,0.1,0.8), nrow = 3)
rownames(transitions.1) <- states.1
colnames(transitions.1) <- states.1
emissions.1 <- matrix(c(0.5,0.75,0.25,0.5,0.25,0.75), nrow = 3)
rownames(emissions.1) <- states.1
colnames(emissions.1) <- output.labels
initials.1 <- c(1/3,1/3,1/3)
# HMM 2
states.2 <- c("A", "B")
transitions.2 <- matrix(c(0.75,0.25,0.25,0.75), nrow = 2)
rownames(transitions.2) <- states.2
colnames(transitions.2) <- states.2
emissions.2 <- matrix(c(0.8,0.6,0.2,0.4), nrow = 2)
rownames(emissions.2) <- states.2
colnames(emissions.2) <- output.labels
initials.2 <- c(0.5,0.5)
# Simulate
hmm.sim.1 <- simulate_hmm(n_sequences = 100,
                          initial_probs = initials.1,
                          transition_probs = transitions.1,
                          emission_probs = emissions.1,
                          sequence_length = 25)
hmm.sim.2 <- simulate_hmm(n_sequences = 100,
                          initial_probs = initials.2,
                          transition_probs = transitions.2,
                          emission_probs = emissions.2,
                          sequence_length = 25)
sequences <- rbind(hmm.sim.1$observations, hmm.sim.2$observations)
n <- nrow(sequences)
# Clustering algorithm
id <- paste0("K-", 1:n)
rownames(sequences) <- id
sequences <- sequences[sample(1:n, n),]
res <- hmm.clust(sequences, id = rownames(sequences))
#############################################################################
## Swiss Household Data
data("biofam", package = "TraMineR")
# Clustering algorithm
new.alphabet <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
sequences <- seqdef(biofam[,10:25], alphabet = 0:7, states = new.alphabet)
## Not run:
res <- hmm.clust(sequences)
# Heatmaps
cluster <- 1 # display heatmaps for cluster 1
transition.heatmap(res$partition[[cluster]]$transition_probs,
                   res$partition[[cluster]]$initial_probs)
emission.heatmap(res$partition[[cluster]]$emission_probs)
## End(Not run)
## A smaller example, which takes less time to run
subset <- sequences[sample(1:nrow(sequences), 20, replace = FALSE),]
# Clustering algorithm, limiting number of clusters to 2
res <- hmm.clust(subset, K.max = 2)
# Number of clusters
print(res$n.clusters)
# Table of cluster memberships
table(res$memberships[,"cluster"])
# BIC for each cluster model
print(res$bic)
# Heatmaps
cluster <- 1 # display heatmaps for cluster 1
transition.heatmap(res$partition[[cluster]]$transition_probs,
                   res$partition[[cluster]]$initial_probs)
emission.heatmap(res$partition[[cluster]]$emission_probs)
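For the simulated data at the top of the examples, the ids encode which HMM generated each sequence: rows 1 to 100 come from HMM 1 and rows 101 to 200 from HMM 2. A minimal sketch for checking how well the found clusters match that origin is given below. `true.source` and `res.sim` are hypothetical names, and the sketch assumes that the `id` and `cluster` components of the result are in the same order.

```r
# Hypothetical helper: recover the generating HMM from ids of the form "K-<n>",
# assuming rows 1..n.first came from HMM 1 and the remaining rows from HMM 2.
true.source <- function(id, n.first = 100) {
  index <- as.integer(sub("^K-", "", id))
  ifelse(index <= n.first, "HMM 1", "HMM 2")
}

labels <- true.source(c("K-7", "K-100", "K-101"))
labels

# With the result of the simulated run stored in, say, res.sim
# (note that `res` is overwritten by the later biofam examples):
# table(source = true.source(res.sim$id), cluster = res.sim$cluster)
```

A cross-table concentrated in one cell per row would suggest that the clustering separates the two generating models. Cluster numbers themselves are arbitrary labels.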