EMAlgorithm {GMCM}R Documentation

EM algorithm for Gaussian mixture models

Description

The regular expectation-maximization algorithm for general multivariate Gaussian mixture models.

Usage

EMAlgorithm(x, theta, m, eps = 1e-06, max.ite = 1e+05,
  trace.theta = FALSE, verbose = FALSE)

Arguments

x

A matrix of observations where each row correspond to an observation and each columns to a feature/variable.

theta

A list of parameters of class theta as described in rtheta. Optional. If not provided m should be given.

m

numeric. The number of components if theta is not supplied.

eps

The maximal required difference in successive likelihoods to establish convergence.

max.ite

The maximum number of iterations.

trace.theta

Logical. If TRUE, all estimates are stored and returned. Default is FALSE.

verbose

Set to TRUE for verbose output. Default is FALSE.

Details

Though not as versatile, the algorithm can be a faster alternative to Mclust in the mclust-package. If theta is not given, a k-means clustering is used to determine the initial theta.

Value

A list of length 3 with elements:

theta

A list of the estimated parameters as described in rtheta.

loglik.tr

A numeric vector of the log-likelihood trace.

kappa

A matrix where kappa[i,j] is the probability that x[i, ] is realized from the j'th component.

Author(s)

Anders Ellern Bilgrau <anders.ellern.bilgrau@gmail.com>

See Also

rtheta, PseudoEMAlgorithm

Examples

set.seed(3)
true.theta <- rtheta(d = 2, m = 3, method = "old")
true.theta$sigma <- lapply(true.theta$sigma, cov2cor) # Scale
## Not run: 
plot(true.theta, nlevels = 20, add.ellipses = TRUE)

## End(Not run)

data <- SimulateGMCMData(n = 1000, theta = true.theta)
start.theta <- rtheta(d = 2, m = 3)
start.theta$mu <- t(kmeans(data$z, 3)$centers) # More sensible location estimates
start.theta <- as.theta(start.theta) # Coerce the matrix to a list
res <- GMCM:::EMAlgorithm(data$z, theta = start.theta)

par(mfrow = c(1,2))
plot(data$z, cex = 0.5, pch = 16, main = "Simulated data",
     col = rainbow(3)[data$K])
plot(data$z, cex = 0.5, pch = 16, main = "GMM clustering",
     col = rainbow(3)[apply(res$kappa,1,which.max)])

[Package GMCM version 1.4 Index]