R: EM algorithm for Gaussian mixture models

EMAlgorithm {GMCM}

R Documentation

EM algorithm for Gaussian mixture models

Description

The regular expectation-maximization algorithm for general multivariate Gaussian mixture models.

Usage

EMAlgorithm(x, theta, m, eps = 1e-06, max.ite = 1e+05,
  trace.theta = FALSE, verbose = FALSE)

Arguments

`x`	A `matrix` of observations where each row correspond to an observation and each columns to a feature/variable.
`theta`	A list of parameters of class `theta` as described in `rtheta`. Optional. If not provided `m` should be given.
`m`	`numeric`. The number of components if `theta` is not supplied.
`eps`	The maximal required difference in successive likelihoods to establish convergence.
`max.ite`	The maximum number of iterations.
`trace.theta`	Logical. If `TRUE`, all estimates are stored and returned. Default is `FALSE`.
`verbose`	Set to `TRUE` for verbose output. Default is `FALSE`.

Details

Though not as versatile, the algorithm can be a faster alternative to Mclust in the mclust-package. If theta is not given, a k-means clustering is used to determine the initial theta.

Value

A list of length 3 with elements:

`theta`	A list of the estimated parameters as described in `rtheta`.
`loglik.tr`	A numeric vector of the log-likelihood trace.
`kappa`	A matrix where `kappa[i,j]` is the probability that `x[i, ]` is realized from the `j`'th component.

Author(s)

Anders Ellern Bilgrau <anders.ellern.bilgrau@gmail.com>

Examples

set.seed(3)
true.theta <- rtheta(d = 2, m = 3, method = "old")
true.theta$sigma <- lapply(true.theta$sigma, cov2cor) # Scale
## Not run: 
plot(true.theta, nlevels = 20, add.ellipses = TRUE)

## End(Not run)

data <- SimulateGMCMData(n = 1000, theta = true.theta)
start.theta <- rtheta(d = 2, m = 3)
start.theta$mu <- t(kmeans(data$z, 3)$centers) # More sensible location estimates
start.theta <- as.theta(start.theta) # Coerce the matrix to a list
res <- GMCM:::EMAlgorithm(data$z, theta = start.theta)

par(mfrow = c(1,2))
plot(data$z, cex = 0.5, pch = 16, main = "Simulated data",
     col = rainbow(3)[data$K])
plot(data$z, cex = 0.5, pch = 16, main = "GMM clustering",
     col = rainbow(3)[apply(res$kappa,1,which.max)])

[Package GMCM version 1.4 Index]