em {gmgm}    R Documentation
Estimate the parameters of a Gaussian mixture model
Description
This function estimates the parameters of a Gaussian mixture model using the expectation-maximization (EM) algorithm. Given an initial model, the algorithm iteratively updates the parameters, monotonically increasing the log-likelihood until convergence to a local maximum (Bilmes, 1998). By default, a Bayesian regularization is applied to prevent a mixture component from collapsing onto a single data point, which would lead to a zero covariance matrix (Ormoneit and Tresp, 1996). Although the EM algorithm only applies to the joint model, good parameters can also be found for a derived conditional model; in that case, however, the monotonic increase of the conditional log-likelihood is not guaranteed.
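As an illustration of this iteration scheme, here is a minimal univariate two-component sketch (hypothetical code, not part of gmgm: em_sketch and all its internals are illustrative names, and the regularization shown is a simplified version of the Bayesian penalty):

em_sketch <- function(x, mu = c(-1, 1), sigma2 = c(1, 1), prop = c(0.5, 0.5),
                      regul = 0.01, epsilon = 1e-06, max_iter = 100) {
  loglik <- -Inf
  seq_loglik <- numeric(0)
  for (iter in seq_len(max_iter)) {
    # E step: posterior probability of each component for each observation
    dens <- sapply(1:2, function(k) prop[k] * dnorm(x, mu[k], sqrt(sigma2[k])))
    post <- dens / rowSums(dens)
    # M step: update the proportions, means and variances;
    # adding regul keeps the variances away from zero
    n_k <- colSums(post)
    prop <- n_k / length(x)
    mu <- colSums(post * x) / n_k
    sigma2 <- sapply(1:2, function(k) sum(post[, k] * (x - mu[k])^2) / n_k[k]) + regul
    # Log-likelihood of the parameters used in this E step
    new_loglik <- sum(log(rowSums(dens)))
    seq_loglik <- c(seq_loglik, new_loglik)
    # Stop when the increase in log-likelihood falls below epsilon
    if (new_loglik - loglik < epsilon) break
    loglik <- new_loglik
  }
  list(mu = mu, sigma2 = sigma2, prop = prop, seq_loglik = seq_loglik)
}

For instance, em_sketch(c(rnorm(100, -2), rnorm(100, 2))) recovers two well-separated components, with seq_loglik increasing at each iteration.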
Usage
em(
  gmm,
  data,
  regul = 0.01,
  epsilon = 1e-06,
  max_iter_em = 100,
  verbose = FALSE
)
Arguments
gmm: An initial object of class gmm.

data: A data frame or numeric matrix containing the data used in the EM algorithm. Its columns must explicitly be named after the variables of gmm.

regul: A positive numeric value corresponding to the regularization constant if a Bayesian regularization is applied. If NULL, no regularization is applied.

epsilon: A positive numeric value corresponding to the convergence threshold: iterations stop when the increase in log-likelihood falls below this value.

max_iter_em: A non-negative integer corresponding to the maximum number of EM iterations.

verbose: A logical value indicating whether iterations in progress are displayed.
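As a rough illustration of how these arguments interact (assuming the gmm_1 object and data_body dataset built in the Examples section below, and that regul = NULL disables the regularization as described above):

# Disable the Bayesian regularization and tighten the convergence threshold
res_strict <- em(gmm_1, data_body, regul = NULL, epsilon = 1e-08)

# Cap the number of iterations
res_short <- em(gmm_1, data_body, max_iter_em = 10)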
Value
A list with elements:
gmm: The final gmm object.

posterior: A numeric matrix containing the posterior probabilities for each observation.

seq_loglik: A numeric vector containing the sequence of log-likelihoods, measured initially and after each iteration.
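For example, the elements of the returned list can be inspected as follows (assuming res_em from the Examples section below; that posterior has one column per mixture component is an assumption, not stated above):

res_em$gmm                  # the final gmm object
head(res_em$posterior)      # one row per observation, one column per component
res_em$seq_loglik           # initial log-likelihood, then one value per iteration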
References
Bilmes, J. A. (1998). A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical report, International Computer Science Institute.
Ormoneit, D. and Tresp, V. (1996). Improved Gaussian Mixture Density Estimates Using Bayesian Penalty Terms and Network Averaging. In Advances in Neural Information Processing Systems 8, pages 542–548.
See Also
Examples
data(data_body)

# Build an initial model over five variables, then split its component
# into 3 subcomponents
gmm_1 <- split_comp(add_var(NULL,
                            data_body[, c("WAIST", "AGE", "FAT", "HEIGHT",
                                          "WEIGHT")]),
                    n_sub = 3)

# Estimate the parameters, displaying the iterations in progress
res_em <- em(gmm_1, data_body, verbose = TRUE)
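Since each EM iteration cannot decrease the joint log-likelihood, the returned sequence can be checked and visualized with base R:

# The sequence of log-likelihoods is non-decreasing
all(diff(res_em$seq_loglik) >= 0)

# Visualize the convergence
plot(res_em$seq_loglik, type = "l", xlab = "Iteration", ylab = "Log-likelihood")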