smem {gmgm}    R Documentation

Select the number of mixture components and estimate the parameters of a Gaussian mixture model

Description

This function selects the number of mixture components and estimates the parameters of a Gaussian mixture model using a split-and-merge EM (SMEM) algorithm. At the first iteration, the classic EM algorithm is performed to update the parameters of the initial model. Each following iteration then consists of splitting a component into two or merging two components, before re-estimating the parameters with the EM algorithm. The selected split or merge operation is the one that maximizes a scoring function after the re-estimation process. To avoid testing all possible operations, the split and merge candidates are first ranked according to relevant criteria (Zhang et al., 2003). The top-ranked split and top-ranked merge operations are tested first. If neither of them increases the score, the second-ranked ones are considered, and so on. The SMEM algorithm stops when a given maximum rank (max_rank) is reached without improving the score.
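The rank-based search described above can be illustrated with a minimal base R sketch. This is not the code of gmgm: the function search_operation, the callback score_after, the operation labels and the toy scores are all hypothetical, and the sketch only assumes that a higher score is better.

```r
# Hypothetical sketch of one SMEM iteration's candidate search (not gmgm's code).
# split_ranked / merge_ranked: character vectors of candidate operations,
#   already sorted from the most to the least promising.
# score_after: function returning the score obtained after applying an
#   operation and re-estimating the parameters (here a simple lookup).
search_operation <- function(split_ranked, merge_ranked, current_score,
                             score_after, max_rank = 1) {
  for (rank in seq_len(max_rank)) {
    # Candidates at this rank; a list shorter than `rank` yields NA
    tested <- c(split_ranked[rank], merge_ranked[rank])
    tested <- tested[!is.na(tested)]
    if (length(tested) == 0) break
    scores <- vapply(tested, score_after, numeric(1))
    best <- which.max(scores)
    # Accept the best operation at this rank only if it improves the score
    if (scores[best] > current_score) {
      return(list(oper = tested[[best]], score = scores[[best]], rank = rank))
    }
  }
  NULL  # no improvement up to max_rank: the search stops
}

# Made-up scores, for illustration only
toy_scores <- c("split 1" = -105, "merge 1-2" = -110,
                "split 2" = -98, "merge 2-3" = -101)
score_after <- function(oper) toy_scores[[oper]]

res <- search_operation(c("split 1", "split 2"), c("merge 1-2", "merge 2-3"),
                        current_score = -100, score_after = score_after,
                        max_rank = 2)
print(res)
```

With these toy values, neither rank-1 candidate beats the current score, so the rank-2 candidates are tested and "split 2" is retained. With max_rank = 1, the same call returns NULL, which corresponds to the stopping condition.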

Usage

smem(
  gmm,
  data,
  y = NULL,
  score = "bic",
  split = TRUE,
  merge = TRUE,
  min_comp = 1,
  max_comp = Inf,
  space = 0.5,
  max_rank = 1,
  max_iter_smem = 10,
  verbose = FALSE,
  ...
)

Arguments

gmm

An initial object of class gmm.

data

A data frame or numeric matrix containing the data used in the SMEM algorithm. Its columns must explicitly be named after the variables of gmm and must not contain missing values.

y

A character vector containing the dependent variables if a conditional model is estimated (which involves maximizing a conditional score). If NULL (the default), the joint model is estimated.

score

A character string ("aic", "bic" or "loglik") corresponding to the scoring function.

split

A logical value indicating whether split operations are allowed (if FALSE, no mixture component can be split).

merge

A logical value indicating whether merge operations are allowed (if FALSE, no mixture component can be merged).

min_comp

A positive integer corresponding to the minimum number of mixture components.

max_comp

A positive integer corresponding to the maximum number of mixture components.

space

A numeric value in [0, 1) (0 included, 1 excluded) corresponding to the space between two subcomponents resulting from a split.

max_rank

A positive integer corresponding to the maximum rank for testing the split and merge candidates.

max_iter_smem

A non-negative integer corresponding to the maximum number of iterations.

verbose

A logical value indicating whether iterations in progress are displayed.

...

Additional arguments passed to function em.
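As an illustration of how these arguments combine, the following sketch estimates a conditional model with split operations only. It assumes that gmgm is installed and reuses the data_body dataset from the Examples section; the choice of "WAIST" as the dependent variable and the specific argument values are arbitrary.

```r
library(gmgm)

data(data_body)
gmm_1 <- add_var(NULL, c("WAIST", "AGE", "FAT", "HEIGHT", "WEIGHT"))

# Conditional model of WAIST given the other variables, scored with the AIC.
# Merges are disabled, the two best-ranked splits are tested at each
# iteration, and the search is limited to 5 iterations and 3 components.
res_cond <- smem(gmm_1, data_body, y = "WAIST", score = "aic",
                 merge = FALSE, max_comp = 3, max_rank = 2,
                 max_iter_smem = 5)
```

Raising max_rank widens the search at each iteration, at the cost of more EM re-estimations per iteration.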

Value

A list with elements:

gmm

The final gmm object.

posterior

A numeric matrix containing the posterior probabilities for each observation.

seq_score

A numeric vector containing the sequence of scores measured initially and after each iteration.

seq_oper

A character vector containing the sequence of split and merge operations performed at each iteration.
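The elements of the returned list can be inspected as sketched below. The sketch assumes that gmgm is installed and repeats the call from the Examples section; no output is shown because it depends on the fitted model.

```r
library(gmgm)

data(data_body)
gmm_1 <- add_var(NULL, c("WAIST", "AGE", "FAT", "HEIGHT", "WEIGHT"))
res_smem <- smem(gmm_1, data_body, max_comp = 3)

# Final model (object of class gmm)
class(res_smem$gmm)

# Posterior probabilities for each observation
dim(res_smem$posterior)
head(res_smem$posterior)

# Scores measured initially and after each iteration,
# with the operations performed along the way
res_smem$seq_score
res_smem$seq_oper
```

Reading seq_score alongside seq_oper shows which split or merge operation produced each change in the score.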

References

Zhang, Z., Chen, C., Sun, J. and Chan, K. L. (2003). EM algorithms for Gaussian mixtures with split-and-merge operation. Pattern Recognition, 36(9):1973–1983.

See Also

em, stepwise

Examples

data(data_body)
gmm_1 <- add_var(NULL, c("WAIST", "AGE", "FAT", "HEIGHT", "WEIGHT"))
res_smem <- smem(gmm_1, data_body, max_comp = 3, verbose = TRUE)


[Package gmgm version 1.1.2 Index]