smem {gmgm} | R Documentation |
Select the number of mixture components and estimate the parameters of a Gaussian mixture model
Description
This function selects the number of mixture components and estimates the parameters of a Gaussian mixture model using a split-and-merge EM (SMEM) algorithm. In the first iteration, the classic EM algorithm is run to update the parameters of the initial model. Each subsequent iteration consists of either splitting one component into two or merging two components, and then re-estimating the parameters with the EM algorithm. The split or merge operation that is retained is the one that maximizes a scoring function after re-estimation. To avoid testing all possible operations, the split and merge candidates are first ranked according to relevant criteria (Zhang et al., 2003). The top-ranked split and the top-ranked merge are tested first. If neither of them increases the score, the second-ranked ones are considered, and so on. The SMEM algorithm stops when a given maximum rank is reached without improving the score.
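The rank-based search described above can be illustrated with a small base R sketch. This is not the code used by smem: the names smem_sketch, toy_propose and toy_score are hypothetical, the "model" is reduced to a single number, and the scoring function stands in for the whole "apply the operation, re-run EM, compute the score" step. Only the control flow is meant to match the description (test the split and merge candidates of one rank, keep the better one if it improves the score, otherwise move to the next rank, and stop once the maximum rank brings no improvement).

```r
# Hypothetical sketch of the rank-based split/merge search (not gmgm's code).
# propose(state) must return list(split = <ranked list>, merge = <ranked list>).
# score(state) stands in for "re-estimate with EM, then score the model".
smem_sketch <- function(state, propose, score, max_rank = 1, max_iter = 10) {
  best_score <- score(state)
  seq_score <- best_score        # score measured initially, then per iteration
  seq_oper <- character(0)       # operation retained at each iteration
  for (iter in seq_len(max_iter)) {
    cand <- propose(state)
    improved <- FALSE
    for (rank in seq_len(max_rank)) {
      # Gather the split and merge candidates available at this rank
      trial <- list()
      if (rank <= length(cand$split)) trial$split <- cand$split[[rank]]
      if (rank <= length(cand$merge)) trial$merge <- cand$merge[[rank]]
      if (length(trial) == 0) break
      trial_scores <- vapply(trial, score, numeric(1))
      # Keep the better of the two operations only if it improves the score
      if (max(trial_scores) > best_score) {
        oper <- names(trial)[which.max(trial_scores)]
        state <- trial[[oper]]
        best_score <- max(trial_scores)
        seq_score <- c(seq_score, best_score)
        seq_oper <- c(seq_oper, oper)
        improved <- TRUE
        break
      }
    }
    if (!improved) break         # maximum rank reached without improvement
  }
  list(state = state, seq_score = seq_score, seq_oper = seq_oper)
}

# Toy problem: the state is a number and the score peaks at 4.
toy_score <- function(x) -(x - 4)^2
toy_propose <- function(x) {
  list(split = list(x + 2, x + 1), merge = list(x - 2, x - 1))
}

res1 <- smem_sketch(1, toy_propose, toy_score, max_rank = 1)
res2 <- smem_sketch(1, toy_propose, toy_score, max_rank = 2)
res1$state     # 3: the rank-1 candidates no longer improve the score
res2$state     # 4: the rank-2 split is tested and accepted
res2$seq_oper  # "split" "split"
```

In this toy run, a larger maximum rank lets the search continue after the top-ranked candidates fail, at the price of more re-estimations per iteration.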
Usage
smem(
gmm,
data,
y = NULL,
score = "bic",
split = TRUE,
merge = TRUE,
min_comp = 1,
max_comp = Inf,
space = 0.5,
max_rank = 1,
max_iter_smem = 10,
verbose = FALSE,
...
)
Arguments
gmm |
An initial object of class |
data |
A data frame or numeric matrix containing the data used in the
SMEM algorithm. Its columns must explicitly be named after the variables of
|
y |
A character vector containing the dependent variables if a
conditional model is estimated (which involves maximizing a conditional
score). If |
score |
A character string ( |
split |
A logical value indicating whether split operations are allowed
(if |
merge |
A logical value indicating whether merge operations are allowed
(if |
min_comp |
A positive integer corresponding to the minimum number of mixture components. |
max_comp |
A positive integer corresponding to the maximum number of mixture components. |
space |
A numeric value in [0, 1) corresponding to the space between two subcomponents resulting from a split. |
max_rank |
A positive integer corresponding to the maximum rank for testing the split and merge candidates. |
max_iter_smem |
A non-negative integer corresponding to the maximum number of iterations. |
verbose |
A logical value indicating whether iterations in progress are displayed. |
... |
Additional arguments passed to function |
Value
A list with elements:
gmm |
The final |
posterior |
A numeric matrix containing the posterior probabilities for each observation. |
seq_score |
A numeric vector containing the sequence of scores measured initially and after each iteration. |
seq_oper |
A character vector containing the sequence of split and merge operations performed at each iteration. |
References
Zhang, Z., Chen, C., Sun, J. and Chan, K. L. (2003). EM algorithms for Gaussian mixtures with split-and-merge operation. Pattern Recognition, 36(9):1973–1983.
Examples
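library(gmgm)
data(data_body)
# Start from an initial model over five variables of data_body
gmm_1 <- add_var(NULL, c("WAIST", "AGE", "FAT", "HEIGHT", "WEIGHT"))
# Select the number of components (at most 3) and print the iterations
res_smem <- smem(gmm_1, data_body, max_comp = 3, verbose = TRUE)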
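The elements listed under Value can then be inspected on the returned list. The following is a minimal sketch that repeats the call above so that it runs on its own. It assumes the gmgm package is installed (hence the requireNamespace guard) and relies only on the element names documented in this page.

```r
# Runs only if gmgm is available; otherwise the block is skipped
if (requireNamespace("gmgm", quietly = TRUE)) {
  library(gmgm)
  data(data_body)
  gmm_1 <- add_var(NULL, c("WAIST", "AGE", "FAT", "HEIGHT", "WEIGHT"))
  res_smem <- smem(gmm_1, data_body, max_comp = 3)

  # Sequence of split and merge operations performed at each iteration
  print(res_smem$seq_oper)
  # Scores measured initially and after each iteration
  print(res_smem$seq_score)
  # Dimensions of the matrix of posterior probabilities
  print(dim(res_smem$posterior))
  # Final model, to be passed on to other functions of the package
  gmm_final <- res_smem$gmm
}
```

Comparing seq_score with seq_oper shows which operations were retained and how much each one changed the score.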