param_em {gmgm}    R Documentation

Learn the parameters of a Gaussian mixture graphical model with incomplete data


Description

This function learns the parameters of a Gaussian mixture graphical model with incomplete data using the parametric EM algorithm. At each iteration, inference (smoothing inference for a dynamic Bayesian network) is performed to complete the data given the current estimate of the parameters (E step). The completed data are then used to update the parameters (M step), and the two steps are repeated. In theory, each iteration is guaranteed not to decrease the log-likelihood, which converges to a local maximum (Koller and Friedman, 2009). In practice, because particle-based inference relies on sampling, the log-likelihood may stop increasing monotonically near the local maximum, in which case the algorithm terminates early.
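To make the alternation of E and M steps concrete, here is a minimal sketch of EM on a toy problem: a two-component univariate Gaussian mixture fitted with base R only. It is not the implementation used by param_em. The function name em_mixture_1d, its arguments and the initialisation are hypothetical, and the E step is computed exactly rather than with particles, which is why the log-likelihood sequence is monotone here.

```r
# Toy EM for a two-component 1-D Gaussian mixture (exact E step, no particles).
# em_mixture_1d is a hypothetical helper, not part of gmgm.
em_mixture_1d <- function(x, max_iter = 50, tol = 1e-8) {
  n <- length(x)
  # Crude initial estimates (an arbitrary choice for this sketch)
  alpha <- c(0.5, 0.5)
  mu <- as.numeric(quantile(x, c(0.25, 0.75)))
  sigma <- rep(sd(x), 2)
  loglik <- numeric(0)

  for (iter in seq_len(max_iter)) {
    # E step: weighted component densities and posterior responsibilities
    dens <- cbind(alpha[1] * dnorm(x, mu[1], sigma[1]),
                  alpha[2] * dnorm(x, mu[2], sigma[2]))
    loglik <- c(loglik, sum(log(rowSums(dens))))
    resp <- dens / rowSums(dens)

    # Stop once the log-likelihood no longer improves noticeably
    n_ll <- length(loglik)
    if (n_ll > 1 && loglik[n_ll] - loglik[n_ll - 1] < tol) break

    # M step: weighted maximum-likelihood updates
    n_k <- colSums(resp)
    alpha <- n_k / n
    mu <- colSums(resp * x) / n_k
    sigma <- sqrt(colSums(resp * (x - rep(mu, each = n))^2) / n_k)
  }
  list(alpha = alpha, mu = mu, sigma = sigma, loglik = loglik)
}

set.seed(1)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(150, mean = 5, sd = 1))
fit <- em_mixture_1d(x)
fit$mu             # estimated component means
diff(fit$loglik)   # successive log-likelihood gains
```

With sampled (particle-based) completions instead of exact responsibilities, the gains near convergence can become slightly negative, which is the early termination described above.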


Usage

param_em(
  gmgm,
  data,
  nodes = structure(gmgm)$nodes,
  col_seq = NULL,
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  max_iter_pem = 5,
  verbose = FALSE,
  ...
)



Arguments

gmgm
An object of class gmbn (non-temporal) or gmdbn.

data
A data frame containing the data used for learning. Its columns must explicitly be named after nodes of gmgm and can contain missing values (columns with no value can be removed).

nodes
A character vector containing the nodes whose local conditional models are learned (by default all the nodes of gmgm). If gmgm is a gmdbn object, the same nodes are learned for each of its gmbn elements. This constraint can be overcome by passing a list of character vectors named after some of these elements (b_1, ...) and containing learned nodes specific to them.

col_seq
A character vector containing the column names of data that describe the observation sequence. If NULL (the default), all the observations belong to a single sequence. If gmgm is a gmdbn object, the observations of the same sequence must be ordered such that the t-th one is related to time slice t (note that the sequences can have different lengths). If gmgm is a gmbn object, this argument is ignored.

n_part
A positive integer corresponding to the number of particles generated for each observation (if gmgm is a gmbn object) or observation sequence (if gmgm is a gmdbn object) during inference.

max_part_sim
An integer greater than or equal to n_part corresponding to the maximum number of particles that can be processed simultaneously during inference. This argument is used to prevent memory overflow by dividing data into smaller subsets that are handled sequentially.

min_ess
A numeric value in [0, 1] corresponding to the minimum effective sample size (ESS, expressed as a proportion of n_part) under which the renewal step of sequential importance resampling is performed. If 1 (the default), this step is performed at each time slice. If gmgm is a gmbn object, this argument is ignored.

max_iter_pem
A non-negative integer corresponding to the maximum number of iterations.

verbose
A logical value indicating whether iterations in progress are displayed.

...
Additional arguments passed to function em.
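The roles of max_part_sim and min_ess can be illustrated with a short base R sketch. It rests on two assumptions that this page does not confirm: that each subset holds floor(max_part_sim / n_part) observations (or sequences), and that the ESS is the usual 1 / sum(w^2) of the normalised importance weights, compared to min_ess with "less than or equal". The helper names split_into_batches, ess and needs_renewal are hypothetical and not part of gmgm.

```r
# Hypothetical helpers illustrating max_part_sim and min_ess (not gmgm code).

# Split n_obs observations into subsets so that at most max_part_sim
# particles (n_part per observation) are handled at once.
split_into_batches <- function(n_obs, n_part, max_part_sim) {
  stopifnot(n_part >= 1, max_part_sim >= n_part)
  size <- max_part_sim %/% n_part          # observations per subset
  idx <- seq_len(n_obs)
  split(idx, ceiling(idx / size))
}

# Effective sample size of a vector of (unnormalised) importance weights
ess <- function(weights) {
  w <- weights / sum(weights)
  1 / sum(w^2)
}

# Renewal (resampling) is triggered when ESS / n_part falls to min_ess or below.
# The proportion is clamped to 1 to absorb floating-point rounding.
needs_renewal <- function(weights, min_ess) {
  ess_prop <- min(1, ess(weights) / length(weights))
  ess_prop <= min_ess
}

batches <- split_into_batches(n_obs = 2500, n_part = 1000, max_part_sim = 1e6)
lengths(batches)                     # subset sizes

ess(c(1, 1, 1, 1))                   # uniform weights: ESS equals 4
ess(c(1, 0, 0, 0))                   # degenerate weights: ESS equals 1
needs_renewal(rep(1, 1000), min_ess = 1)    # always TRUE when min_ess = 1
needs_renewal(rep(1, 1000), min_ess = 0.5)  # FALSE for uniform weights
```

Under these assumptions, min_ess = 1 triggers the renewal step at every time slice, matching the default behaviour described above, while lower values resample only when the weights have degenerated.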


Value

A list with elements:

gmgm
The final gmbn or gmdbn object (with the highest log-likelihood).

data
A data frame (tibble) containing the complete data used to learn the final gmbn or gmdbn object.

seq_loglik
A numeric matrix containing the sequence of log-likelihoods measured after the E and M steps of each iteration.


References

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

See Also

param_learn, struct_em, struct_learn


Examples

library(gmgm)
library(dplyr)  # provides %>%, group_by(), mutate(), lag() and ungroup()

set.seed(0)  # the missing values below are drawn at random

# Example 1: Gaussian mixture Bayesian network (gmbn)
data_1 <- data_body
n_1 <- nrow(data_1)
# Hide 430 randomly chosen values in each variable
data_1$GENDER[sample.int(n_1, 430)] <- NA
data_1$AGE[sample.int(n_1, 430)] <- NA
data_1$HEIGHT[sample.int(n_1, 430)] <- NA
data_1$WEIGHT[sample.int(n_1, 430)] <- NA
data_1$FAT[sample.int(n_1, 430)] <- NA
data_1$WAIST[sample.int(n_1, 430)] <- NA
data_1$GLYCO[sample.int(n_1, 430)] <- NA
gmbn_1 <- gmbn(
  AGE = split_comp(add_var(NULL, data_1[, "AGE"]), n_sub = 3),
  FAT = split_comp(add_var(NULL,
                           data_1[, c("FAT", "GENDER", "HEIGHT", "WEIGHT")]),
                   n_sub = 2),
  GENDER = split_comp(add_var(NULL, data_1[, "GENDER"]), n_sub = 2),
  GLYCO = split_comp(add_var(NULL, data_1[, c("GLYCO", "AGE", "WAIST")]),
                     n_sub = 2),
  HEIGHT = split_comp(add_var(NULL, data_1[, c("HEIGHT", "GENDER")])),
  # The end of this column list is missing in the source
  WAIST = split_comp(add_var(NULL,
                             data_1[, c("WAIST", "AGE", "FAT", "HEIGHT")]),
                     n_sub = 3),
  WEIGHT = split_comp(add_var(NULL, data_1[, c("WEIGHT", "HEIGHT")]), n_sub = 2)
)
res_learn_1 <- param_em(gmbn_1, data_1, verbose = TRUE)

# Example 2: Gaussian mixture dynamic Bayesian network (gmdbn)
data_2 <- data_air
n_2 <- nrow(data_2)
# Hide 1536 randomly chosen values in each variable
data_2$NO2[sample.int(n_2, 1536)] <- NA
data_2$O3[sample.int(n_2, 1536)] <- NA
data_2$TEMP[sample.int(n_2, 1536)] <- NA
data_2$WIND[sample.int(n_2, 1536)] <- NA
# Add the lagged variables (suffix .1) within each sequence
data_3 <- data_2 %>%
  group_by(DATE) %>%
  mutate(NO2.1 = lag(NO2), O3.1 = lag(O3), TEMP.1 = lag(TEMP),
         WIND.1 = lag(WIND)) %>%
  ungroup()
gmdbn_1 <- gmdbn(
  b_2 = gmbn(
    NO2 = split_comp(add_var(NULL, data_3[, c("NO2", "NO2.1", "WIND")]),
                     n_sub = 3),
    # The end of this column list is missing in the source
    O3 = split_comp(add_var(NULL,
                            data_3[, c("O3", "NO2", "NO2.1", "O3.1", "TEMP")]),
                    n_sub = 3),
    TEMP = split_comp(add_var(NULL, data_3[, c("TEMP", "TEMP.1")]), n_sub = 3),
    WIND = split_comp(add_var(NULL, data_3[, c("WIND", "WIND.1")]), n_sub = 3)
  ),
  b_13 = gmbn(
    NO2 = split_comp(add_var(NULL, data_3[, c("NO2", "NO2.1", "WIND")]),
                     n_sub = 3),
    # The end of this column list is missing in the source
    O3 = split_comp(add_var(NULL,
                            data_3[, c("O3", "O3.1", "TEMP", "TEMP.1")]),
                    n_sub = 3),
    TEMP = split_comp(add_var(NULL, data_3[, c("TEMP", "TEMP.1")]), n_sub = 3),
    WIND = split_comp(add_var(NULL, data_3[, c("WIND", "WIND.1")]), n_sub = 3)
  )
)
res_learn_2 <- param_em(gmdbn_1, data_2, col_seq = "DATE", verbose = TRUE)
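Both examples start by blanking out a fixed number of values in each variable before learning. A minimal base R sketch of that preparation step on a toy data frame is given below. The helper hide_values and the toy columns ID, X and Y are hypothetical and chosen only for illustration. Drawing the rows with sample.int is an assumption about how the missing values are placed.

```r
# Hypothetical helper: set n_missing randomly chosen entries of each
# listed column to NA (rows are drawn without replacement per column).
hide_values <- function(data, cols, n_missing) {
  stopifnot(n_missing <= nrow(data), all(cols %in% names(data)))
  for (col in cols) {
    rows <- sample.int(nrow(data), n_missing)
    data[[col]][rows] <- NA
  }
  data
}

set.seed(42)
toy <- data.frame(ID = 1:20, X = rnorm(20), Y = rnorm(20))
toy_incomplete <- hide_values(toy, cols = c("X", "Y"), n_missing = 5)

colSums(is.na(toy_incomplete))  # number of missing values per column
```

Because the rows are drawn independently for each column, an observation can lose several variables at once or none at all, which is the kind of incomplete data param_em is meant to handle.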

[Package gmgm version 1.1.2 Index]