param_em {gmgm}    R Documentation

Learn the parameters of a Gaussian mixture graphical model with incomplete data

Description

This function learns the parameters of a Gaussian mixture graphical model with incomplete data using the parametric EM algorithm. At each iteration, inference (smoothing inference for a dynamic Bayesian network) is performed to complete the data given the current estimate of the parameters (E step). The completed data are then used to update the parameters (M step). The two steps alternate until the algorithm terminates. In theory, no iteration decreases the log-likelihood, so the algorithm converges to a local maximum (Koller and Friedman, 2009). In practice, because particle-based inference relies on sampling, the log-likelihood may stop increasing monotonically near the local maximum, in which case the algorithm terminates early.
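The alternation of E and M steps can be illustrated with a toy sketch in base R. This is not the code used by param_em: it fits a two-component univariate Gaussian mixture, treats the hidden component label as the missing data, and uses an exact E step instead of particle-based inference. All variable names and starting values are illustrative assumptions.

```r
# Toy EM for a two-component univariate Gaussian mixture (base R only).
# The hidden component label plays the role of the missing data.
set.seed(1)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(150, mean = 4, sd = 1))

# Initial parameter guesses (arbitrary starting values)
alpha <- c(0.5, 0.5)   # mixture proportions
mu    <- c(-1, 1)      # component means
sigma <- c(1, 1)       # component standard deviations

n_iter <- 20
loglik <- numeric(n_iter)

for (i in seq_len(n_iter)) {
  # Weighted component densities, one column per component
  dens <- cbind(alpha[1] * dnorm(x, mu[1], sigma[1]),
                alpha[2] * dnorm(x, mu[2], sigma[2]))

  # Log-likelihood of the parameters entering this iteration
  loglik[i] <- sum(log(rowSums(dens)))

  # E step: posterior probability of each component for each observation
  resp <- dens / rowSums(dens)

  # M step: re-estimate the parameters from the softly completed data
  n_k   <- colSums(resp)
  alpha <- n_k / length(x)
  mu    <- colSums(resp * x) / n_k
  sigma <- sqrt(colSums(resp * outer(x, mu, "-")^2) / n_k)
}

# With an exact E step, the sequence should never decrease
# (a small tolerance absorbs rounding error)
all(diff(loglik) >= -1e-8)
```

With a sampled E step, as in particle-based inference, this check can fail close to the local maximum, which is the situation described above.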

Usage

param_em(
  gmgm,
  data,
  nodes = structure(gmgm)$nodes,
  col_seq = NULL,
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  max_iter_pem = 5,
  verbose = FALSE,
  ...
)

Arguments

gmgm

An object of class gmbn (non-temporal) or gmdbn.

data

A data frame containing the data used for learning. Its columns must explicitly be named after nodes of gmgm and can contain missing values (columns with no value can be removed).

nodes

A character vector containing the nodes whose local conditional models are learned (by default all the nodes of gmgm). If gmgm is a gmdbn object, the same nodes are learned for each of its gmbn elements. This constraint can be overcome by passing a list of character vectors named after some of these elements (b_1, ...) and containing learned nodes specific to them.

col_seq

A character vector containing the column names of data that describe the observation sequence. If NULL (the default), all the observations belong to a single sequence. If gmgm is a gmdbn object, the observations of the same sequence must be ordered such that the t-th one is related to time slice t (note that the sequences can have different lengths). If gmgm is a gmbn object, this argument is ignored.
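As an illustration of the ordering requirement, the following base R sketch sorts a small data frame so that, within each sequence, the t-th row corresponds to time slice t. The data frame and its column names (SEQ_ID, STEP, X) are hypothetical and not part of the package.

```r
# Hypothetical long-format data: two sequences identified by SEQ_ID, with
# the time step stored in a helper column STEP (all names are illustrative)
obs <- data.frame(
  SEQ_ID = c("a", "b", "a", "b", "a"),
  STEP   = c(2, 1, 1, 2, 3),
  X      = c(0.4, 1.2, 0.1, 1.5, 0.9)
)

# Sort by sequence, then by time step, so that within each sequence
# the t-th row corresponds to time slice t
obs <- obs[order(obs$SEQ_ID, obs$STEP), ]

# Sequences may have different lengths (here 3 and 2)
table(obs$SEQ_ID)
```

Under these assumptions, the sorted data could then be passed with col_seq = "SEQ_ID". Dropping the helper column STEP beforehand is a precaution, since the columns of data are expected to be named after nodes of gmgm.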

n_part

A positive integer corresponding to the number of particles generated for each observation (if gmgm is a gmbn object) or observation sequence (if gmgm is a gmdbn object) during inference.

max_part_sim

An integer greater than or equal to n_part corresponding to the maximum number of particles that can be processed simultaneously during inference. This argument is used to prevent memory overflow by dividing data into smaller subsets that are handled sequentially.
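The interplay between n_part and max_part_sim can be sketched with some arithmetic for a gmbn object, where n_part particles are generated per observation. The chunking rule below (as many observations per subset as fit in the particle budget) is an assumption made for illustration. The package may split the data differently.

```r
n_part       <- 1000   # particles per observation (default value)
max_part_sim <- 1e6    # particles processed simultaneously (default value)
n_obs        <- 2148   # number of rows, as in the data_body example

# Assumed rule: as many observations per subset as the budget allows
obs_per_chunk <- floor(max_part_sim / n_part)
n_chunks      <- ceiling(n_obs / obs_per_chunk)

obs_per_chunk  # 1000 observations per subset
n_chunks       # 3 subsets handled one after the other
```

Lowering max_part_sim reduces peak memory use at the cost of more sequential subsets.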

min_ess

A numeric value in [0, 1] corresponding to the minimum effective sample size (ESS), expressed as a proportion of n_part, under which the renewal step of sequential importance resampling is performed. If 1 (the default), this step is performed at each time slice. If gmgm is a gmbn object, this argument is ignored.
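To make the threshold concrete, the sketch below computes the ESS of a set of particle weights with the usual estimator 1 / sum(w^2) on normalized weights, then compares it with a min_ess value. The helper function ess and the exact comparison used to trigger the renewal step are assumptions. The package may compute or test the ESS differently.

```r
# Usual ESS estimator for importance weights (helper name is illustrative)
ess <- function(w) {
  w <- w / sum(w)   # normalize the weights so that they sum to 1
  1 / sum(w^2)
}

n_part <- 1000

# Uniform weights: every particle contributes equally
w_uniform <- rep(1, n_part)
# Degenerate weights: 10 particles carry almost all the mass
w_skewed <- c(rep(1, 10), rep(1e-6, n_part - 10))

ess(w_uniform) / n_part   # proportion equal to 1 (up to rounding)
ess(w_skewed) / n_part    # proportion close to 0.01

# Assumed trigger: renew the particles when the proportion is at or below
# the threshold
min_ess <- 0.5
ess(w_skewed) / n_part <= min_ess
```

A lower min_ess makes the renewal step rarer, which saves computation but tolerates more weight degeneracy.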

max_iter_pem

A non-negative integer corresponding to the maximum number of iterations of the parametric EM algorithm.

verbose

A logical value indicating whether iterations in progress are displayed.

...

Additional arguments passed to function em.

Value

A list with elements:

gmgm

The final gmbn or gmdbn object (with the highest log-likelihood).

data

A data frame (tibble) containing the complete data used to learn the final gmbn or gmdbn object.

seq_loglik

A numeric matrix containing the sequence of log-likelihoods measured after the E and M steps of each iteration.
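The returned list can be inspected as in the following sketch, which uses only functions and arguments that appear on this page. The small two-node network, the subset size, and the reduced values of n_part and max_iter_pem are arbitrary choices made to keep the run short, not recommendations.

```r
# Guarded so that the sketch is skipped when gmgm is not installed
if (requireNamespace("gmgm", quietly = TRUE)) {
  library(gmgm)
  set.seed(0)
  data(data_body)

  # Small random subset of two columns (subset size is arbitrary)
  data_small <- data_body[sample.int(nrow(data_body), 300),
                          c("GENDER", "HEIGHT")]
  # Hide 20% of the HEIGHT values
  data_small$HEIGHT[sample.int(300, 60)] <- NA

  # Two-node network in which HEIGHT depends on GENDER, initialized in the
  # same way as in the Examples section
  gmbn_small <- gmbn(
    GENDER = split_comp(add_var(NULL, data_small[, "GENDER"]), n_sub = 2),
    HEIGHT = split_comp(add_var(NULL, data_small[, c("HEIGHT", "GENDER")]))
  )

  # Learn only the local conditional model of HEIGHT
  res <- param_em(gmbn_small, data_small, nodes = "HEIGHT",
                  n_part = 100, max_iter_pem = 3)

  print(names(res))        # elements of the returned list
  print(class(res$gmgm))   # class of the final model
  print(anyNA(res$data))   # whether the returned data still contain NA
  print(res$seq_loglik)    # log-likelihoods recorded along the iterations
}
```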

References

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

See Also

param_learn, struct_em, struct_learn

Examples


# Non-temporal example (gmbn): remove about 20% of the values of each column
set.seed(0)
data(data_body)
data_1 <- data_body
data_1$GENDER[sample.int(2148, 430)] <- NA
data_1$AGE[sample.int(2148, 430)] <- NA
data_1$HEIGHT[sample.int(2148, 430)] <- NA
data_1$WEIGHT[sample.int(2148, 430)] <- NA
data_1$FAT[sample.int(2148, 430)] <- NA
data_1$WAIST[sample.int(2148, 430)] <- NA
data_1$GLYCO[sample.int(2148, 430)] <- NA
gmbn_1 <- gmbn(
  AGE = split_comp(add_var(NULL, data_1[, "AGE"]), n_sub = 3),
  FAT = split_comp(add_var(NULL,
                           data_1[, c("FAT", "GENDER", "HEIGHT", "WEIGHT")]),
                   n_sub = 2),
  GENDER = split_comp(add_var(NULL, data_1[, "GENDER"]), n_sub = 2),
  GLYCO = split_comp(add_var(NULL, data_1[, c("GLYCO", "AGE", "WAIST")]),
                     n_sub = 2),
  HEIGHT = split_comp(add_var(NULL, data_1[, c("HEIGHT", "GENDER")])),
  WAIST = split_comp(add_var(NULL,
                             data_1[, c("WAIST", "AGE", "FAT", "HEIGHT",
                                        "WEIGHT")]),
                     n_sub = 3),
  WEIGHT = split_comp(add_var(NULL, data_1[, c("WEIGHT", "HEIGHT")]), n_sub = 2)
)
res_learn_1 <- param_em(gmbn_1, data_1, verbose = TRUE)

# Temporal example (gmdbn): remove 20% of the values of each variable
library(dplyr)
set.seed(0)
data(data_air)
data_2 <- data_air
data_2$NO2[sample.int(7680, 1536)] <- NA
data_2$O3[sample.int(7680, 1536)] <- NA
data_2$TEMP[sample.int(7680, 1536)] <- NA
data_2$WIND[sample.int(7680, 1536)] <- NA
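# Add lagged columns (suffix .1) within each DATE sequence; data_3 is only
# used to initialize the local models below
data_3 <- data_2 %>%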
  group_by(DATE) %>%
  mutate(NO2.1 = lag(NO2), O3.1 = lag(O3), TEMP.1 = lag(TEMP),
         WIND.1 = lag(WIND)) %>%
  ungroup()
gmdbn_1 <- gmdbn(
  b_2 = gmbn(
    NO2 = split_comp(add_var(NULL, data_3[, c("NO2", "NO2.1", "WIND")]),
                     n_sub = 3),
    O3 = split_comp(add_var(NULL,
                            data_3[, c("O3", "NO2", "NO2.1", "O3.1", "TEMP",
                                       "TEMP.1")]),
                    n_sub = 3),
    TEMP = split_comp(add_var(NULL, data_3[, c("TEMP", "TEMP.1")]), n_sub = 3),
    WIND = split_comp(add_var(NULL, data_3[, c("WIND", "WIND.1")]), n_sub = 3)
  ),
  b_13 = gmbn(
    NO2 = split_comp(add_var(NULL, data_3[, c("NO2", "NO2.1", "WIND")]),
                     n_sub = 3),
    O3 = split_comp(add_var(NULL,
                            data_3[, c("O3", "O3.1", "TEMP", "TEMP.1",
                                       "WIND")]),
                    n_sub = 3),
    TEMP = split_comp(add_var(NULL, data_3[, c("TEMP", "TEMP.1")]), n_sub = 3),
    WIND = split_comp(add_var(NULL, data_3[, c("WIND", "WIND.1")]), n_sub = 3)
  )
)
res_learn_2 <- param_em(gmdbn_1, data_2, col_seq = "DATE", verbose = TRUE)


[Package gmgm version 1.1.2 Index]