param_em {gmgm}    R Documentation
Learn the parameters of a Gaussian mixture graphical model with incomplete data
Description
This function learns the parameters of a Gaussian mixture graphical model from incomplete data using the parametric EM algorithm. At each iteration, inference (smoothing inference for a dynamic Bayesian network) completes the data given the current parameter estimates (E step). The completed data are then used to update the parameters (M step). The two steps alternate for at most max_iter_pem iterations. Each iteration is guaranteed not to decrease the log-likelihood, until convergence to a local maximum (Koller and Friedman, 2009). In practice, the inference is particle-based and therefore relies on sampling. As a result, the log-likelihood may stop increasing monotonically near the local maximum, and the algorithm then stops early.
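To make the alternation of E and M steps concrete, here is a minimal base R sketch of EM for a univariate two-component Gaussian mixture, where the only latent variable is the component label. It is an illustration under simplifying assumptions, not the gmgm implementation: there is no graph structure, no missing value, and the E step is exact rather than particle-based, so the recorded log-likelihoods never decrease (up to rounding). The function name em_gmm_1d is hypothetical.

```r
# Hypothetical illustration (not gmgm code): EM for a univariate
# two-component Gaussian mixture with an exact E step.
em_gmm_1d <- function(x, max_iter = 50, tol = 1e-8) {
  # Crude initialisation from the data quartiles
  prop <- c(0.5, 0.5)
  mu <- unname(quantile(x, c(0.25, 0.75)))
  sigma <- rep(sd(x), 2)
  loglik <- numeric(0)
  for (iter in seq_len(max_iter)) {
    # E step: weighted component densities and posterior probabilities
    dens <- cbind(prop[1] * dnorm(x, mu[1], sigma[1]),
                  prop[2] * dnorm(x, mu[2], sigma[2]))
    total <- rowSums(dens)
    loglik <- c(loglik, sum(log(total)))
    resp <- dens / total
    # Stop as soon as the log-likelihood no longer increases
    n_ll <- length(loglik)
    if (n_ll > 1 && loglik[n_ll] - loglik[n_ll - 1] < tol) break
    # M step: weighted maximum-likelihood updates
    nk <- colSums(resp)
    prop <- nk / length(x)
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * (x - rep(mu, each = length(x)))^2) / nk)
  }
  list(prop = prop, mu = mu, sigma = sigma, loglik = loglik)
}

set.seed(1)
x <- c(rnorm(200, 0, 1), rnorm(200, 5, 1))
fit <- em_gmm_1d(x)
fit$mu      # close to 0 and 5
fit$loglik  # non-decreasing sequence
```

With particle-based inference, the E step is only approximate, which is why the stopping rule above can trigger before the true local maximum is reached.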
Usage
param_em(
  gmgm,
  data,
  nodes = structure(gmgm)$nodes,
  col_seq = NULL,
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  max_iter_pem = 5,
  verbose = FALSE,
  ...
)
Arguments
gmgm
An object of class gmbn (Gaussian mixture Bayesian network) or gmdbn (Gaussian mixture dynamic Bayesian network).
data
A data frame containing the data used for learning. Its columns must explicitly be named after nodes of gmgm and may contain missing values (NA).
nodes
A character vector containing the nodes whose local conditional models are learned (by default, all the nodes of gmgm).
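col_seq
A character vector containing the column names of data that identify the observation sequences (for example, "DATE" in the second example). Defaults to NULL.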
n_part
A positive integer corresponding to the number of particles generated for each observation (if gmgm is a gmbn object) or observation sequence (if gmgm is a gmdbn object).
max_part_sim
An integer greater than or equal to n_part corresponding to the maximum number of particles that can be processed simultaneously.
min_ess
A numeric value in [0, 1] corresponding to the minimum effective sample size (ESS), expressed as a proportion of n_part, below which the particles are resampled.
max_iter_pem
A non-negative integer corresponding to the maximum number of iterations.
verbose
A logical value indicating whether iterations in progress are displayed.
...
Additional arguments passed to function |
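The data and col_seq arguments imply a specific input layout: one column per node, possibly with missing values, and optional columns that group rows into sequences. The base R sketch below shows how such a layout can be inspected. Both helpers, missing_summary and split_sequences, are hypothetical and not part of gmgm; split_sequences assumes that rows sharing the same col_seq values form one sequence stored in time-slice order, and that all rows form a single sequence when col_seq is NULL.

```r
# Hypothetical helpers (not part of gmgm), for illustration only.

# Proportion of missing values (NA) in each node column; stops if a node
# has no matching column in the data frame.
missing_summary <- function(data, nodes) {
  absent <- setdiff(nodes, names(data))
  if (length(absent) > 0) {
    stop("no column for node(s): ", paste(absent, collapse = ", "))
  }
  vapply(nodes, function(node) mean(is.na(data[[node]])), numeric(1))
}

# Row indices of each observation sequence: rows sharing the same values
# in the col_seq columns (assumed to be in time-slice order).
# Without col_seq, all the rows are treated as one sequence.
split_sequences <- function(data, col_seq = NULL) {
  rows <- seq_len(nrow(data))
  if (is.null(col_seq)) {
    return(list(rows))
  }
  split(rows, data[col_seq], drop = TRUE)
}

toy <- data.frame(DATE = c("d1", "d1", "d2", "d2", "d2"),
                  NO2 = c(10, NA, 8, NA, 11),
                  WIND = c(3, 4, NA, 5, 6))
missing_summary(toy, c("NO2", "WIND"))  # c(NO2 = 0.4, WIND = 0.2)
split_sequences(toy, "DATE")            # list(d1 = 1:2, d2 = 3:5)
```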
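The n_part, max_part_sim and min_ess arguments refer to standard particle-based inference settings. The following base R sketch illustrates them under assumptions that are not taken from gmgm's source: the ESS is computed as 1 / sum(w^2) for normalised weights w, multinomial resampling is triggered when the ESS falls below min_ess * n_part, and observations are processed in batches of floor(max_part_sim / n_part) so that no more than max_part_sim particles are held at once. The helpers ess, maybe_resample and batch_observations are hypothetical.

```r
# Generic particle bookkeeping, for illustration only (not gmgm code).

# Effective sample size of importance weights: from 1 (all the weight on
# one particle) to length(weights) (uniform weights).
ess <- function(weights) {
  w <- weights / sum(weights)
  1 / sum(w^2)
}

# Multinomial resampling when the ESS falls below min_ess * n_part;
# after resampling, all the weights are equal.
maybe_resample <- function(particles, weights, min_ess = 1) {
  n_part <- length(particles)
  if (ess(weights) < min_ess * n_part) {
    idx <- sample.int(n_part, n_part, replace = TRUE, prob = weights)
    list(particles = particles[idx], weights = rep(1 / n_part, n_part))
  } else {
    list(particles = particles, weights = weights / sum(weights))
  }
}

# Split n_obs observations into consecutive batches so that at most
# max_part_sim particles (n_part per observation) are handled at once.
batch_observations <- function(n_obs, n_part = 1000, max_part_sim = 1e6) {
  stopifnot(max_part_sim >= n_part)
  batch_size <- floor(max_part_sim / n_part)
  rows <- seq_len(n_obs)
  split(rows, ceiling(rows / batch_size))
}

lengths(batch_observations(2148))  # batches of 1000, 1000 and 148
```

Under this rule, min_ess = 0 never resamples and min_ess = 1 resamples whenever the weights are not all equal.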
Value
A list with elements:
gmgm
The final gmbn or gmdbn object.
data
A data frame (tibble) containing the complete data used to learn the final gmbn or gmdbn object.
seq_loglik
A numeric matrix containing the sequence of log-likelihoods measured after the E and M steps of each iteration.
References
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
See Also
param_learn, struct_em, struct_learn
Examples
library(gmgm)
set.seed(0)

# First example: Gaussian mixture Bayesian network (gmbn) learned from
# data_body after setting 430 randomly chosen values of each variable
# to missing
data(data_body)
data_1 <- data_body
data_1$GENDER[sample.int(2148, 430)] <- NA
data_1$AGE[sample.int(2148, 430)] <- NA
data_1$HEIGHT[sample.int(2148, 430)] <- NA
data_1$WEIGHT[sample.int(2148, 430)] <- NA
data_1$FAT[sample.int(2148, 430)] <- NA
data_1$WAIST[sample.int(2148, 430)] <- NA
data_1$GLYCO[sample.int(2148, 430)] <- NA

# Initial model: one local mixture per node, whose first variable is the
# node and whose other variables are its parents
gmbn_1 <- gmbn(
  AGE = split_comp(add_var(NULL, data_1[, "AGE"]), n_sub = 3),
  FAT = split_comp(add_var(NULL,
                           data_1[, c("FAT", "GENDER", "HEIGHT", "WEIGHT")]),
                   n_sub = 2),
  GENDER = split_comp(add_var(NULL, data_1[, "GENDER"]), n_sub = 2),
  GLYCO = split_comp(add_var(NULL, data_1[, c("GLYCO", "AGE", "WAIST")]),
                     n_sub = 2),
  HEIGHT = split_comp(add_var(NULL, data_1[, c("HEIGHT", "GENDER")])),
  WAIST = split_comp(add_var(NULL,
                             data_1[, c("WAIST", "AGE", "FAT", "HEIGHT",
                                        "WEIGHT")]),
                     n_sub = 3),
  WEIGHT = split_comp(add_var(NULL, data_1[, c("WEIGHT", "HEIGHT")]),
                      n_sub = 2)
)

# Learn the parameters with the parametric EM algorithm
res_learn_1 <- param_em(gmbn_1, data_1, verbose = TRUE)
library(dplyr)
set.seed(0)

# Second example: Gaussian mixture dynamic Bayesian network (gmdbn) learned
# from data_air after setting 1536 randomly chosen values of each variable
# to missing
data(data_air)
data_2 <- data_air
data_2$NO2[sample.int(7680, 1536)] <- NA
data_2$O3[sample.int(7680, 1536)] <- NA
data_2$TEMP[sample.int(7680, 1536)] <- NA
data_2$WIND[sample.int(7680, 1536)] <- NA

# Add the values of the previous time slice (suffix .1) within each DATE
# sequence; data_3 is only used here to build the initial model
data_3 <- data_2 %>%
  group_by(DATE) %>%
  mutate(NO2.1 = lag(NO2), O3.1 = lag(O3), TEMP.1 = lag(TEMP),
         WIND.1 = lag(WIND)) %>%
  ungroup()

gmdbn_1 <- gmdbn(
  b_2 = gmbn(
    NO2 = split_comp(add_var(NULL, data_3[, c("NO2", "NO2.1", "WIND")]),
                     n_sub = 3),
    O3 = split_comp(add_var(NULL,
                            data_3[, c("O3", "NO2", "NO2.1", "O3.1", "TEMP",
                                       "TEMP.1")]),
                    n_sub = 3),
    TEMP = split_comp(add_var(NULL, data_3[, c("TEMP", "TEMP.1")]),
                      n_sub = 3),
    WIND = split_comp(add_var(NULL, data_3[, c("WIND", "WIND.1")]),
                      n_sub = 3)
  ),
  b_13 = gmbn(
    NO2 = split_comp(add_var(NULL, data_3[, c("NO2", "NO2.1", "WIND")]),
                     n_sub = 3),
    O3 = split_comp(add_var(NULL,
                            data_3[, c("O3", "O3.1", "TEMP", "TEMP.1",
                                       "WIND")]),
                    n_sub = 3),
    TEMP = split_comp(add_var(NULL, data_3[, c("TEMP", "TEMP.1")]),
                      n_sub = 3),
    WIND = split_comp(add_var(NULL, data_3[, c("WIND", "WIND.1")]),
                      n_sub = 3)
  )
)

# Learn the parameters; each DATE value identifies one observation sequence
res_learn_2 <- param_em(gmdbn_1, data_2, col_seq = "DATE", verbose = TRUE)
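In the second example, each column with suffix .1 holds the value of the same variable at the previous time slice of the same DATE sequence, and the first slice of each sequence gets NA. For readers who do not use dplyr, the base R sketch below reproduces that group-wise lag. The helper lag_within is hypothetical and assumes that rows are already in time order within each group.

```r
# Hypothetical helper: previous value of x within each group (NA for the
# first row of a group); rows must already be in time order within groups.
lag_within <- function(x, group) {
  ave(x, group, FUN = function(v) c(NA, head(v, -1)))
}

toy <- data.frame(DATE = c("d1", "d1", "d1", "d2", "d2"),
                  NO2 = c(10, 12, 15, 8, 9))
toy$NO2.1 <- lag_within(toy$NO2, toy$DATE)
toy$NO2.1  # NA 10 12 NA 8
```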