struct_em {gmgm}    R Documentation
Learn the structure and the parameters of a Gaussian mixture graphical model with incomplete data
Description
This function learns the structure and the parameters of a Gaussian mixture graphical model with incomplete data using the structural EM algorithm. Each iteration consists of two steps. In the E step, the parametric EM algorithm is run to complete the data and update the parameters. In the M step, the completed data are used to update the structure. The two steps alternate until convergence or until max_iter_sem iterations have been performed. In theory, each iteration increases the scoring function until the algorithm converges to a local maximum (Koller and Friedman, 2009). In practice, because particle-based inference relies on sampling, the score may stop increasing monotonically as it approaches the local maximum, in which case the algorithm terminates early.
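The alternation between the E and M steps, and the early termination when the score stops increasing, can be sketched in base R as below. This is a hypothetical illustration, not the gmgm implementation: sem_loop, e_step and m_step are made-up names, the toy steps simply walk through a fixed vector of scores, and stopping as soon as the M-step score fails to improve is an assumed rule. The scores are stored in a two-row matrix in the spirit of the seq_score element of the returned list, but the actual layout of seq_score may differ.

```r
# Hypothetical sketch of the structural EM control loop (not gmgm's code).
# e_step(model) is assumed to return list(model, data, score): parametric EM
#   completes the data and updates the parameters.
# m_step(model, data) is assumed to return list(model, score): the structure
#   is updated from the completed data.
sem_loop <- function(model, e_step, m_step, max_iter = 5) {
  e_scores <- numeric(0)
  m_scores <- numeric(0)
  best_score <- -Inf
  for (i in seq_len(max_iter)) {
    e <- e_step(model)
    m <- m_step(e$model, e$data)
    e_scores <- c(e_scores, e$score)
    m_scores <- c(m_scores, m$score)
    # Assumed stopping rule: quit as soon as the score no longer increases,
    # keeping the best model found so far.
    if (m$score <= best_score) break
    best_score <- m$score
    model <- m$model
  }
  # Row "E" holds the E-step scores, row "M" the M-step scores.
  seq_score <- matrix(c(e_scores, m_scores), nrow = 2, byrow = TRUE,
                      dimnames = list(c("E", "M"), NULL))
  list(model = model, seq_score = seq_score)
}

# Toy steps: the "model" is just an index into a fixed vector of scores.
toy_scores <- c(-100, -90, -85, -87, -80)
e_step <- function(model) {
  list(model = model, data = NULL, score = toy_scores[model] - 1)
}
m_step <- function(model, data) {
  list(model = model + 1, score = toy_scores[model + 1])
}

res <- sem_loop(1, e_step, m_step, max_iter = 5)
print(res$seq_score)
```

With these toy scores, the loop stops at the third iteration, when the M-step score drops from -85 to -87, and returns model 3 even though a higher score (-80) exists further along: the sketch only reaches a local maximum.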
Usage
struct_em(
  gmgm,
  data,
  nodes = structure(gmgm)$nodes,
  arcs_cand = tibble(lag = 0),
  col_seq = NULL,
  score = "bic",
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  max_iter_sem = 5,
  max_iter_pem = 5,
  verbose = FALSE,
  ...
)
Arguments
gmgm |
An object of class |
data |
A data frame containing the data used for learning. Its columns
must explicitly be named after nodes of |
nodes |
A character vector containing the nodes whose local conditional
models are learned (by default all the nodes of |
arcs_cand |
A data frame containing the candidate arcs for addition or
removal (by default all possible non-temporal arcs). The column |
col_seq |
A character vector containing the column names of |
score |
A character string ( |
n_part |
A positive integer corresponding to the number of particles
generated for each observation (if |
max_part_sim |
An integer greater than or equal to |
min_ess |
A numeric value in [0, 1] corresponding to the minimum ESS
(expressed as a proportion of |
max_iter_sem |
A non-negative integer corresponding to the maximum number of iterations of the structural EM algorithm. |
max_iter_pem |
A non-negative integer corresponding to the maximum number of iterations of the parametric EM algorithm. |
verbose |
A logical value indicating whether iterations in progress are displayed. |
... |
Additional arguments passed to function |
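The min_ess threshold refers to the effective sample size (ESS) of the weighted particles used during inference. The sketch below assumes the usual Kish estimate, ESS = (sum of weights)^2 / (sum of squared weights), which ranges from 1 to the number of particles, and compares ESS divided by the number of particles with min_ess to decide whether to renew (resample) the particles. The helper names, the `<=` comparison and the multinomial resampling scheme are assumptions made for illustration; gmgm may compute or apply the threshold differently.

```r
# Hypothetical helpers illustrating the role of min_ess (not gmgm internals).

# Kish effective sample size of a vector of non-negative importance weights.
ess <- function(w) sum(w)^2 / sum(w^2)

# Assumed rule: resample when the ESS, as a proportion of the number of
# particles, falls to or below min_ess.
needs_resampling <- function(w, min_ess) {
  ess(w) / length(w) <= min_ess
}

# Multinomial resampling: draw particle indices in proportion to the weights.
resample_indices <- function(w) {
  sample.int(length(w), size = length(w), replace = TRUE, prob = w)
}

w <- c(0.7, 0.1, 0.1, 0.1)
ess(w)                     # lies between 1 and length(w)
needs_resampling(w, 0.5)
```

Since the ESS never exceeds the number of particles, this assumed rule with min_ess = 1 (the default shown in Usage) requests resampling at essentially every step, while smaller thresholds resample only once the weights have become strongly unbalanced.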
Value
A list with elements:
gmgm |
The final |
data |
A data frame (tibble) containing the complete data used to learn
the final |
seq_score |
A numeric matrix containing the sequence of scores measured after the E and M steps of each iteration. |
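The values in seq_score come from the scoring function selected by score, whose default in Usage is "bic". To show how a BIC score trades goodness of fit against model complexity, the sketch below uses the common form "log-likelihood minus half the number of free parameters times the log of the number of observations", for which higher is better, consistent with a score that the algorithm seeks to increase. Whether gmgm uses exactly this form and this parameter count is an assumption; bic_score is a hypothetical name and the numbers are illustrative only.

```r
# Hypothetical BIC helper (higher is better); not gmgm's implementation.
# loglik: log-likelihood of the (completed) data under the model
# n_par:  number of free parameters of the model
# n_obs:  number of observations
bic_score <- function(loglik, n_par, n_obs) {
  loglik - n_par / 2 * log(n_obs)
}

# Illustrative values: doubling the parameters must buy enough likelihood
# to pay the extra log(n_obs) penalty, which a gain of 10 does not.
simple  <- bic_score(loglik = -5000, n_par = 10, n_obs = 2148)
complex <- bic_score(loglik = -4990, n_par = 20, n_obs = 2148)
simple > complex
```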
References
Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
See Also
param_em
, param_learn
,
struct_learn
Examples
library(gmgm)

# Static model: set about 20% of the values of each variable of data_body to NA
set.seed(0)
data(data_body)
data_1 <- data_body
data_1$GENDER[sample.int(2148, 430)] <- NA
data_1$AGE[sample.int(2148, 430)] <- NA
data_1$HEIGHT[sample.int(2148, 430)] <- NA
data_1$WEIGHT[sample.int(2148, 430)] <- NA
data_1$FAT[sample.int(2148, 430)] <- NA
data_1$WAIST[sample.int(2148, 430)] <- NA
data_1$GLYCO[sample.int(2148, 430)] <- NA
# Initial model containing the seven nodes
gmbn_1 <- add_nodes(NULL,
                    c("AGE", "FAT", "GENDER", "GLYCO", "HEIGHT", "WAIST",
                      "WEIGHT"))
# Candidate arcs for addition or removal
arcs_cand_1 <- data.frame(from = c("AGE", "GENDER", "HEIGHT", "WEIGHT", NA,
                                   "AGE", "GENDER", "AGE", "FAT", "GENDER",
                                   "HEIGHT", "WEIGHT", "AGE", "GENDER",
                                   "HEIGHT"),
                          to = c("FAT", "FAT", "FAT", "FAT", "GLYCO", "HEIGHT",
                                 "HEIGHT", "WAIST", "WAIST", "WAIST", "WAIST",
                                 "WAIST", "WEIGHT", "WEIGHT", "WEIGHT"))
# max_comp is not a formal argument of struct_em and is passed through "..."
res_learn_1 <- struct_em(gmbn_1, data_1, arcs_cand = arcs_cand_1,
                         verbose = TRUE, max_comp = 3)
# Dynamic model: set 20% of the values of each variable of data_air to NA
set.seed(0)
data(data_air)
data_2 <- data_air
data_2$NO2[sample.int(7680, 1536)] <- NA
data_2$O3[sample.int(7680, 1536)] <- NA
data_2$TEMP[sample.int(7680, 1536)] <- NA
data_2$WIND[sample.int(7680, 1536)] <- NA
# Initial dynamic model made of two elements, b_2 and b_13, over the same nodes
gmdbn_1 <- gmdbn(b_2 = add_nodes(NULL, c("NO2", "O3", "TEMP", "WIND")),
                 b_13 = add_nodes(NULL, c("NO2", "O3", "TEMP", "WIND")))
# Candidate arcs with time lags (lag 0: same time slice, lag 1: previous slice)
arcs_cand_2 <- data.frame(from = c("NO2", "NO2", "NO2", "O3", "TEMP", "TEMP",
                                   "WIND", "WIND"),
                          to = c("NO2", "O3", "O3", "O3", NA, NA, NA, NA),
                          lag = c(1, 0, 1, 1, 0, 1, 0, 1))
# col_seq = "DATE": column of data_2 describing the observation sequences
res_learn_2 <- struct_em(gmdbn_1, data_2, arcs_cand = arcs_cand_2,
                         col_seq = "DATE", verbose = TRUE, max_comp = 3)
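Both examples put NA in the from or to column of the candidate arcs. Assuming that a missing value there stands for "any node" (and a missing lag for "any of the given lags"), the hypothetical helper below expands such a table into explicit arcs and drops same-slice self-loops, which a Bayesian network cannot contain. It only illustrates this assumed meaning: expand_arcs_cand is not a gmgm function and is not needed to call struct_em.

```r
# Hypothetical helper (not part of gmgm): expand NA wildcards in a
# candidate-arc table, ASSUMING NA means "all possible values".
expand_arcs_cand <- function(arcs_cand, nodes, lags = 0) {
  if (!"lag" %in% names(arcs_cand)) arcs_cand$lag <- 0
  rows <- lapply(seq_len(nrow(arcs_cand)), function(i) {
    from <- as.character(arcs_cand$from[i])
    to <- as.character(arcs_cand$to[i])
    lag <- arcs_cand$lag[i]
    # Each wildcard is replaced by every admissible value
    expand.grid(from = if (is.na(from)) nodes else from,
                to = if (is.na(to)) nodes else to,
                lag = if (is.na(lag)) lags else lag,
                stringsAsFactors = FALSE)
  })
  arcs <- unique(do.call(rbind, rows))
  # An arc from a node to itself is only possible across time slices (lag > 0)
  arcs <- arcs[!(arcs$from == arcs$to & arcs$lag == 0), ]
  rownames(arcs) <- NULL
  arcs
}

# Same NA pattern as the GLYCO row of arcs_cand_1 in the first example
arcs <- data.frame(from = c(NA, "AGE"), to = c("GLYCO", "FAT"))
expanded <- expand_arcs_cand(arcs, nodes = c("AGE", "FAT", "GLYCO"))
print(expanded)
```

Here the wildcard row yields AGE -> GLYCO and FAT -> GLYCO (GLYCO -> GLYCO is discarded as a same-slice self-loop), and the explicit row adds AGE -> FAT.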