struct_learn {gmgm}R Documentation

Learn the structure and the parameters of a Gaussian mixture graphical model

Description

This function learns the (graphical and mixture) structure and the parameters of a Gaussian mixture graphical model. Using the local decomposability of the scoring function, this task consists in learning each local conditional model independently with the stepwise algorithm (Koller and Friedman, 2009). Note that some candidate arcs may be discarded to avoid that the global graphical structure contains cycles. To limit recourse to this action, the learning process is performed sequentially. The more arcs of a local model are likely to be part of cycles (considering the worst case where all the candidate arcs are selected), the later this local model is processed. By gradually taking into account the local structures learned over time, the number of possible cycles decreases and, with it, the number of candidate arcs to discard.

Usage

struct_learn(
  gmgm,
  data,
  nodes = structure(gmgm)$nodes,
  arcs_cand = tibble(lag = 0),
  col_seq = NULL,
  score = "bic",
  verbose = FALSE,
  ...
)

Arguments

gmgm

An initial object of class gmbn or gmdbn.

data

A data frame containing the data used for learning. Its columns must explicitly be named after the nodes of gmgm and must not contain missing values.

nodes

A character vector containing the nodes whose local conditional models are learned (by default all the nodes of gmgm). If gmgm is a gmdbn object, the same nodes are learned for each of its gmbn elements. This constraint can be overcome by passing a list of character vectors named after some of these elements (b_1, ...) and containing learned nodes specific to them.

arcs_cand

A data frame containing the candidate arcs for addition or removal (by default all possible non-temporal arcs). The column from describes the start node, the column to the end node and the column lag the time lag between them. Missing values in from or to are interpreted as "all possible nodes", which allows to quickly define large set of arcs that share common attributes. Missing values in lag are replaced by 0. If gmgm is a gmdbn object, the same candidate arcs are used for each of its gmbn elements. This constraint can be overcome by passing a list of data frames named after some of these elements (b_1, ...) and containing candidate arcs specific to them. If arcs already in gmgm are not candidates, they cannot be removed. Therefore, setting arcs_cand to NULL is equivalent to learning only the mixture structure (and the parameters) of the model.

col_seq

A character vector containing the column names of data that describe the observation sequence. If NULL (the default), all the observations belong to a single sequence. If gmgm is a temporal gmbn or gmdbn object, the observations of a same sequence must be ordered such that the tth one is related to time slice t (note that the sequences can have different lengths). If gmgm is a non-temporal gmbn object, this argument is ignored.

score

A character string ("aic", "bic" or "loglik") corresponding to the scoring function.

verbose

A logical value indicating whether learned nodes in progress are displayed.

...

Additional arguments passed to function stepwise.

Value

A list with elements:

gmgm

The final gmbn or gmdbn object.

evol_score

A list with elements:

global

A numeric vector containing the global score before and after learning.

local

For a gmbn object, a numeric matrix containing the local conditional scores before and after learning. For a gmdbn object, a list of numeric matrices containing these values for each gmbn element.

References

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

See Also

param_em, param_learn, struct_em

Examples


data(data_body)
gmbn_1 <- add_nodes(NULL,
                    c("AGE", "FAT", "GENDER", "GLYCO", "HEIGHT", "WAIST",
                      "WEIGHT"))
arcs_cand_1 <- data.frame(from = c("AGE", "GENDER", "HEIGHT", "WEIGHT", NA,
                                   "AGE", "GENDER", "AGE", "FAT", "GENDER",
                                   "HEIGHT", "WEIGHT", "AGE", "GENDER",
                                   "HEIGHT"),
                          to = c("FAT", "FAT", "FAT", "FAT", "GLYCO", "HEIGHT",
                                 "HEIGHT", "WAIST", "WAIST", "WAIST", "WAIST",
                                 "WAIST", "WEIGHT", "WEIGHT", "WEIGHT"))
res_learn_1 <- struct_learn(gmbn_1, data_body, arcs_cand = arcs_cand_1,
                            verbose = TRUE, max_comp = 3)

data(data_air)
gmdbn_1 <- gmdbn(b_2 = add_nodes(NULL, c("NO2", "O3", "TEMP", "WIND")),
                 b_13 = add_nodes(NULL, c("NO2", "O3", "TEMP", "WIND")))
arcs_cand_2 <- data.frame(from = c("NO2", "NO2", "NO2", "O3", "TEMP", "TEMP",
                                   "WIND", "WIND"),
                          to = c("NO2", "O3", "O3", "O3", NA, NA, NA, NA),
                          lag = c(1, 0, 1, 1, 0, 1, 0, 1))
res_learn_2 <- struct_learn(gmdbn_1, data_air, arcs_cand = arcs_cand_2,
                            col_seq = "DATE", verbose = TRUE, max_comp = 3)


[Package gmgm version 1.1.2 Index]