
prediction {gmgm}

R Documentation

Perform predictive inference in a Gaussian mixture dynamic Bayesian network

Description

This function performs predictive inference in a Gaussian mixture dynamic Bayesian network. For a sequence of T time slices, this task consists of defining a time horizon h such that at each time slice t (for 0 ≤ t ≤ T - h), the state of the system at t + h is estimated given all the data (the evidence) collected up to t. Although the states at t + 1, ..., t + h are only observed in the future, some information about them can be known a priori (such as contextual information or features controlled by the user). This "predicted" evidence can be taken into account when propagating the particles from t to t + h in order to improve the predictions. Predictive inference is performed by sequential importance resampling, which is a particle-based approximate method (Koller and Friedman, 2009).
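
The idea of predicting at horizon h with sequential importance resampling can be sketched in base R on a toy model. This is only an illustration, not the package's implementation: the one-dimensional AR(1) state, the Gaussian observation noise, and all variable names below are assumptions made for the example, and no "predicted" evidence is used.

```r
set.seed(1)
n_steps <- 50   # number of time slices (toy value)
h <- 2          # time horizon
n_part <- 1000  # number of particles

# Toy model (assumption): AR(1) hidden state observed with Gaussian noise.
x <- numeric(n_steps)
x[1] <- rnorm(1)
for (t in 2:n_steps) x[t] <- 0.9 * x[t - 1] + rnorm(1)
y <- x + rnorm(n_steps, sd = 0.5)

part <- rnorm(n_part)           # initial particles
pred <- rep(NA_real_, n_steps)  # pred[t + h] holds the estimate made at t

for (t in seq_len(n_steps)) {
  # Propagate the particles to time slice t.
  if (t > 1) part <- 0.9 * part + rnorm(n_part)

  # Weight the particles by the evidence observed at t.
  w <- dnorm(y[t], mean = part, sd = 0.5)
  w <- w / sum(w)

  # Propagate a copy of the particles h steps ahead without new evidence
  # and summarise it with the current weights.
  if (t + h <= n_steps) {
    fut <- part
    for (k in seq_len(h)) fut <- 0.9 * fut + rnorm(n_part)
    pred[t + h] <- sum(w * fut)
  }

  # Resample according to the weights before moving to the next slice.
  part <- sample(part, n_part, replace = TRUE, prob = w)
}
```

In this sketch the first h entries of `pred` stay `NA`, since no earlier time slice exists from which they could have been predicted.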

Usage

prediction(
  gmdbn,
  evid,
  evid_pred = NULL,
  nodes = names(gmdbn$b_1),
  col_seq = NULL,
  horizon = 1,
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  verbose = FALSE
)

Arguments

`gmdbn`	An object of class `gmdbn`.
`evid`	A data frame containing the evidence. Its columns must explicitly be named after nodes of `gmdbn` and can contain missing values (columns with no value can be removed).
`evid_pred`	A data frame containing the "predicted" evidence. Its columns must explicitly be named after nodes of `gmdbn` and can contain missing values (columns with no value can be removed).
`nodes`	A character vector containing the inferred nodes (by default all the nodes of `gmdbn`).
`col_seq`	A character vector containing the column names of `evid` and `evid_pred` that describe the observation sequence. If `NULL` (the default), all the observations belong to a single sequence. The observations of the same sequence must be ordered such that the `t`th one is related to time slice `t` (note that the sequences can have different lengths).
`horizon`	A positive integer vector containing the time horizons for which predictive inference is performed.
`n_part`	A positive integer corresponding to the number of particles generated for each observation sequence.
`max_part_sim`	An integer greater than or equal to `n_part` corresponding to the maximum number of particles that can be processed simultaneously. This argument is used to prevent memory overflow, dividing `evid` into smaller subsets that are handled sequentially.
`min_ess`	A numeric value in [0, 1] corresponding to the minimum effective sample size (ESS), expressed as a proportion of `n_part`, under which the renewal step of sequential importance resampling is performed. If `1` (the default), this step is performed at each time slice.
`verbose`	A logical value indicating whether subsets of `evid` and time slices in progress are displayed.
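
To make the role of `min_ess` more concrete, the sketch below computes an effective sample size as a proportion of the number of particles, using the common estimate 1 / sum(w^2) on normalized weights. The helper name `ess_prop`, the toy weights, and the strict `<` comparison are assumptions for illustration; the exact estimator and comparison used inside the package are not stated here.

```r
# Effective sample size as a proportion of the number of particles,
# based on the common estimate 1 / sum(w^2) for normalized weights.
ess_prop <- function(w) {
  w <- w / sum(w)
  (1 / sum(w^2)) / length(w)
}

n_part <- 1000
w_uniform <- rep(1, n_part)                        # all particles equally useful
w_skewed <- c(rep(1, 10), rep(1e-6, n_part - 10))  # 10 particles dominate

ess_prop(w_uniform)  # equal to 1
ess_prop(w_skewed)   # close to 0.01

# Hypothetical decision rule: renew the particles when the ESS proportion
# falls below the threshold.
min_ess <- 0.5
needs_resampling <- ess_prop(w_skewed) < min_ess
```

A lower `min_ess` therefore triggers the renewal step less often, while a value close to 1 triggers it whenever the weights are noticeably uneven.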

Value

If `horizon` has one element, a data frame with a structure similar to `evid` containing the predicted values of the inferred nodes and their observation sequences (if `col_seq` is not `NULL`). If `horizon` has two or more elements, a list of data frames (tibbles) containing these values for each time horizon.
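
Because the return type depends on the length of the horizon argument, downstream code may find it convenient to always work with a list. A minimal sketch, where `as_pred_list` is a hypothetical helper (not part of the package) and the data frames contain placeholder values rather than real predictions:

```r
# Hypothetical helper: wrap a single data frame in a list so that results
# for one or several horizons can be handled the same way.
as_pred_list <- function(pred) {
  if (is.data.frame(pred)) list(pred) else pred
}

# Placeholder objects standing in for the two possible return shapes.
single <- data.frame(NO2 = c(21.5, 19.8), O3 = c(40.2, 43.1))
multi <- list(single, single)

length(as_pred_list(single))  # one element
length(as_pred_list(multi))   # two elements
```

Since tibbles are also data frames, `is.data.frame()` covers both cases.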

References

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

Examples


set.seed(0)
data(gmdbn_air, data_air)
evid <- data_air
evid$NO2[sample.int(7680, 1536)] <- NA
evid$O3[sample.int(7680, 1536)] <- NA
pred <- prediction(gmdbn_air, evid, evid[, c("DATE", "TEMP", "WIND")],
                   nodes = c("NO2", "O3"), col_seq = "DATE",
                   horizon = c(1, 2), verbose = TRUE)
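
The effect of `col_seq = "DATE"` in the call above is that rows sharing the same `DATE` form one observation sequence, and the row order within each sequence gives the time slice. A base R sketch on a toy data frame (the values and the extra column `t` are made up for illustration and are not part of `data_air`):

```r
# Toy evidence with two sequences of different lengths.
toy <- data.frame(
  DATE = c("d1", "d1", "d1", "d2", "d2"),
  TEMP = c(12.1, 12.8, 13.5, 9.4, 9.9)
)

# Time slice index implied by the row order within each sequence.
toy$t <- ave(seq_len(nrow(toy)), toy$DATE, FUN = seq_along)

# Length of each observation sequence.
seq_lengths <- table(toy$DATE)
```

Here `toy$t` is 1, 2, 3 for the first sequence and 1, 2 for the second, which is why the rows of each sequence must already be in chronological order.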

[Package gmgm version 1.1.2 Index]