filtering {gmgm}

R Documentation

Perform filtering inference in a Gaussian mixture dynamic Bayesian network

Description

This function performs filtering inference in a Gaussian mixture dynamic Bayesian network. For a sequence of T time slices, this task consists of estimating the state of the system at each time slice t (for 1 ≤ t ≤ T) given all the data (the evidence) collected up to t. This function is also designed to perform fixed-lag smoothing inference, which consists of defining a time lag l such that at each time slice t (for l + 1 ≤ t ≤ T), the state at t - l is estimated given the evidence collected up to t (Murphy, 2002). Filtering and fixed-lag smoothing inference are performed by sequential importance resampling, which is a particle-based approximate method (Koller and Friedman, 2009).
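
To make the idea of sequential importance resampling concrete, here is a minimal base R sketch of a bootstrap particle filter. It is an illustration under stated assumptions, not the implementation used by `filtering`: the toy model (a one-dimensional autoregressive state observed with Gaussian noise), the function name `sir_filter` and all its constants are hypothetical. At each time slice the particles are propagated, weighted by the evidence when it is available, summarised, and resampled.

```r
# Minimal bootstrap particle filter (sequential importance resampling).
# Hypothetical toy model, NOT taken from gmgm:
#   x_t = 0.9 * x_{t-1} + N(0, 1),   y_t = x_t + N(0, 0.5^2)
sir_filter <- function(y, n_part = 1000) {
  n_t <- length(y)
  est <- numeric(n_t)
  part <- rnorm(n_part)                          # initial particles for slice 1
  for (t in seq_len(n_t)) {
    if (t > 1) {
      part <- 0.9 * part + rnorm(n_part)         # propagate to the next slice
    }
    if (is.na(y[t])) {
      w <- rep(1, n_part)                        # missing evidence: uniform weights
    } else {
      w <- dnorm(y[t], mean = part, sd = 0.5)    # weight by the evidence
    }
    w <- w / sum(w)
    est[t] <- sum(w * part)                      # filtered mean at slice t
    idx <- sample.int(n_part, n_part, replace = TRUE, prob = w)
    part <- part[idx]                            # resample at every slice
  }
  est
}

# Simulate the toy model, hide two observations, and filter.
set.seed(1)
x <- numeric(50)
x[1] <- rnorm(1)
for (t in 2:50) x[t] <- 0.9 * x[t - 1] + rnorm(1)
y <- x + rnorm(50, sd = 0.5)
y[c(10, 20)] <- NA
est <- sir_filter(y)
```

The estimate at slice t only uses `y[1:t]`, which is what distinguishes filtering from smoothing over the whole sequence.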

Usage

filtering(
  gmdbn,
  evid,
  nodes = names(gmdbn$b_1),
  col_seq = NULL,
  lag = 0,
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  verbose = FALSE
)

Arguments

`gmdbn`	An object of class `gmdbn`.
`evid`	A data frame containing the evidence. Its columns must explicitly be named after nodes of `gmdbn` and can contain missing values (columns with no value can be removed).
`nodes`	A character vector containing the inferred nodes (by default all the nodes of `gmdbn`).
`col_seq`	A character vector containing the column names of `evid` that describe the observation sequence. If `NULL` (the default), all the observations belong to a single sequence. The observations of the same sequence must be ordered such that the `t`th one is related to time slice `t` (note that the sequences can have different lengths).
`lag`	A non-negative integer vector containing the time lags for which fixed-lag smoothing inference is performed. If `0` (the default), filtering inference is performed.
`n_part`	A positive integer corresponding to the number of particles generated for each observation sequence.
`max_part_sim`	An integer greater than or equal to `n_part` corresponding to the maximum number of particles that can be processed simultaneously. This argument is used to prevent memory overflow, dividing `evid` into smaller subsets that are handled sequentially.
`min_ess`	A numeric value in [0, 1] corresponding to the minimum effective sample size (ESS, expressed as a proportion of `n_part`) below which the renewal step of sequential importance resampling is performed. If `1` (the default), this step is performed at each time slice.
`verbose`	A logical value indicating whether subsets of `evid` and time slices in progress are displayed.
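
The role of `min_ess` can be illustrated with a short base R sketch. It assumes the standard ESS estimator, 1 / sum(w^2) for normalised weights w; whether gmgm uses exactly this estimator is an assumption, and `ess` and `needs_renewal` are hypothetical helper names, not package functions.

```r
# Effective sample size of a set of importance weights
# (standard estimator; assumed, not confirmed, to match gmgm).
ess <- function(w) {
  w <- w / sum(w)          # normalise the weights
  1 / sum(w^2)
}

# Hypothetical helper: TRUE when the ESS has dropped to or below the
# threshold, expressed as a proportion of the number of particles.
needs_renewal <- function(w, min_ess) {
  ess(w) <= min_ess * length(w)
}

w_uniform <- rep(1, 1000)              # all particles equally plausible
w_skewed  <- c(1, rep(1e-6, 999))      # one particle dominates

ess(w_uniform)                         # about 1000: no degeneracy
ess(w_skewed)                          # close to 1: severe degeneracy

needs_renewal(w_uniform, min_ess = 0.5)
needs_renewal(w_skewed,  min_ess = 0.5)
```

Because the ESS cannot exceed the number of particles, a threshold of `1` leads to renewal at essentially every time slice, while lower values renew only when the weights have degenerated.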

Value

If `lag` has one element, a data frame (tibble) with a structure similar to `evid` containing the estimated values of the inferred nodes and their observation sequences (if `col_seq` is not `NULL`). If `lag` has two or more elements, a list of data frames (tibbles) containing these values for each time lag.
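
Since the return type depends on the length of `lag`, downstream code may want a uniform shape. The sketch below uses a hypothetical helper, `as_lag_list`, which is not part of gmgm, to wrap a single data frame into a one-element list. It only checks that there is one element per lag, and makes no assumption about how the list elements are named or ordered. Mock data frames stand in for real results.

```r
# Hypothetical helper (not part of gmgm): always return a list holding
# one data frame per requested time lag.
as_lag_list <- function(res, lag) {
  if (is.data.frame(res)) {
    res <- list(res)                 # single lag: wrap the data frame
  }
  stopifnot(length(res) == length(lag))
  res
}

# Mock results standing in for the output of filtering().
single <- data.frame(NO2 = c(1.0, 2.0))
several <- list(data.frame(NO2 = c(1.0, 2.0)), data.frame(NO2 = c(1.5, 2.5)))

res_one <- as_lag_list(single, lag = 0)
res_two <- as_lag_list(several, lag = c(0, 1))
```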

References

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

Murphy, K. (2002). Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, University of California.

Examples


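library(gmgm)

set.seed(0)
data(gmdbn_air, data_air)

# Copy the data and replace 1536 of the 7680 row indices (20%) in each
# column with missing values
evid <- data_air
evid$NO2[sample.int(7680, 1536)] <- NA
evid$O3[sample.int(7680, 1536)] <- NA
evid$TEMP[sample.int(7680, 1536)] <- NA
evid$WIND[sample.int(7680, 1536)] <- NA

# Filtering (lag 0) and fixed-lag smoothing (lag 1), with the observation
# sequences identified by the DATE column
filt <- filtering(gmdbn_air, evid, col_seq = "DATE", lag = c(0, 1),
                  verbose = TRUE)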

[Package gmgm version 1.1.2 Index]