sts_build_factored_variational_loss {tfprobability}    R Documentation
Build a loss function for variational inference in STS models.
Description
Variational inference searches for the distribution within some family of
approximate posteriors that minimizes a divergence between the approximate
posterior q(z)
and true posterior p(z|observed_time_series)
. By converting
inference to optimization, it's generally much faster than sampling-based
inference algorithms such as HMC. The tradeoff is that the approximating
family rarely contains the true posterior, so it may miss important aspects of
posterior structure (in particular, dependence between variables) and should
not be blindly trusted. Results may vary; it's generally wise to compare to
HMC to evaluate whether inference quality is sufficient for your task at hand.
Usage
sts_build_factored_variational_loss(
observed_time_series,
model,
init_batch_shape = list(),
seed = NULL,
name = NULL
)
Arguments
observed_time_series
float tensor of shape concat([sample_shape, model.batch_shape, [num_timesteps, 1]]), where sample_shape corresponds to i.i.d. observations; the trailing [1] dimension may (optionally) be omitted if num_timesteps > 1.
model
An instance of StructuralTimeSeries representing a time-series model. This represents a joint distribution over time-series and their parameters with batch shape [b1, ..., bN].
init_batch_shape
Batch shape (list) of initial states to optimize in parallel. Default value: list() (i.e., just run a single optimization).
seed
integer to seed the random number generator.
name
name prefixed to ops created by this function. Default value: NULL (i.e., "build_factored_variational_loss").
Details
This method constructs a loss function for variational inference using the
Kullback-Leibler divergence KL[q(z) || p(z|observed_time_series)]
, with an
approximating family given by independent Normal distributions transformed to
the appropriate parameter space for each parameter. Minimizing this loss (the
negative ELBO) maximizes a lower bound on the log model evidence
log p(observed_time_series)
. This is equivalent to the 'mean-field' method
implemented in Kucukelbir et al. (2017) and is a standard approach.
The resulting posterior approximations are unimodal; they will tend to underestimate posterior
uncertainty when the true posterior contains multiple modes
(the KL[q||p]
divergence encourages choosing a single mode) or dependence between variables.
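As a sketch of typical usage (the local-linear-trend model and the TF 1.x graph-mode optimizer below are illustrative assumptions, not requirements of this function):

```r
library(tensorflow)
library(tfprobability)

# Hypothetical data: a short random-walk series.
observed_time_series <- cumsum(rnorm(100))

# Any STS model works; a local linear trend is used here for illustration.
model <- observed_time_series %>% sts_local_linear_trend()

# Build the variational loss (negative ELBO) and approximating distributions.
optimization <- observed_time_series %>%
  sts_build_factored_variational_loss(model = model)
loss <- optimization[[1]]

# Minimize the loss; in TF 1.x graph mode this op would then be run
# repeatedly in a session until the loss stabilizes.
train_op <- tf$compat$v1$train$AdamOptimizer(0.1)$minimize(loss)
```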
Value
list of:
variational_loss: float Tensor of shape tf$concat([init_batch_shape, model$batch_shape]), encoding a stochastic estimate of an upper bound on the negative model evidence -log p(y). Minimizing this loss performs variational inference; the gap between the variational bound and the true (generally unknown) model evidence corresponds to the divergence KL[q||p] between the approximate and true posterior.
variational_distributions: a named list giving the approximate posterior for each model parameter. The keys are character parameter names in order, corresponding to [param$name for param in model$parameters]. The values are tfd$Distribution instances with batch shape tf$concat([init_batch_shape, model$batch_shape]); these will typically be of the form tfd$TransformedDistribution(tfd$Normal(...), bijector = param$bijector).
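For illustration, the second element of the returned list can be inspected per parameter (a sketch; the model choice and the sampling call are assumptions, not part of this page):

```r
library(tensorflow)
library(tfprobability)

observed_time_series <- cumsum(rnorm(100))
model <- observed_time_series %>% sts_local_linear_trend()

optimization <- observed_time_series %>%
  sts_build_factored_variational_loss(model = model)
distributions <- optimization[[2]]

# One tfd$Distribution per model parameter, keyed by parameter name.
# After minimizing the loss, draws from each approximate that
# parameter's posterior.
for (param_name in names(distributions)) {
  q <- distributions[[param_name]]
  posterior_draws <- q$sample(50L)
}
```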
References
Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M. Blei. Automatic Differentiation Variational Inference. Journal of Machine Learning Research, 18(14):1-45, 2017.
See Also
Other sts-functions:
sts_build_factored_surrogate_posterior(),
sts_decompose_by_component(),
sts_decompose_forecast_by_component(),
sts_fit_with_hmc(),
sts_forecast(),
sts_one_step_predictive(),
sts_sample_uniform_initial_state()