estimate_belief_for_nodes {pomdp} | R Documentation |
Estimate the Belief for Policy Graph Nodes
Description
Estimate a belief for each alpha vector (segment of the value function), which represents a node in the policy graph.
Usage
estimate_belief_for_nodes(
x,
method = "auto",
belief = NULL,
verbose = FALSE,
...
)
Arguments
x |
object of class POMDP containing a solved and converged POMDP problem. |
method |
character string specifying the estimation method: "auto" (the default), "trajectories", or "random" (see Details).
|
belief |
the start belief used for the "trajectories" method. |
verbose |
logical; show which method is used. |
... |
further parameters are passed on to the individual methods (e.g., to sample_belief_space()). |
Details
estimate_belief_for_nodes()
can estimate the belief in several ways:
-
Use belief points explored by the solver. Some solvers return the belief points they explored. These points are assigned to the policy graph nodes and each node's belief points are averaged.
-
Follow trajectories (breadth first) until all policy graph nodes have been visited and return the encountered beliefs. The first (i.e., shallowest) belief point encountered for each node is used and no averaging is performed. The parameter
n
can be used to limit the number of nodes searched.
-
Sample a large set of possible belief points, assign them to the nodes, and then average the belief points assigned to each node. This returns a central belief for each node. Additional parameters like
method
and the sample size
n
are passed on to sample_belief_space(). If no belief point is generated for a segment, a warning is produced; in this case, the number of sampled points can be increased.
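The sampling-based approach can be sketched in base R. The following is a minimal illustration with hypothetical alpha vectors over a two-state belief space, not the package's internal implementation: belief points are sampled from the simplex, assigned to the alpha vector that is maximal at each point, and averaged per node.

```r
# Two hypothetical alpha vectors (value function segments), one per node.
alpha <- rbind(c(10, 0),   # node 1: maximal when state 1 is likely
               c(0, 10))   # node 2: maximal when state 2 is likely

# Sample belief points uniformly from the 1-simplex.
set.seed(42)
p <- runif(100)
beliefs <- cbind(p, 1 - p)

# Assign each belief point to the segment with the maximal value.
values <- beliefs %*% t(alpha)
node <- max.col(values)

# Average the belief points assigned to each node -> a central belief per node.
centers <- t(sapply(seq_len(nrow(alpha)), function(i)
  colMeans(beliefs[node == i, , drop = FALSE])))
centers
```

Node 1 collects the sampled points where state 1 is more likely, so its averaged belief places more weight on the first state (and vice versa for node 2).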
Notes:
Each method may return a different answer. The only thing that is guaranteed is that the returned belief falls in the range where the value function segment is maximal.
If no belief points are sampled for some nodes, or a node is not reachable from the initial belief, then a vector with all
NaN
s will be returned with a warning.
Value
returns a list of matrices with one belief per policy graph node. The list elements correspond to the epochs; for converged solutions the list has only a single element.
See Also
Other policy:
optimal_action()
,
plot_belief_space()
,
plot_policy_graph()
,
policy()
,
policy_graph()
,
projection()
,
reward()
,
solve_POMDP()
,
solve_SARSOP()
,
value_function()
Examples
data("Tiger")
# Infinite horizon case with converged solution
sol <- solve_POMDP(model = Tiger, method = "grid")
sol
# the default method "auto" uses the belief points explored by the solver (if available).
estimate_belief_for_nodes(sol, verbose = TRUE)
# use belief points obtained from trajectories
estimate_belief_for_nodes(sol, method = "trajectories", verbose = TRUE)
# use a random uniform sample
estimate_belief_for_nodes(sol, method = "random", verbose = TRUE)
# Finite horizon example with three epochs.
sol <- solve_POMDP(model = Tiger, horizon = 3)
sol
estimate_belief_for_nodes(sol)