estimate_belief_for_nodes {pomdp} | R Documentation |
Estimate the Belief for Policy Graph Nodes
Description
Estimate a belief for each alpha vector (segment of the value function), which represents a node in the policy graph.
Usage
estimate_belief_for_nodes(
x,
method = "auto",
belief = NULL,
verbose = FALSE,
...
)
Arguments
x |
object of class POMDP containing a solved and converged POMDP problem. |
method |
character string specifying the estimation method: "auto" (the default), "trajectories", or "random" (see Details).
|
belief |
the start belief used for the "trajectories" method. |
verbose |
logical; show which method is used. |
... |
further parameters are passed on to the individual methods (e.g., to sample_belief_space()). |
Details
estimate_belief_for_nodes()
can estimate the belief in several ways:
-
Use belief points explored by the solver. Some solvers return the belief points they explored. These points are assigned to the policy graph nodes and each node's belief points are averaged.
-
Follow trajectories (breadth first) until all policy graph nodes have been visited and return the encountered beliefs. The first (i.e., shallowest) belief point encountered for each node is used and no averaging is performed. The parameter
n
can be used to limit the number of nodes searched.
-
Sample a large set of possible belief points, assign them to the nodes, and then average the belief points assigned to each node. This returns a central belief for each node. Additional parameters like
method
and the sample size
n
are passed on to sample_belief_space(). If no belief point is generated for a segment, a warning is produced; in this case, the number of sampled points can be increased.
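The sampling-based approach can be sketched in base R. The following is a minimal illustration with hypothetical alpha vectors over a two-state belief space, not the package's internal implementation: belief points are sampled from the simplex, assigned to the alpha vector that is maximal at each point, and averaged per node.

```r
# Two hypothetical alpha vectors (value function segments), one per node.
alpha <- rbind(c(10, 0),   # node 1: maximal when state 1 is likely
               c(0, 10))   # node 2: maximal when state 2 is likely

# Sample belief points uniformly from the 1-simplex.
set.seed(42)
p <- runif(100)
beliefs <- cbind(p, 1 - p)

# Assign each belief point to the segment with the maximal value.
values <- beliefs %*% t(alpha)
node <- max.col(values)

# Average the belief points assigned to each node -> a central belief per node.
centers <- t(sapply(seq_len(nrow(alpha)), function(i)
  colMeans(beliefs[node == i, , drop = FALSE])))
centers
```

Node 1 collects the sampled points where state 1 is more likely, so its averaged belief places more weight on the first state (and vice versa for node 2).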
Notes:
Each method may return a different answer. The only thing that is guaranteed is that the returned belief falls in the range where the value function segment is maximal.
If no belief points are sampled for some nodes, or a node is not reachable from the initial belief, then a vector with all
NaN
s will be returned with a warning.
Value
returns a list of matrices with one belief per policy graph node. The list elements correspond to the epochs; for converged solutions the list has only a single element.
See Also
Other policy:
optimal_action()
,
plot_belief_space()
,
plot_policy_graph()
,
policy()
,
policy_graph()
,
projection()
,
reward()
,
solve_POMDP()
,
solve_SARSOP()
,
value_function()
Examples
data("Tiger")
# Infinite horizon case with converged solution
sol <- solve_POMDP(model = Tiger, method = "grid")
sol
# the default method "auto" uses the belief points explored by the solver (if available).
estimate_belief_for_nodes(sol, verbose = TRUE)
# use belief points obtained from trajectories
estimate_belief_for_nodes(sol, method = "trajectories", verbose = TRUE)
# use a random uniform sample
estimate_belief_for_nodes(sol, method = "random", verbose = TRUE)
# Finite horizon example with three epochs.
sol <- solve_POMDP(model = Tiger, horizon = 3)
sol
estimate_belief_for_nodes(sol)