policy {pomdp}    R Documentation
Extract the Policy from a POMDP/MDP
Description
Extracts the policy from a solved POMDP/MDP.
Usage
policy(x, drop = TRUE)
Arguments
x: a solved POMDP or MDP object.
drop: logical; drop the list for converged, epoch-independent policies.
Details
A list (one entry per epoch) with the optimal policy. For converged, infinite-horizon problems, a list with only the converged solution is produced. For a POMDP, the policy is a data.frame consisting of two parts (a short usage sketch follows the list):

- Part 1: The alpha vectors for the belief states (these also define the utility of the belief). The columns are named after the states.
- Part 2: The last column, named action, contains the prescribed action.
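As a minimal sketch (assuming sol is a converged POMDP solution such as the Tiger solution in the Examples below, so that policy(sol) returns a single data.frame), this layout can be used to evaluate a belief manually:

pol <- policy(sol)                       # data.frame: one column per state + action
alpha <- as.matrix(pol[, -ncol(pol)])    # alpha vectors, one per row
b <- rep(1/ncol(alpha), ncol(alpha))     # a uniform belief over the states
vals <- alpha %*% b                      # utility of the belief under each alpha vector
max(vals)                                # utility of the belief
pol$action[which.max(vals)]              # prescribed action for this belief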
For an MDP, the policy is a data.frame with the following columns (a short lookup sketch follows the list):

- state: The state.
- U: The state's value (discounted expected utility U) if the policy is followed.
- action: The prescribed action.
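Similarly, a minimal MDP sketch (assuming sol holds the solved Maze model from the Examples and the solution is converged, so that drop = TRUE returns the data.frame directly):

pol <- policy(sol)                  # data.frame with columns state, U, action
pol[pol$state == pol$state[1], ]    # value and prescribed action for one state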
Value
A list with the policy for each epoch. Converged policies
have only one element. If drop = TRUE
then the policy is returned
without a list.
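A minimal sketch of the effect of drop (assuming sol is a converged, infinite-horizon solution such as the Tiger solution in the Examples):

policy(sol)                  # converged: the policy data.frame itself (drop = TRUE is the default)
policy(sol, drop = FALSE)    # the same policy kept inside a list of length 1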
Author(s)
Michael Hahsler
See Also
Other policy: estimate_belief_for_nodes(), optimal_action(), plot_belief_space(), plot_policy_graph(), policy_graph(), projection(), reward(), solve_POMDP(), solve_SARSOP(), value_function()
Examples
data("Tiger")
# Infinite horizon
sol <- solve_POMDP(model = Tiger)
sol
# policy with value function, optimal action and transitions for observations.
policy(sol)
plot_value_function(sol)
# Finite horizon (we use incremental pruning because grid does not converge)
sol <- solve_POMDP(model = Tiger, method = "incprune",
horizon = 3, discount = 1)
sol
policy(sol)
# Note: We see that it is initially better to listen until we make
# a decision in the final epoch.
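# A hedged sketch: the finite-horizon result is a list with one policy per
# epoch (see Value), so individual epochs can be inspected by indexing.
pol <- policy(sol)
pol[[1]]              # policy for the first epoch
pol[[length(pol)]]    # policy for the final epoch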
# MDP policy
data(Maze)
sol <- solve_MDP(Maze)
policy(sol)