regret {pomdp}	R Documentation
Calculate the Regret of a Policy
Description
Calculates the regret of a policy relative to a benchmark policy.
Usage
regret(policy, benchmark, start = NULL)
Arguments
policy: a solved POMDP containing the policy to calculate the regret for.
benchmark: a solved POMDP with the (optimal) benchmark policy. Regret is calculated relative to this policy.
start: the used start (belief) state. If NULL, the start (belief) state of the POMDP is used.
Details
Regret is defined as V_π*(s_0) - V_π(s_0), with V_π(s_0) representing the expected long-term state value (represented by the value function) given the policy π and the start state s_0. For POMDPs, the start state is the start belief b_0.
Note that the optimal policy is usually used as the benchmark for regret. Since the optimal policy may not be known, regret relative to the best known policy can be used instead.
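Conceptually, the regret can be reproduced by comparing the expected long-term rewards of the two solved models at the same start belief. The following is a minimal sketch, assuming reward() returns the expected long-term reward of a solved POMDP for a given start belief; the names sol_policy and sol_benchmark are placeholders for two solved models:

## minimal sketch, assuming reward(x, belief) gives the expected
## long-term reward of the solved model x at the start belief
manual_regret <- function(sol_policy, sol_benchmark, start = NULL) {
  v_benchmark <- reward(sol_benchmark, belief = start)  # V_pi*(b_0)
  v_policy    <- reward(sol_policy,    belief = start)  # V_pi(b_0)
  v_benchmark - v_policy                                # regret
}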
Value
the regret as a difference of expected long-term rewards.
Author(s)
Michael Hahsler
See Also
Other POMDP: MDP2POMDP, POMDP(), accessors, actions(), add_policy(), plot_belief_space(), projection(), reachable_and_absorbing, sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()
Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, accessors, actions(), add_policy(), gridworld, reachable_and_absorbing, simulate_MDP(), solve_MDP(), transition_graph(), value_function()
Examples
data(Tiger)
sol_optimal <- solve_POMDP(Tiger)
sol_optimal
# perform exact value iteration for 10 epochs
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick
regret(sol_quick, benchmark = sol_optimal)
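# The start argument can also be set explicitly. Illustrative only: this
# assumes start accepts a numeric belief vector over the states (for
# Tiger, a uniform belief over tiger-left and tiger-right).
regret(sol_quick, benchmark = sol_optimal, start = c(0.5, 0.5))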