regret {pomdp}	R Documentation
Calculate the Regret of a Policy
Description
Calculates the regret of a policy relative to a benchmark policy.
Usage
regret(policy, benchmark, start = NULL)
Arguments
policy: a solved POMDP containing the policy for which the regret is calculated.

benchmark: a solved POMDP with the (optimal) policy. Regret is calculated relative to this policy.

start: the start (belief) state to use. If NULL, then the start (belief) state of the POMDP model is used.
Details
Regret is defined as V^{\pi^*}(s_0) - V^{\pi}(s_0), where V^{\pi} is the expected long-term state value (represented by the value function) when following policy \pi from the start state s_0. For POMDPs, the start state is the start belief b_0.
Note that regret is usually computed relative to the optimal policy \pi^* as the benchmark. Since the optimal policy may not be known, the regret relative to the best known policy can be used instead.
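Conceptually, the regret is just the difference of two expected long-term rewards evaluated at the same start belief. The following is a minimal sketch of this idea, assuming that reward() returns the expected total reward of a solved model at its start belief as a numeric value (in some package versions the return structure may differ):

library("pomdp")

data(Tiger)

## benchmark: an (approximately) optimal policy
benchmark <- solve_POMDP(Tiger)

## a quick, suboptimal policy (short horizon)
pol <- solve_POMDP(Tiger, method = "enum", horizon = 10)

## regret as the difference of expected long-term rewards,
## both evaluated at the model's start belief
reward(benchmark) - reward(pol)

## compare with the result of regret()
regret(pol, benchmark = benchmark)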
Value
the regret as a difference of expected long-term rewards.
Author(s)
Michael Hahsler
See Also
Other POMDP: MDP2POMDP, POMDP(), accessors, actions(), add_policy(), plot_belief_space(), projection(), reachable_and_absorbing, sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()
Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, accessors, actions(), add_policy(), gridworld, reachable_and_absorbing, simulate_MDP(), solve_MDP(), transition_graph(), value_function()
Examples
library("pomdp")

data(Tiger)

# solve the Tiger problem (used as the benchmark policy)
sol_optimal <- solve_POMDP(Tiger)
sol_optimal

# a quick, suboptimal solution: exact value iteration (enumeration) for 10 epochs
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick
regret(sol_quick, benchmark = sol_optimal)
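# The start belief can also be given explicitly. The call below is only
# illustrative and assumes that start accepts a numeric belief vector
# (here a uniform belief over the Tiger problem's two states).
regret(sol_quick, benchmark = sol_optimal, start = c(0.5, 0.5))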