| regret {pomdp} | R Documentation |
Calculate the Regret of a Policy
Description
Calculates the regret of a policy relative to a benchmark policy.
Usage
regret(policy, benchmark, start = NULL)
Arguments
policy |
a solved POMDP containing the policy to calculate the regret for. |
benchmark |
a solved POMDP with the (optimal) policy. Regret is calculated relative to this policy. |
start |
the start (belief) state used for the evaluation. If NULL, then the start (belief) state specified in the model is used. |
Details
Regret is defined as V^{\pi^*}(s_0) - V^{\pi}(s_0), where V^\pi is the expected long-term state value (given by the value function) of following policy \pi starting in state s_0. For POMDPs, the start state is the start belief b_0.
Note that the benchmark is usually the optimal policy \pi^*. Since the optimal policy may not be known, the regret relative to the best known policy can be used instead.
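For illustration, a minimal sketch of this definition in R (not the package implementation; it assumes that reward() returns the expected total reward of a solved model for a given start belief):
## sketch only: regret as a difference of expected long-term rewards
## (assumes reward() evaluates the value function of the solved model
##  at the given start belief, and that with start = NULL the model's
##  start belief is used)
regret_sketch <- function(policy, benchmark, start = NULL)
  reward(benchmark, belief = start) - reward(policy, belief = start)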
Value
the regret as a difference of expected long-term rewards.
Author(s)
Michael Hahsler
See Also
Other POMDP:
MDP2POMDP,
POMDP(),
accessors,
actions(),
add_policy(),
plot_belief_space(),
projection(),
reachable_and_absorbing,
sample_belief_space(),
simulate_POMDP(),
solve_POMDP(),
solve_SARSOP(),
transition_graph(),
update_belief(),
value_function(),
write_POMDP()
Other MDP:
MDP(),
MDP2POMDP,
MDP_policy_functions,
accessors,
actions(),
add_policy(),
gridworld,
reachable_and_absorbing,
simulate_MDP(),
solve_MDP(),
transition_graph(),
value_function()
Examples
data(Tiger)

# solve the Tiger problem (used as the benchmark policy)
sol_optimal <- solve_POMDP(Tiger)
sol_optimal

# perform exact value iteration (enumeration) for only 10 epochs
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick

# regret of the quick solution relative to the benchmark
regret(sol_quick, benchmark = sol_optimal)
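As a rough consistency check (assuming reward() evaluates the expected total reward at the model's start belief), the regret should match the direct value difference:
# rough check: difference of the expected total rewards at the start belief
# (assumes reward() defaults to the model's start belief)
reward(sol_optimal) - reward(sol_quick)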