regret {pomdp}		R Documentation

Calculate the Regret of a Policy

Description

Calculates the regret of a policy relative to a benchmark policy.

Usage

regret(policy, benchmark, start = NULL)

Arguments

policy

a solved POMDP containing the policy to calculate the regret for.

benchmark

a solved POMDP with the (optimal) policy. Regret is calculated relative to this policy.

start

the start (belief) state to use. If NULL, the start (belief) state of the benchmark is used.

Details

Regret is defined as V^{\pi^*}(s_0) - V^{\pi}(s_0), where V^\pi(s_0) is the expected long-term reward (the value function) of following policy \pi from the start state s_0. For POMDPs, the start state is the start belief b_0.

Note that the benchmark is usually the optimal policy \pi^*. Since the optimal policy may not be known, the regret relative to the best known policy can be used instead.
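Conceptually, the calculation amounts to evaluating both policies at the same start belief and taking the difference of their expected long-term rewards. The following is a minimal sketch of this idea only, not the package implementation; it assumes that reward() returns the expected long-term reward of a solved POMDP at a given belief and that start_vector() returns the benchmark's start belief.

# Conceptual sketch only (not the package implementation).
# Assumes reward() evaluates a solved POMDP at a given belief and
# start_vector() extracts the start belief of the benchmark model.
regret_sketch <- function(policy, benchmark, start = NULL) {
  if (is.null(start))
    start <- start_vector(benchmark)   # use the benchmark's start belief
  reward(benchmark, belief = start) - reward(policy, belief = start)
}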

Value

the regret as the difference of expected long-term rewards (benchmark minus policy).

Author(s)

Michael Hahsler

See Also

Other POMDP: MDP2POMDP, POMDP(), accessors, actions(), add_policy(), plot_belief_space(), projection(), reachable_and_absorbing, sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()

Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, accessors, actions(), add_policy(), gridworld, reachable_and_absorbing, simulate_MDP(), solve_MDP(), transition_graph(), value_function()

Examples

data(Tiger)

# solve the Tiger problem (the default solver finds a near-optimal policy)
sol_optimal <- solve_POMDP(Tiger)
sol_optimal

# perform exact value iteration for 10 epochs
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick

regret(sol_quick, benchmark = sol_optimal)
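
# Assuming start also accepts a numeric belief vector (the Tiger problem
# has two states), the regret for a uniform start belief would be:
regret(sol_quick, benchmark = sol_optimal, start = c(0.5, 0.5))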
