compute_policy {sarsop}    R Documentation
compute_policy
Description
Derive the corresponding policy function from the alpha vectors returned by sarsop().
Usage
compute_policy(
alpha,
transition,
observation,
reward,
state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
a_0 = 1
)
Arguments
alpha
    the matrix of alpha vectors returned by sarsop()

transition
    transition matrix, dimension n_s x n_s x n_a

observation
    observation matrix, dimension n_s x n_z x n_a

reward
    reward matrix, dimension n_s x n_a

state_prior
    initial belief state; optional, defaults to uniform over states

a_0
    previous action. The belief over states depends not only on the
    observation, but also on the prior belief and the action previously
    taken (see the sketch following this list).
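
To make the roles of state_prior and a_0 concrete, here is a minimal
sketch of a standard POMDP belief update. It illustrates the idea, not
the package's exact internals; update_belief is a hypothetical helper,
not a sarsop export.

## Posterior belief given prior b, previous action a, and observation z:
## b'(s') is proportional to observation[s', z, a] * sum_s transition[s, s', a] * b[s]
update_belief <- function(b, transition, observation, z, a) {
  bp <- observation[, z, a] * as.numeric(t(transition[, , a]) %*% b)
  bp / sum(bp)  # normalize to a probability distribution over states
}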
Value
a data frame giving, for each possible belief state, the optimal policy (choice of action) and the corresponding value of that action
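
Conceptually, the policy is read directly off the alpha vectors: each
column of alpha defines a linear value function over beliefs, and the
optimal action at a belief b is the one attached to the maximizing
vector. A minimal sketch of that step follows; alpha_action, the mapping
from alpha-vector columns to actions, is an illustrative assumption, not
part of the sarsop API.

## Value and greedy action at a single belief vector b.
## alpha: n_s x K matrix of alpha vectors; alpha_action: action per column.
belief_value <- function(b, alpha) max(crossprod(alpha, b))
best_action  <- function(b, alpha, alpha_action)
  alpha_action[which.max(crossprod(alpha, b))]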
Examples
m <- fisheries_matrices()
## Takes > 5s
if(assert_has_appl()){
  alpha <- sarsop(m$transition, m$observation, m$reward, 0.95, precision = 10)
  compute_policy(alpha, m$transition, m$observation, m$reward)
}
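
The optional arguments are passed in the same way; the prior and previous
action below are illustrative values only, reusing m and alpha from the
example above.

## Belief concentrated on the two lowest states, after taking action 2.
n_s <- dim(m$observation)[[1]]
prior <- rep(c(1, 0), c(2, n_s - 2)) / 2
compute_policy(alpha, m$transition, m$observation, m$reward,
               state_prior = prior, a_0 = 2)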
[Package sarsop version 0.6.15]