sim_pomdp {sarsop}    R Documentation

Simulate a POMDP

Description

Simulate a POMDP given the appropriate matrices.

Usage

sim_pomdp(
  transition,
  observation,
  reward,
  discount,
  state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
  x0,
  a0 = 1,
  Tmax = 20,
  policy = NULL,
  alpha = NULL,
  reps = 1,
  ...
)

Arguments

transition

Transition matrix, dimension n_s x n_s x n_a

observation

Observation matrix, dimension n_s x n_z x n_a

reward

Reward matrix, dimension n_s x n_a

discount

Discount factor

state_prior

Initial belief state; optional, defaults to a uniform distribution over states

x0

Initial state

a0

Initial action (defaults to action 1; the choice is arbitrary if the observation process is independent of the action taken)

Tmax

Duration of the simulation, in time steps

policy

Simulate using a pre-computed policy (e.g. an MDP policy) instead of the POMDP alpha vectors

alpha

Matrix of alpha vectors returned by sarsop

reps

Number of replicate simulations to compute

...

Additional arguments passed to mclapply
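
For reference, a minimal sketch of objects with the dimensions described above (a hypothetical two-state, two-observation, two-action toy problem; all numeric values are arbitrary illustrations, not part of the package):

n_s <- 2; n_z <- 2; n_a <- 2
transition  <- array(0, dim = c(n_s, n_s, n_a))   # transition[s, s', a]
observation <- array(0, dim = c(n_s, n_z, n_a))   # observation[s, z, a]
reward      <- matrix(0, nrow = n_s, ncol = n_a)  # reward[s, a]
transition[, , 1] <- rbind(c(0.9, 0.1), c(0.2, 0.8))
transition[, , 2] <- rbind(c(0.5, 0.5), c(0.5, 0.5))
observation[, , 1] <- observation[, , 2] <- rbind(c(0.8, 0.2), c(0.2, 0.8))
reward[, 1] <- c(0, 1); reward[, 2] <- c(1, 0)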

Details

The simulation assumes the following order of updating: for a system in state[t] at time t, an observation obs[t] of the system is made, then action[t] is chosen based on that observation and the given policy, returning the (discounted) reward[t].
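
For illustration only (this is not the package's internal implementation), a single time step following this order might look like the sketch below, where choose_action is a hypothetical placeholder for whatever policy rule is used:

## Sketch of one time step: observe, update belief, act, collect reward, transition
simulate_step <- function(x, belief, a_prev, transition, observation,
                          reward, discount, t, choose_action) {
  z <- sample(dim(observation)[2], 1, prob = observation[x, , a_prev])  # observe current state
  belief <- belief * observation[, z, a_prev]                           # Bayes update of belief
  belief <- belief / sum(belief)
  a <- choose_action(belief)                                            # act on the policy
  r <- discount^t * reward[x, a]                                        # discounted reward
  x_next <- sample(dim(transition)[1], 1, prob = transition[x, , a])    # state transition
  belief <- as.vector(t(transition[, , a]) %*% belief)                  # propagate belief forward
  list(state = x_next, belief = belief, obs = z, action = a, value = r)
}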

Value

A data frame with columns for time, state, obs, action, and (discounted) value.
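
For example, if out holds the returned data frame, the total discounted return of a run is:

sum(out$value)   # value column as documented above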

Examples

m <- fisheries_matrices()
discount <- 0.95
## Takes > 5s
if(assert_has_appl()){
  alpha <- sarsop(m$transition, m$observation, m$reward, discount, precision = 10)
  sim <- sim_pomdp(m$transition, m$observation, m$reward, discount,
                   x0 = 5, Tmax = 20, alpha = alpha)
}
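
## A possible extension (illustrative only, not part of the package examples):
## replicate simulations with an explicit prior belief, passing mc.cores
## through ... to mclapply; reuses the alpha vectors computed above.
if(assert_has_appl()){
  prior <- rep(1, dim(m$observation)[[1]]) / dim(m$observation)[[1]]  # uniform prior
  sims <- sim_pomdp(m$transition, m$observation, m$reward, discount,
                    state_prior = prior, x0 = 5, Tmax = 20,
                    alpha = alpha, reps = 10, mc.cores = 1)
}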


[Package sarsop version 0.6.15 Index]