sim_pomdp {sarsop}	R Documentation
Simulate a POMDP
Description
Simulate a POMDP given the appropriate matrices.
Usage
sim_pomdp(
  transition,
  observation,
  reward,
  discount,
  state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
  x0,
  a0 = 1,
  Tmax = 20,
  policy = NULL,
  alpha = NULL,
  reps = 1,
  ...
)
Arguments
transition
Transition matrix, dimension n_s x n_s x n_a (a dimension sketch follows this list)
observation
Observation matrix, dimension n_s x n_z x n_a
reward
Reward matrix, dimension n_s x n_a
discount
The discount factor
state_prior
Initial belief state; optional, defaults to uniform over states
x0
Initial state
a0
Initial action (default is action 1; this can be arbitrary if the observation process is independent of the action taken)
Tmax
Duration of the simulation
policy
Simulate using a pre-computed policy (e.g. an MDP policy) instead of the POMDP alpha vectors
alpha
The matrix of alpha vectors returned by sarsop
reps
Number of replicate simulations to compute
...
Additional arguments to mclapply
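For reference, the sketch below constructs arrays that match the dimensions documented above. The sizes and fill values are illustrative only, and the index order stated in the comments ([state, next state or observation, action]) is an assumption rather than something specified on this page.

## Illustrative sizes only
n_s <- 3  # number of states
n_z <- 3  # number of possible observations
n_a <- 2  # number of actions

## transition: n_s x n_s x n_a, assumed indexed as [state, next state, action]
transition <- array(1 / n_s, dim = c(n_s, n_s, n_a))

## observation: n_s x n_z x n_a, assumed indexed as [state, observation, action]
observation <- array(1 / n_z, dim = c(n_s, n_z, n_a))

## reward: n_s x n_a, reward for taking each action in each state
reward <- matrix(0, nrow = n_s, ncol = n_a)

## the default state_prior is uniform over the n_s states
state_prior <- rep(1, n_s) / n_s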
Details
The simulation assumes the following order of updating: for a system in state[t] at time t, an observation obs[t] of the system is made, then action[t] is chosen based on that observation and the given policy, yielding the (discounted) reward[t].
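A minimal sketch of that update order for a single replicate is given below. This is not the package implementation: choose_action() is a hypothetical policy helper, and the [state, next state or observation, action] index order is an assumption.

simulate_once <- function(transition, observation, reward, discount,
                          state_prior, x0, a0, Tmax, choose_action) {
  n_s <- dim(observation)[1]
  n_z <- dim(observation)[2]
  belief <- state_prior
  state  <- x0
  action <- a0
  out <- data.frame(time = rep(0, Tmax), state = 0, obs = 0, action = 0, value = 0)
  for (t in seq_len(Tmax)) {
    ## an observation of the current state is made (using the previous action)
    obs <- sample(n_z, 1, prob = observation[state, , action])
    ## the belief is re-weighted by the observation likelihood (Bayes correction)
    belief <- belief * observation[, obs, action]
    belief <- belief / sum(belief)
    ## the action is chosen from the updated belief and the policy
    action <- choose_action(belief)
    ## the (discounted) reward for acting in the true state is recorded
    out[t, ] <- c(t, state, obs, action, discount^(t - 1) * reward[state, action])
    ## the true state transitions and the belief is propagated forward
    state  <- sample(n_s, 1, prob = transition[state, , action])
    belief <- as.vector(belief %*% transition[, , action])
  }
  out
}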
Value
a data frame with columns for time, state, obs, action, and (discounted) value.
Examples
m <- fisheries_matrices()
discount <- 0.95
## Takes > 5s
if (assert_has_appl()) {
  ## Solve the POMDP with SARSOP to obtain the alpha vectors
  alpha <- sarsop(m$transition, m$observation, m$reward, discount, precision = 10)
  ## Simulate forward from state x0 = 5 for Tmax = 20 steps under that policy
  sim <- sim_pomdp(m$transition, m$observation, m$reward, discount,
                   x0 = 5, Tmax = 20, alpha = alpha)
}
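Per the arguments above, reps requests replicate simulations and extra arguments are forwarded to mclapply. A sketch only, assuming alpha from the block above is available and that mc.cores is usable with mclapply on the platform in use:

## Replicate simulations, forwarding mc.cores to mclapply
sims <- sim_pomdp(m$transition, m$observation, m$reward, discount,
                  x0 = 5, Tmax = 20, alpha = alpha, reps = 10, mc.cores = 2)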