sarsop {sarsop} | R Documentation |
sarsop
Description
sarsop wraps the tasks of writing the pomdpx file that defines the problem, running the pomdpsol (SARSOP) algorithm in C++, and reading the resulting policy file back into R. The returned alpha vectors and alpha_action information are then transformed into a more generic, user-friendly representation: a matrix whose columns correspond to actions and whose rows correspond to states. This function can thus serve as the core of most POMDP applications.
Usage
sarsop(
transition,
observation,
reward,
discount,
state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
verbose = TRUE,
log_dir = tempdir(),
log_data = NULL,
cache = TRUE,
...
)
Arguments
transition |
Transition matrix, dimension n_s x n_s x n_a |
observation |
Observation matrix, dimension n_s x n_z x n_a |
reward |
reward matrix, dimension n_s x n_a |
discount |
the discount factor |
state_prior |
initial belief state, optional, defaults to uniform over states |
verbose |
logical, should the function emit a message with pomdp diagnostics (timings, final precision, end condition)? |
log_dir |
pomdpx and policyx files will be saved here, along with a metadata file |
log_data |
a data.frame of additional columns to include in the log, such as model parameters. A unique id value for each run can be provided as one of the columns; otherwise, a globally unique id will be generated. |
cache |
should results from the log directory be cached? Default TRUE. Identical function calls will quickly return previously cached alpha vectors from file rather than re-running the solver. |
... |
additional arguments, such as precision in the Examples, passed on to the underlying solver |
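The dimension requirements above can be illustrated with a small toy problem. The following is a minimal sketch in base R, not taken from the package: the sizes, the probabilities, and the log_data column named model are hypothetical, and it assumes that each slice transition[, , a] and observation[, , a] is row-stochastic (each row sums to 1). The sarsop call itself is left commented out so the block runs without the package installed.

```r
# Toy sizes (hypothetical): 3 states, 3 observations, 2 actions
n_s <- 3; n_z <- 3; n_a <- 2

# transition[i, j, a]: assumed probability of moving from state i to state j
# under action a
transition <- array(0, dim = c(n_s, n_s, n_a))
transition[, , 1] <- diag(n_s)                  # action 1: stay in place
transition[, , 2] <- matrix(1 / n_s, n_s, n_s)  # action 2: jump uniformly

# observation[i, z, a]: assumed probability of observing z when the state is i
# and action a was taken (the diag() shortcut relies on n_s == n_z here)
observation <- array(0, dim = c(n_s, n_z, n_a))
for (a in seq_len(n_a)) {
  observation[, , a] <- 0.8 * diag(n_s) + 0.2 / n_z
}

# reward[i, a]: reward for taking action a in state i
reward <- matrix(c(0, 1, 2, -1, 0, 1), nrow = n_s, ncol = n_a)

discount <- 0.95

# Same expression as the documented default: uniform belief over states
state_prior <- rep(1, dim(observation)[[1]]) / dim(observation)[[1]]

# Optional log columns; "id" is the column named in the argument description,
# "model" is a made-up extra column
log_data <- data.frame(id = "toy-run-1", model = "toy")

# Sanity checks: every row of every action slice should sum to 1
stopifnot(all(abs(apply(transition, c(1, 3), sum) - 1) < 1e-12))
stopifnot(all(abs(apply(observation, c(1, 3), sum) - 1) < 1e-12))

# With the package installed, the call would then look like:
# alpha <- sarsop(transition, observation, reward, discount,
#                 state_prior = state_prior, log_data = log_data)
```

Checking the row sums before calling the solver is a cheap way to catch arrays whose dimensions have been transposed.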
Value
a matrix of alpha vectors. The column index (1:n_actions) indicates the action associated with the alpha vector; rows indicate the system state, x. Actions for which no alpha vector was found are included as columns of all -Inf, since such actions are not optimal under any belief and thus have no corresponding alpha vectors in the alpha_action list.
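One way to read a matrix of this shape is sketched below in base R. The alpha values and the belief are made up for illustration and are not solver output; the sketch simply scores each action by the belief-weighted sum of its column and picks the largest. In practice, compute_policy (shown in the Examples) is the package's own route from alpha vectors to a policy.

```r
# Hypothetical alpha matrix: 3 states (rows) x 3 actions (columns).
# Action 3 has no alpha vector, so its column is all -Inf.
alpha <- cbind(c(1, 2, 4), c(3, 2, 0), rep(-Inf, 3))

# A belief over the 3 states. It is strictly positive here because
# 0 * -Inf evaluates to NaN in R, which would hide the -Inf marker.
belief <- c(0.6, 0.3, 0.1)

# Belief-weighted value of each action: sum over states of
# belief[x] * alpha[x, a]. belief is recycled down each column.
value_by_action <- colSums(belief * alpha)

# Action with the highest value under this belief
best_action <- which.max(value_by_action)

# Actions that never appear in the policy (all -Inf columns)
unused_actions <- which(apply(alpha, 2, function(col) all(col == -Inf)))
```

Because a column of -Inf can never win the which.max comparison, such actions drop out of the resulting policy without any special handling.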
Examples
## Takes > 5s
## Use example code to generate matrices for pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
compute_policy(alpha, transition, observation, reward)