gridworld {pomdp} | R Documentation |
Helper Functions for Gridworld MDPs
Description
Helper functions for gridworld MDPs to convert between state names and gridworld positions, and for visualizing policies.
Usage
gridworld_init(
  dim,
  action_labels = c("up", "right", "down", "left"),
  unreachable_states = NULL,
  absorbing_states = NULL,
  labels = NULL
)
gridworld_maze_MDP(
  dim,
  start,
  goal,
  walls = NULL,
  action_labels = c("up", "right", "down", "left"),
  goal_reward = 1,
  step_cost = 0,
  restart = FALSE,
  discount = 0.9,
  horizon = Inf,
  info = NULL,
  name = NA
)
gridworld_s2rc(s)
gridworld_rc2s(rc)
gridworld_matrix(model, epoch = 1L, what = "states")
gridworld_plot_policy(
  model,
  epoch = 1L,
  actions = "character",
  states = FALSE,
  labels = TRUE,
  absorbing_state_action = FALSE,
  main = NULL,
  cex = 1,
  offset = 0.5,
  lines = TRUE,
  ...
)
gridworld_plot_transition_graph(
  x,
  hide_unreachable_states = TRUE,
  remove.loops = TRUE,
  vertex.color = "gray",
  vertex.shape = "square",
  vertex.size = 10,
  vertex.label = NA,
  edge.arrow.size = 0.3,
  margin = 0.2,
  main = NULL,
  ...
)
gridworld_animate(x, method, n, zlim = NULL, ...)
Arguments
dim |
vector of length two with the x and y extent of the gridworld. |
action_labels |
vector with four action labels that move the agent up, right, down, and left. |
unreachable_states |
a vector with state labels for unreachable states. These states will be excluded. |
absorbing_states |
a vector with state labels for absorbing states. |
labels |
logical; show state labels. |
start , goal |
labels for the start state and the goal state. |
walls |
a vector with state labels for walls. Walls will become unreachable states. |
goal_reward |
reward to transition to the goal state. |
step_cost |
cost of each action that does not lead to the goal state. |
restart |
logical; if TRUE, the agent restarts the problem at the start state every time it reaches the goal state. |
discount , horizon |
MDP discount factor, and horizon. |
info |
A list with additional information. Has to contain the gridworld
dimensions as element |
name |
a string to identify the MDP problem. |
s |
a state label. |
rc |
a vector of length two with the row and column coordinate of a state in the gridworld matrix. |
model , x |
a solved gridworld MDP. |
epoch |
epoch for unconverged finite-horizon solutions. |
what |
What should be returned in the matrix. Options are:
|
actions |
how to show actions. Options are:
simple |
states |
logical; show state names. |
absorbing_state_action |
logical; show the value and the action for absorbing states. |
main |
a main title for the plot. Defaults to the name of the problem. |
cex |
expansion factor for the action. |
offset |
move the state labels out of the way (in fractions of a character width). |
lines |
logical; draw lines to separate states. |
... |
further arguments are passed on to |
hide_unreachable_states |
logical; do not show unreachable states. |
remove.loops |
logical; do not show transitions from a state back to itself. |
vertex.color , vertex.shape , vertex.size , vertex.label , edge.arrow.size |
see |
margin |
a single number specifying the margin of the plot. Can be used if the graph does not fit inside the plotting area. |
method |
an MDP solution method for solve_MDP(). |
n |
number of iterations to animate. |
zlim |
limits for visualizing the state value. |
Details
Gridworlds are implemented with state names s(row,col), where row and col are locations in the matrix representing the gridworld. The actions are "up", "right", "down", and "left".
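The naming scheme can be illustrated with a small base-R sketch. The helpers s2rc_sketch() and rc2s_sketch() below are hypothetical stand-ins written for this illustration; they only show the idea behind gridworld_s2rc() and gridworld_rc2s() and are not the package's implementation.

```r
# Hypothetical helpers (not part of pomdp): convert between labels of the
# form "s(row,col)" and integer row/column coordinates.
s2rc_sketch <- function(s) {
  # extract all runs of digits from the label, e.g. "s(3,12)" -> c(3, 12)
  as.integer(regmatches(s, gregexpr("[0-9]+", s))[[1]])
}

rc2s_sketch <- function(rc) {
  # rebuild the label from a length-two vector c(row, col)
  paste0("s(", rc[1], ",", rc[2], ")")
}

s2rc_sketch("s(3,12)")
rc2s_sketch(c(3, 12))
```

The two functions are inverses of each other, so a label survives a round trip through the coordinates unchanged.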
gridworld_init() initializes a new gridworld by creating a matrix of states with the given dimensions. Other action names can be specified, but they must have the same effects in the same order as above. Unreachable states (walls) and absorbing states can be defined. This information can be used to build a custom gridworld MDP.
Several helper functions are provided to work with states, look at the state layout, and plot policies on the gridworld.
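The regular movement in a gridworld can be sketched in base R as follows. The function move_sketch() is hypothetical and not the transition function that gridworld_init() returns. It assumes that row 1 is the top row (so "up" decreases the row index) and that a move off the grid leaves the agent in place, in the same way as a move into an unreachable state.

```r
# Hypothetical sketch (not the pomdp implementation): deterministic movement
# on a grid where blocked moves leave the agent in place.
move_sketch <- function(rc, action, dim, walls = list()) {
  # row/column offsets for the four actions, in the documented order;
  # assumption: row 1 is the top row, so "up" decreases the row index
  delta <- list(up = c(-1, 0), right = c(0, 1), down = c(1, 0), left = c(0, -1))
  target <- rc + delta[[action]]

  # the target is blocked if it lies outside the grid ...
  off_grid <- any(target < 1) || any(target > dim)
  # ... or if it is one of the wall (unreachable) cells
  is_wall <- any(vapply(walls, function(w) all(w == target), logical(1)))

  if (off_grid || is_wall) rc else target
}

move_sketch(c(1, 1), "right", dim = c(7, 7))
move_sketch(c(2, 1), "right", dim = c(7, 7), walls = list(c(2, 2)))
```

The second call tries to enter a wall at row 2, column 2, so the agent stays at its original position.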
gridworld_maze_MDP() helps to easily define maze-like gridworld MDPs. By default, the goal state is absorbing, but with restart = TRUE, the agent restarts the problem at the start state every time it reaches the goal and receives the reward. Note that this implies that the goal state itself becomes unreachable.
gridworld_animate() applies algorithms from solve_MDP() iteration by iteration and visualizes the state utilities. This helps to understand how the algorithms work.
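What "iteration by iteration" means can be seen in a minimal base-R sketch of value iteration. The corridor, the helper functions next_state() and reward(), and all variable names below are assumptions made for this illustration; the code does not call solve_MDP() and is not how gridworld_animate() is implemented.

```r
# Hypothetical sketch (not solve_MDP()): synchronous value iteration on a
# 1 x 4 corridor with an absorbing goal in the last cell.
n_states <- 4
goal <- 4
discount <- 0.9

# deterministic successor of state s for the actions "left" and "right"
next_state <- function(s, action) {
  if (s == goal) return(s)                       # absorbing goal
  if (action == "left") max(s - 1, 1) else min(s + 1, n_states)
}

# reward of 1 for entering the goal, 0 otherwise
reward <- function(s, s_next) if (s != goal && s_next == goal) 1 else 0

U <- numeric(n_states)                           # utilities start at 0
history <- list(U)
for (i in 1:3) {
  U_new <- U
  for (s in seq_len(n_states)) {
    # value of each action given the utilities from the previous sweep
    q <- vapply(c("left", "right"), function(a) {
      s_next <- next_state(s, a)
      reward(s, s_next) + discount * U[s_next]
    }, numeric(1))
    U_new[s] <- max(q)
  }
  U <- U_new
  history[[i + 1]] <- U
}

# one row per sweep, starting with the initial utilities
do.call(rbind, history)
```

Each sweep propagates the goal reward one cell further away from the goal, which is the kind of pattern an animation of the state utilities makes visible.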
See Also
Other gridworld: Cliff_walking, Maze, Windy_gridworld
Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, accessors, actions(), add_policy(), reachable_and_absorbing, regret(), simulate_MDP(), solve_MDP(), transition_graph(), value_function()
Examples
# Defines states, actions and a transition model for a standard gridworld
gw <- gridworld_init(
  dim = c(7, 7),
  unreachable_states = c("s(2,2)", "s(7,3)", "s(3,6)"),
  absorbing_states = "s(4,4)",
  labels = list("s(4,4)" = "Black Hole")
)
gw$states
gw$actions
gw$info
# display the state labels in the gridworld
gridworld_matrix(gw)
gridworld_matrix(gw, what = "labels")
gridworld_matrix(gw, what = "reachable")
gridworld_matrix(gw, what = "absorbing")
# a transition function for regular moves in the gridworld is provided
gw$transition_prob("right", "s(1,1)", "s(1,2)")
gw$transition_prob("right", "s(2,1)", "s(2,2)") ### we cannot move into an unreachable state
gw$transition_prob("right", "s(2,1)", "s(2,1)") ### but the agent stays in place
# convert between state names and row/column indices
gridworld_s2rc("s(1,1)")
gridworld_rc2s(c(1,1))
# The information in gw can be used to build a custom MDP.
# We modify the standard transition function so there is a 50% chance that
# you will get sucked into the black hole from the adjacent squares.
trans_black_hole <- function(action = NA, start.state = NA, end.state = NA) {
  # ignore the action next to the black hole
  if (start.state %in% c("s(3,3)", "s(3,4)", "s(3,5)", "s(4,3)", "s(4,5)",
                         "s(5,3)", "s(5,4)", "s(5,5)")) {
    if (end.state == "s(4,4)")
      return(.5)
    else
      return(gw$transition_prob(action, start.state, end.state) * .5)
  }
  # use the standard gridworld movement
  gw$transition_prob(action, start.state, end.state)
}
black_hole <- MDP(
  states = gw$states,
  actions = gw$actions,
  transition_prob = trans_black_hole,
  reward = rbind(R_(value = +1), R_(end.state = "s(4,4)", value = -100)),
  info = gw$info,
  name = "Black hole"
)
black_hole
gridworld_plot_transition_graph(black_hole)
# solve the problem
sol <- solve_MDP(black_hole)
gridworld_matrix(sol, what = "values")
gridworld_plot_policy(sol)
# the optimal policy is to fly around, but avoid the black hole.
# Build a Maze: The Dyna Maze from Chapter 8 in the RL book
Dyna_maze <- gridworld_maze_MDP(
  dim = c(6, 9),
  start = "s(3,1)",
  goal = "s(1,9)",
  walls = c("s(2,3)", "s(3,3)", "s(4,3)",
            "s(5,6)",
            "s(1,8)", "s(2,8)", "s(3,8)"),
  restart = TRUE,
  discount = 0.95,
  name = "Dyna Maze"
)
Dyna_maze
gridworld_matrix(Dyna_maze)
gridworld_matrix(Dyna_maze, what = "labels")
gridworld_plot_transition_graph(Dyna_maze)
# Note that the problem resets when the goal state is reached.
sol <- solve_MDP(Dyna_maze)
gridworld_matrix(sol, what = "values")
gridworld_matrix(sol, what = "actions")
gridworld_plot_policy(sol)
gridworld_plot_policy(sol, actions = "label", cex = 1, states = FALSE)
# visualize the first 3 iterations of value iteration
gridworld_animate(Dyna_maze, method = "value", n = 3)