gridworld {pomdp}    R Documentation

Helper Functions for Gridworld MDPs

Description

Helper functions for gridworld MDPs to convert between state names and gridworld positions, and for visualizing policies.

Usage

gridworld_init(
  dim,
  action_labels = c("up", "right", "down", "left"),
  unreachable_states = NULL,
  absorbing_states = NULL,
  labels = NULL
)

gridworld_maze_MDP(
  dim,
  start,
  goal,
  walls = NULL,
  action_labels = c("up", "right", "down", "left"),
  goal_reward = 1,
  step_cost = 0,
  restart = FALSE,
  discount = 0.9,
  horizon = Inf,
  info = NULL,
  name = NA
)

gridworld_s2rc(s)

gridworld_rc2s(rc)

gridworld_matrix(model, epoch = 1L, what = "states")

gridworld_plot_policy(
  model,
  epoch = 1L,
  actions = "character",
  states = FALSE,
  labels = TRUE,
  absorbing_state_action = FALSE,
  main = NULL,
  cex = 1,
  offset = 0.5,
  lines = TRUE,
  ...
)

gridworld_plot_transition_graph(
  x,
  hide_unreachable_states = TRUE,
  remove.loops = TRUE,
  vertex.color = "gray",
  vertex.shape = "square",
  vertex.size = 10,
  vertex.label = NA,
  edge.arrow.size = 0.3,
  margin = 0.2,
  main = NULL,
  ...
)

gridworld_animate(x, method, n, zlim = NULL, ...)

Arguments

dim

vector of length two with the x and y extent of the gridworld.

action_labels

vector with four action labels that move the agent up, right, down, and left.

unreachable_states

a vector with state labels for unreachable states. These states will be excluded.

absorbing_states

a vector with state labels for absorbing states.

labels

for gridworld_init(): a list with labels for states (e.g., list("s(4,4)" = "Black Hole")); for gridworld_plot_policy(): logical; show state labels.

start, goal

labels for the start state and the goal state.

walls

a vector with state labels for walls. Walls will become unreachable states.

goal_reward

reward to transition to the goal state.

step_cost

cost of each action that does not lead to the goal state.

restart

logical; if TRUE then the problem automatically restarts when the agent reaches the goal state.

discount, horizon

MDP discount factor, and horizon.

info

A list with additional information. Has to contain the gridworld dimensions as element gridworld_dim.

name

a string to identify the MDP problem.

s

a state label.

rc

a vector of length two with the row and column coordinate of a state in the gridworld matrix.

model, x

a solved gridworld MDP.

epoch

epoch for unconverged finite-horizon solutions.

what

What should be returned in the matrix. Options are: "states", "labels", "values", "actions", "absorbing", and "reachable".

actions

how to show actions. Options are: simple "character", "unicode" arrows (must be supported by the font in use), "label" of the action, and "none" to suppress showing the action.

states

logical; show state names.

absorbing_state_action

logical; show the value and the action for absorbing states.

main

a main title for the plot. Defaults to the name of the problem.

cex

expansion factor for the action.

offset

move the state labels out of the way (in fractions of a character width).

lines

logical; draw lines to separate states.

...

further arguments are passed on to igraph::plot.igraph().

hide_unreachable_states

logical; do not show unreachable states.

remove.loops

logical; do not show transitions from a state back to itself.

vertex.color, vertex.shape, vertex.size, vertex.label, edge.arrow.size

see igraph::igraph.plotting for details. Set vertex.label = NULL to show the state labels on the graph.

margin

a single number specifying the margin of the plot. Can be used if the graph does not fit inside the plotting area.

method

an MDP solution method for solve_MDP().

n

number of iterations to animate.

zlim

limits for visualizing the state value.

Details

Gridworlds are implemented with state names s(row,col), where row and col are locations in the matrix representing the gridworld. The actions are "up", "right", "down", and "left".

gridworld_init() initializes a new gridworld, creating a matrix of states with the given dimensions. Other action names can be specified, but they must have the same effects in the same order as above. Unreachable states (walls) and absorbing states can be defined. This information can be used to build a custom gridworld MDP.
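
For illustration, a minimal 3x3 gridworld with one wall and one absorbing state could be set up as sketched below (state labels chosen arbitrarily; see the Examples section for a complete workflow):

gw3 <- gridworld_init(dim = c(3, 3),
                      unreachable_states = "s(2,2)",
                      absorbing_states = "s(3,3)")
gridworld_matrix(gw3)                      # state layout
gridworld_matrix(gw3, what = "absorbing")  # which states are absorbing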

Several helper functions are provided to convert between state names and gridworld positions, inspect the state layout, and plot policies on the gridworld.

gridworld_maze_MDP() helps to easily define maze-like gridworld MDPs. By default, the goal state is absorbing, but with restart = TRUE, the agent restarts the problem at the start state every time it reaches the goal and receives the reward. Note that this implies that the goal state itself becomes unreachable.
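
For illustration, a small maze where the goal state stays absorbing (the default restart = FALSE) could be defined as in this sketch (dimensions and state labels are arbitrary):

tiny_maze <- gridworld_maze_MDP(dim = c(4, 4), start = "s(4,1)", goal = "s(1,4)",
                                goal_reward = 1, step_cost = 0.1, name = "Tiny Maze")
gridworld_matrix(tiny_maze, what = "labels")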

gridworld_animate() applies an algorithm from solve_MDP() iteration by iteration and visualizes the state utilities after each iteration. This helps to understand how the algorithms work.

See Also

Other gridworld: Cliff_walking, Maze, Windy_gridworld

Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, accessors, actions(), add_policy(), reachable_and_absorbing, regret(), simulate_MDP(), solve_MDP(), transition_graph(), value_function()

Examples

# Defines states, actions and a transition model for a standard gridworld
gw <- gridworld_init(dim = c(7,7),
                unreachable_states = c("s(2,2)", "s(7,3)", "s(3,6)"),
                absorbing_states = "s(4,4)",
                labels = list("s(4,4)" = "Black Hole")
                )

gw$states
gw$actions
gw$info

# display the state labels in the gridworld
gridworld_matrix(gw)
gridworld_matrix(gw, what = "label")
gridworld_matrix(gw, what = "reachable")
gridworld_matrix(gw, what = "absorbing")

# a transition function for regular moves in the gridworld is provided
gw$transition_prob("right", "s(1,1)", "s(1,2)")
gw$transition_prob("right", "s(2,1)", "s(2,2)")  ### we cannot move into an unreachable state
gw$transition_prob("right", "s(2,1)", "s(2,1)")  ### but the agent stays in place

# convert between state names and row/column indices
gridworld_s2rc("s(1,1)")
gridworld_rc2s(c(1,1))
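gridworld_rc2s(gridworld_s2rc("s(3,5)"))  # round trip gives back the state label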

# The information in gw can be used to build a custom MDP.

# We modify the standard transition function so there is a 50% chance that
# you will get sucked into the black hole from the adjacent squares.
trans_black_hole <- function(action = NA, start.state = NA, end.state = NA) {
  # states next to the black hole: 50% chance of being pulled into s(4,4),
  # regardless of the action; regular moves share the remaining probability
  if (start.state %in% c("s(3,3)", "s(3,4)", "s(3,5)", "s(4,3)", "s(4,5)",
                         "s(5,3)", "s(5,4)", "s(5,5)")) {
        if (end.state == "s(4,4)")
            return(.5)
        else
            return(gw$transition_prob(action, start.state, end.state) * .5)
  }

  # use the standard gridworld movement
  gw$transition_prob(action, start.state, end.state)
}
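
# a quick sanity check of the custom transition function: from a square next
# to the black hole, the pull into s(4,4) is 0.5
trans_black_hole("up", "s(3,4)", "s(4,4)")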

black_hole <- MDP(states = gw$states,
  actions = gw$actions,
  transition_prob = trans_black_hole,
  reward = rbind(R_(value = +1), R_(end.state = "s(4,4)", value = -100)),
  info = gw$info,
  name = "Black hole"
  )

black_hole

gridworld_plot_transition_graph(black_hole)

# solve the problem
sol <- solve_MDP(black_hole)
gridworld_matrix(sol, what = "values")
gridworld_plot_policy(sol)
# the optimal policy is to fly around, but avoid the black hole.
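
# the solved policy can also be inspected as a table with policy()
policy(sol)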

# Build a Maze: The Dyna Maze from Chapter 8 in the RL book

Dyna_maze <- gridworld_maze_MDP(
                dim = c(6,9),
                start = "s(3,1)",
                goal = "s(1,9)",
                walls = c("s(2,3)", "s(3,3)", "s(4,3)",
                          "s(5,6)",
                          "s(1,8)", "s(2,8)", "s(3,8)"),
                restart = TRUE,
                discount = 0.95,
                name = "Dyna Maze",
                )
Dyna_maze

gridworld_matrix(Dyna_maze)
gridworld_matrix(Dyna_maze, what = "labels")

gridworld_plot_transition_graph(Dyna_maze)
# Note that the problem resets when the goal state is reached.

sol <- solve_MDP(Dyna_maze)

gridworld_matrix(sol, what = "values")
gridworld_matrix(sol, what = "actions")
gridworld_plot_policy(sol)
gridworld_plot_policy(sol, actions = "label", cex = 1, states = FALSE)

# visualize the first 3 iterations of value iteration
gridworld_animate(Dyna_maze, method = "value", n = 3)
