sim {runexp}    R Documentation

Softball run expectancy using multinomial random trial simulation

Description

Uses a multinomial random-trial simulation to play out softball games with a specified number of innings per game (inn), repeated over a specified number of games (reps). Results depend on the specified player probabilities (see Details) and a nine-player lineup. The simulation optionally incorporates attempted steals and "fast" players who are able to stretch bases, and it optionally uses SNOW parallelization.

Usage

sim(
  lineup,
  stats,
  inn = 7,
  reps = 100,
  graphic = FALSE,
  waitTime = 2,
  cores = NULL
)

Arguments

lineup

either a character vector of player names or a numeric vector of player numbers. Must be of length 1 or 9. If lineup is of length 1, the single player is "duplicated" nine times to form a complete lineup.

stats

data frame of player statistics (see details)

inn

number of innings per rep (the default of 7 represents a typical softball game)

reps

number of times to repeat the softball game simulation; can be thought of as the number of games.

graphic

logical indicating whether to plot the players' base movement. Requires reps < 4. Forces cores = 1.

waitTime

the amount of time to pause before drawing the plot for the next play. Only relevant when graphic = TRUE.

cores

number of cores to use in parallel. Defaults to one less than the number of available cores.
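The lineup rule above (length 1 or 9, with a single player repeated nine times) can be sketched in R as follows. `normalize_lineup()` is a hypothetical helper written for illustration, not a runexp function, and it assumes that a lineup of any other length is rejected with an error.

```r
# Hypothetical helper (not part of runexp): expand a length-1 lineup to
# nine batters and reject any other length, mirroring the documented rule.
normalize_lineup <- function(lineup) {
  if (length(lineup) == 1) {
    lineup <- rep(lineup, 9)  # "duplicate" the single player nine times
  }
  if (length(lineup) != 9) {
    stop("lineup must be of length 1 or 9")
  }
  lineup
}

normalize_lineup("B")  # nine copies of "B"
normalize_lineup(1:9)  # returned unchanged
```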

Details

In each simulation, we determine the result of each at-bat through a multinomial random trial in which the probabilities of a walk (W), single (S), double (D), triple (TR), home run (HR), and batter out (O) are taken from the input player statistics. We incorporate the impact of "fast" players through the following assumptions:

Aside from these fast-player assumptions, runners advance bases as expected: a single advances each runner one base, a double advances each runner two bases, and so on.
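The at-bat draw and the standard base advancement described above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not runexp's implementation: `draw_outcome()` and `advance()` are hypothetical helpers, the fast-player rules are ignored, and a walk is assumed to move only forced runners (the usual softball rule).

```r
# Outcome categories, in the same order as the probability vector
outcomes <- c("O", "S", "D", "TR", "HR", "W")

# Draw one at-bat outcome from a single multinomial trial.
draw_outcome <- function(probs) {
  # rmultinom(1, size = 1, ...) returns a one-hot column vector
  outcomes[which(rmultinom(1, size = 1, prob = probs) == 1)]
}

# Advance runners for one outcome; bases is c(first, second, third).
# Returns the new base state and runs scored (fast-player rules ignored).
advance <- function(bases, outcome) {
  if (outcome == "O") return(list(bases = bases, runs = 0))
  if (outcome == "W") {
    # Assumption: on a walk only forced runners move
    runs <- 0
    if (bases[1]) {
      if (bases[2]) {
        if (bases[3]) runs <- 1
        bases[3] <- TRUE
      }
      bases[2] <- TRUE
    }
    bases[1] <- TRUE
    return(list(bases = bases, runs = runs))
  }
  n <- c(S = 1, D = 2, TR = 3, HR = 4)[[outcome]]
  # Current positions of runners plus the batter (position 0), moved n bases
  pos <- c(which(bases), 0) + n
  list(bases = 1:3 %in% pos, runs = sum(pos >= 4))
}

set.seed(1)
p <- c(O = 0.60, S = 0.20, D = 0.05, TR = 0.01, HR = 0.04, W = 0.10)
draw_outcome(p)
advance(c(TRUE, FALSE, TRUE), "S")  # bases TRUE TRUE FALSE, runs 1
```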

Following softball norms, we only consider steals of second base. Steals are considered whenever there is a runner on first and no runner on second. In these situations, a Bernoulli coin flip (based on the runner's SBA probability) determines whether the runner on first attempts a steal; in practice, these decisions are commonly left up to coaches. If the runner attempts a steal, a second Bernoulli coin flip (based on the runner's SB probability) determines whether the steal succeeds or the runner is caught stealing.
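A minimal sketch of the steal rule, assuming the base state is a logical vector `c(first, second, third)`. `try_steal()` is a hypothetical helper, not part of runexp; `rbinom()` with `size = 1` plays the role of each Bernoulli coin flip.

```r
# Hypothetical helper (not part of runexp) for the documented steal rule.
try_steal <- function(bases, outs, sba, sb) {
  # Steals are only considered with a runner on first and second base open
  if (!(bases[1] && !bases[2])) {
    return(list(bases = bases, outs = outs))
  }
  # First coin flip: does the runner attempt a steal?
  if (rbinom(1, size = 1, prob = sba) == 0) {
    return(list(bases = bases, outs = outs))
  }
  bases[1] <- FALSE
  # Second coin flip: is the attempt successful?
  if (rbinom(1, size = 1, prob = sb) == 1) {
    bases[2] <- TRUE   # successful steal of second
  } else {
    outs <- outs + 1   # caught stealing
  }
  list(bases = bases, outs = outs)
}

set.seed(42)
try_steal(c(TRUE, FALSE, FALSE), outs = 0, sba = 0.3, sb = 0.8)
```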

The stats input must be a data frame of player probabilities. It must contain columns "O", "S", "D", "TR", "HR", and "W" giving the probability that a player's at-bat results in each outcome; the entries in each row must sum to one. The data frame must also contain either a "NAME" or a "NUMBER" column to identify players, and these identifiers must correspond to the lineup. Extra rows for players not in the lineup are ignored. This data frame may be generated from player statistics using prob_calc.
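The checks this format implies can be sketched with a hypothetical `check_stats()` helper (not part of runexp) and made-up data for two players. Column names are compared case-insensitively here as an assumption, since the examples below access `wku_probs$name`; whether runexp itself does this is not stated.

```r
# Hypothetical validator (not part of runexp) for the documented stats format.
check_stats <- function(stats, tol = 1e-8) {
  outcome_cols <- c("O", "S", "D", "TR", "HR", "W")
  nm <- toupper(names(stats))  # assumption: case-insensitive column names
  missing <- setdiff(outcome_cols, nm)
  if (length(missing) > 0) {
    stop("missing columns: ", paste(missing, collapse = ", "))
  }
  if (!any(c("NAME", "NUMBER") %in% nm)) {
    stop("stats needs a NAME or NUMBER column")
  }
  probs <- as.matrix(stats[, match(outcome_cols, nm)])
  if (any(abs(rowSums(probs) - 1) > tol)) {
    stop("outcome probabilities must sum to one in every row")
  }
  invisible(TRUE)
}

# Two made-up players for illustration
toy <- data.frame(
  NAME = c("A", "B"),
  O  = c(0.60, 0.55),
  S  = c(0.20, 0.25),
  D  = c(0.06, 0.05),
  TR = c(0.01, 0.02),
  HR = c(0.03, 0.03),
  W  = c(0.10, 0.10)
)
check_stats(toy)
```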

The stats data frame may optionally include an "SBA" (stolen base attempt) column that provides the probability a given player will attempt a steal (provided they are on first base with no runner on second). If "SBA" is specified, the data frame must also include a "SB" (stolen base) column that provides the probability of a given player successfully stealing a base (conditional on them attempting a steal). If these probabilities are not specified, calculations will not involve any steals.
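Because SB is conditional on an attempt, the per-opportunity chance that a runner on first (with second open) steals successfully is SBA * SB, and the chance of being caught stealing is SBA * (1 - SB). The short snippet below makes that arithmetic concrete using made-up probabilities for three hypothetical players.

```r
# Made-up steal probabilities for illustration
steal <- data.frame(
  NAME = c("A", "B", "C"),
  SBA  = c(0.40, 0.10, 0.00),  # P(attempt | on first, second open)
  SB   = c(0.85, 0.60, 0.00)   # P(success | attempt)
)

# Per-opportunity probabilities implied by the two coin flips
steal$P_SUCCESS <- steal$SBA * steal$SB
steal$P_CAUGHT  <- steal$SBA * (1 - steal$SB)
steal
```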

The stats data frame may also include a logical "FAST" column indicating whether each player is fast. If this column is not specified, the "FAST" designation is assigned based on each player's "SBA" probability, on the reasoning that players who attempt steals more often are likely the fast players.

By default, simulations are processed in parallel on all but one of the available cores. Parallelization is recommended to reduce computation time. Interactive plotting (graphic = TRUE) does not support parallelization and overrides any specified cores with cores = 1.
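The default core count and SNOW-style parallelism can be illustrated with R's base parallel package. This sketch does not call runexp: `one_game()` is a hypothetical stand-in that returns a random run total for one game, used only to show how reps might be spread across a socket cluster with reproducible random-number streams.

```r
library(parallel)

# Hypothetical stand-in for one simulated game: a random run total.
one_game <- function(i) sum(rbinom(7, size = 3, prob = 0.2))

# All but one of the available cores, falling back to 1 if unknown
n_cores <- max(1, detectCores() - 1, na.rm = TRUE)

cl <- makeCluster(n_cores)    # SNOW-style socket cluster
clusterSetRNGStream(cl, 123)  # reproducible parallel random streams
scores <- unlist(parLapply(cl, 1:100, one_game))
stopCluster(cl)

mean(scores)
```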

Value

A list of the S3 class "sim" with the following elements:

Examples

# Short simulation (designed to run in less than 5 seconds)
sim1 <- sim("B", wku_probs, inn = 1, reps = 100, cores = 1)

# Simulation with interactive graphic
lineup <- wku_probs$name[1:9]
sim2 <- sim(lineup, wku_probs, inn = 7, reps = 1, graphic = TRUE)

# Simulation for entire game (recommended to increase cores)
sim3 <- sim(lineup, wku_probs, cores = 1)
boxplot(sim3$score)
points(1, sim3$score_avg_game)

# GAME SITUATION COMPARISON OF CHAIN AND SIMULATOR

# Select the nine "starters" in a random batting order
lineup <- sample(wku_probs$name[1:9], 9)

# Average chain across lead-off batters
chain_avg <- mean(chain(lineup, wku_probs, cycle = TRUE)$score)

# Simulate full 7 inning game (recommended to increase cores)
sim_score <- sim(lineup, wku_probs, inn = 7, reps = 50000, cores = 1)

# Split the 50,000 games into 100 bins of 500 games to plot the bin averages
sim_grouped <- split(sim_score$score, rep(1:100, times = 50000 / 100))

boxplot(sapply(sim_grouped, mean), ylab = 'Expected Score for Game')
points(1, sim_score$score_avg_game, pch = 16, cex = 2, col = 2)
points(1, chain_avg * 7, pch = 18, cex = 2, col = 3)

       

[Package runexp version 0.2.1 Index]