sim {runexp} | R Documentation |
Softball run expectancy using multinomial random trial simulation
Description
Utilizes a multinomial simulation to simulate a softball game scenario with a specified number of innings (inn) per game over a specified number of games (reps). Calculations depend on specified player probabilities (see details) and a nine-player lineup. Optionally incorporates attempted steals and "fast" players who are able to stretch bases. Optionally utilizes SNOW parallelization.
Usage
sim(
lineup,
stats,
inn = 7,
reps = 100,
graphic = FALSE,
waitTime = 2,
cores = NULL
)
Arguments
lineup |
either character vector of player names or numeric vector of player numbers. Must be of length 1 or 9. If lineup is of length 1, the single player will be "duplicated" nine times to form a complete lineup. |
stats |
data frame of player statistics (see details) |
inn |
number of innings per rep (the default of 7 represents a typical softball game) |
reps |
number of times to repeat the softball game simulation. Can be thought of as number of games. |
graphic |
logical indicating whether to plot the player base movement. Requires no parallelization (see details). |
waitTime |
the amount of time to pause before making the next plot for a play. Only relevant when graphic = TRUE. |
cores |
number of cores to utilize in parallel. Defaults to one less than maximum available nodes. |
Details
In each simulation, we determine each batter's hit result through a multinomial random trial in which the probabilities of a walk (W), single (S), double (D), triple (TR), home run (HR), and batter out (O) are taken from the input player statistics. We incorporate the impact of "fast" players through the following assumptions:
If a fast player is on first and the batter hits a single, the fast player will stretch to third base (leaving the batter on first).
If a fast player is on second and the batter hits a single, the fast player will stretch home (leaving the batter on first and a single run scored).
If a fast player is on first and the batter hits a double, the fast player will stretch home (leaving the batter on second base and a single run scored).
A typical player (not fast) who successfully steals a base will become a fast player for the remainder of that inning (meaning that a player who successfully steals second base will stretch home on a single).
Aside from these fast player assumptions, runners advance bases as expected (a single advances each runner one base, a double advances each runner two bases, etc.).
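The advancement rules above can be sketched as a small function. This is an illustration only, not the package's internal code: the helper name advance_on_hit and the encoding of the bases are hypothetical, and the handling of a fast runner blocked by a lead runner is an assumption, since that case is not covered by the rules above.

```r
# Hypothetical helper, not part of the package.
# bases: logical vector of length 3 (first, second, third);
#   NA = empty, FALSE = typical runner, TRUE = fast runner.
# hit: one of "S", "D", "TR", "HR".
advance_on_hit <- function(bases, hit, batter_fast = FALSE) {
  n <- c(S = 1, D = 2, TR = 3, HR = 4)[[hit]]  # bases gained on this hit
  new_bases <- c(NA, NA, NA)
  runs <- 0
  for (b in 3:1) {                             # move the lead runner first
    if (is.na(bases[b])) next
    target <- b + n
    if (bases[b] && hit %in% c("S", "D")) {
      stretched <- target + 1                  # fast runners take one extra base
      # Assumption: no stretch onto a base already taken by a lead runner
      if (stretched >= 4 || is.na(new_bases[stretched])) target <- stretched
    }
    if (target >= 4) {
      runs <- runs + 1                         # runner scores
    } else {
      new_bases[target] <- bases[b]
    }
  }
  # Place the batter (a home run scores the batter as well)
  if (n >= 4) runs <- runs + 1 else new_bases[n] <- batter_fast
  list(bases = new_bases, runs = runs)
}

# Fast runner on first, batter singles: runner ends on third, batter on first
advance_on_hit(c(TRUE, NA, NA), "S")
```

Storing the fast flag with the runner (rather than with the base) is what would let a successful stealer be marked fast for the rest of the inning.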
Following softball norms, we only entertain steals of second base. Steals are considered when there is a runner on first and no runner on second. In these situations, we use a Bernoulli coin flip (based on the runner's SBA probability) to determine whether the runner on first will attempt a steal. In practice, these decisions are commonly left up to coaches. If the player attempts a steal, a second Bernoulli coin flip (based on the runner's SB probability) determines whether the steal succeeds or the player is caught stealing.
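The two-stage steal decision can be illustrated with a pair of Bernoulli draws. The sketch below is a simplified stand-in rather than the package's implementation; the function name attempt_steal and the occupied/empty encoding of the bases are hypothetical.

```r
# Hypothetical helper, not part of the package.
# bases: logical vector (first, second, third), TRUE if occupied.
# sba: probability the runner on first attempts a steal.
# sb:  probability the attempt succeeds, given that it is made.
attempt_steal <- function(bases, sba, sb) {
  no_steal <- list(bases = bases, outs = 0L, attempted = FALSE)
  # Steals are only considered with a runner on first and second base empty
  if (!bases[1] || bases[2]) return(no_steal)
  # First Bernoulli draw: does the runner go?
  if (rbinom(1, size = 1, prob = sba) == 0) return(no_steal)
  # Second Bernoulli draw: is the steal successful?
  success <- rbinom(1, size = 1, prob = sb) == 1
  bases[1] <- FALSE
  if (success) bases[2] <- TRUE      # otherwise the runner is caught stealing
  list(bases = bases, outs = as.integer(!success), attempted = TRUE)
}

set.seed(42)
attempt_steal(c(TRUE, FALSE, FALSE), sba = 0.3, sb = 0.8)
```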
The stats input must be a data frame containing player probabilities. It must contain columns "O", "S", "D", "TR", "HR", and "W" whose entries are probabilities summing to one, corresponding to the probability of a player's at-bat resulting in each outcome. The data frame must contain either a "NAME" or "NUMBER" column to identify players (these must correspond to the lineup). Extra rows for players not in the lineup will be ignored. This data frame may be generated from player statistics using prob_calc.
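As an illustration of this layout, the sketch below builds a small stats data frame by hand and draws one at-bat outcome with base R's rmultinom. The player names, probabilities, and the helper draw_outcome are made up for the example; they are not taken from the package or its data.

```r
# Hypothetical two-player stats table; each row of outcome probabilities sums to one
stats <- data.frame(
  NAME = c("Player1", "Player2"),
  O  = c(0.60, 0.70),
  S  = c(0.20, 0.15),
  D  = c(0.08, 0.05),
  TR = c(0.02, 0.01),
  HR = c(0.04, 0.02),
  W  = c(0.06, 0.07)
)
outcomes <- c("O", "S", "D", "TR", "HR", "W")
stopifnot(all(abs(rowSums(stats[, outcomes]) - 1) < 1e-8))

# Hypothetical helper: one multinomial trial for a single at-bat
draw_outcome <- function(player, stats) {
  p <- unlist(stats[stats$NAME == player, outcomes])
  draw <- rmultinom(1, size = 1, prob = p)  # 6 x 1 indicator matrix
  outcomes[which(draw[, 1] == 1)]
}

set.seed(1)
draw_outcome("Player1", stats)
```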
The stats data frame may optionally include an "SBA" (stolen base attempt) column that provides the probability a given player will attempt a steal (provided they are on first base with no runner on second). If "SBA" is specified, the data frame must also include an "SB" (stolen base) column that provides the probability of a given player successfully stealing a base (conditional on them attempting a steal). If these probabilities are not specified, calculations will not involve any steals.
The stats data frame may also include a logical "FAST" column that indicates whether a player is fast. If this column is not specified, the "FAST" designation will be assigned based on each player's "SBA" probability. Players who are more likely to attempt steals are likely the fast players.
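The column requirements described above can be checked before running a simulation. The function below is a hypothetical pre-flight check written for illustration; it is not the validation the package performs. It does not try to reproduce the rule for deriving "FAST" from "SBA", since that rule is not stated here.

```r
# Hypothetical pre-flight check, not part of the package
check_stats <- function(stats) {
  outcomes <- c("O", "S", "D", "TR", "HR", "W")
  stopifnot(
    is.data.frame(stats),
    all(outcomes %in% names(stats)),               # required outcome columns
    any(c("NAME", "NUMBER") %in% names(stats)),    # player identifier
    all(abs(rowSums(stats[, outcomes]) - 1) < 1e-8)
  )
  # "SBA" requires a matching "SB" column
  if ("SBA" %in% names(stats)) stopifnot("SB" %in% names(stats))
  # "FAST", when supplied, must be logical
  if ("FAST" %in% names(stats)) stopifnot(is.logical(stats$FAST))
  invisible(TRUE)
}

# Made-up example row with the optional steal and speed columns
example_stats <- data.frame(
  NAME = "Player1",
  O = 0.60, S = 0.20, D = 0.08, TR = 0.02, HR = 0.04, W = 0.06,
  SBA = 0.25, SB = 0.80, FAST = TRUE
)
check_stats(example_stats)
```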
As a default, simulations will be processed in parallel over all but one of the maximum available cores. Parallelization is recommended to reduce computation time. Interactive plotting (graphic = TRUE) requires no parallelization and will override specified cores with cores = 1.
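The default described above can be pictured with parallel::detectCores(). The helper resolve_cores below is hypothetical and only mirrors the behaviour stated in this section; the package may determine the available cores differently.

```r
library(parallel)

# Hypothetical helper mirroring the documented defaults
resolve_cores <- function(cores = NULL, graphic = FALSE) {
  # Interactive plotting runs without parallelization
  if (graphic) return(1L)
  if (is.null(cores)) {
    available <- detectCores()
    # detectCores() may return NA when the count cannot be determined
    if (is.na(available)) available <- 1L
    cores <- max(1L, available - 1L)   # all but one core, never fewer than one
  }
  as.integer(cores)
}

resolve_cores()                         # machine dependent
resolve_cores(cores = 4, graphic = TRUE)
```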
Value
A list of the S3 class "sim" with the following elements:
- lineup: copy of input lineup
- stats: copy of input stats
- inn: copy of input innings
- score: a vector containing the score for each rep (game)
- score_avg_game: the average expected score per rep (game). That is, mean(score).
- score_avg_inn: the average expected score per rep (game) per inning. That is, mean(score)/inn. If inn = 1, then score_avg_game = score_avg_inn.
- time: computation time in seconds
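The two averages follow directly from the score vector. A short worked example with made-up scores (not output from sim):

```r
# Made-up scores for five simulated games
score <- c(3, 0, 5, 2, 4)
inn <- 7

score_avg_game <- mean(score)        # 14 / 5 = 2.8 runs per game
score_avg_inn  <- mean(score) / inn  # 2.8 / 7 = 0.4 runs per inning
```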
Examples
# Short simulation (designed to run in less than 5 seconds)
sim1 <- sim("B", wku_probs, inn = 1, reps = 100, cores = 1)
# Simulation with interactive graphic
lineup <- wku_probs$name[1:9]
sim2 <- sim(lineup, wku_probs, inn = 7, reps = 1, graphic = TRUE)
# Simulation for entire game (recommended to increase cores)
sim3 <- sim(lineup, wku_probs, cores = 1)
boxplot(sim3$score)
points(1, sim3$score_avg_game)
# GAME SITUATION COMPARISON OF CHAIN AND SIMULATOR
# Select lineup made up of the nine "starters"
lineup <- sample(wku_probs$name[1:9], 9)
# Average chain across lead-off batters
chain_avg <- mean(chain(lineup, wku_probs, cycle = TRUE)$score)
# Simulate full 7 inning game (recommended to increase cores)
sim_score <- sim(lineup, wku_probs, inn = 7, reps = 50000, cores = 1)
# Split into bins in order to plot averages
sim_grouped <- split(sim_score$score, rep(1:100, times = 50000 / 100))
boxplot(sapply(sim_grouped, mean), ylab = 'Expected Score for Game')
points(1, sim_score$score_avg_game, pch = 16, cex = 2, col = 2)
points(1, chain_avg * 7, pch = 18, cex = 2, col = 3)