chain {runexp}    R Documentation

Softball run expectancy using discrete Markov chains

Description

Uses discrete Markov chains to calculate softball run expectancy for a single (half) inning. Calculations depend on specified player probabilities (see Details) and a nine-player lineup. Optionally incorporates attempted steals and "fast" players who are able to stretch bases.

Usage

chain(lineup, stats, cycle = FALSE, max_at_bats = 18)

Arguments

lineup

either character vector of player names or numeric vector of player numbers. Must be of length 1 or 9. If lineup is of length 1, the single player will be "copied" nine times to form a complete lineup.

stats

data frame of player statistics (see details)

cycle

logical indicating whether to calculate run expectancy for each of the 9 possible lead-off batters. Preserves the order of the lineup. As a default, only the first player in lineup is used as lead-off. Cycling is not relevant when the lineup is made up of a single player.

max_at_bats

maximum number of at bats (corresponding to matrix powers) used in calculation. Must be sufficiently large to achieve convergence. Convergence may be checked using plot with type = 1.
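As a rough illustration of how lineup and cycle are described above, the following base R sketch copies a single player nine times and lists the nine lead-off rotations of a lineup. The helper rotate and the player numbers are hypothetical and are not part of the package; how chain performs these steps internally is not shown in this page.

```r
# Hypothetical nine-player lineup given as player numbers
lineup <- 1:9

# A lineup of length 1 is "copied" nine times
single <- rep("B", 9)

# Hypothetical helper: rotation i leads off with player i and keeps the order
rotate <- function(x, i) c(x[i:length(x)], x[seq_len(i - 1)])

# One row per possible lead-off batter
rotations <- t(sapply(seq_along(lineup), function(i) rotate(lineup, i)))
rotations[3, ]
```

Each row of rotations preserves the batting order and differs only in who leads off, which is the situation cycle = TRUE is described as evaluating.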

Details

The typical state space for softball involves 25 states defined by the base situation (runners on base) and the number of outs. The standard base situations are: (1) bases empty, (2) runner on first, (3) runner on second, (4) runner on third, (5) runners on first and second, (6) runners on second and third, (7) runners on first and third, and (8) bases loaded. These 8 base situations are crossed with each of three out states (0 outs, 1 out, or 2 outs) to form 24 states. The 25th and final state is three outs, which marks the end of the half inning.
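The 25-state construction can be sketched in base R as follows. The state labels are made up for illustration and need not match the labels the package uses internally.

```r
# Eight standard base situations, in the order listed above
bases <- c("empty", "1st", "2nd", "3rd",
           "1st+2nd", "2nd+3rd", "1st+3rd", "loaded")
outs <- 0:2

# Cross base situations with out counts to get the 24 transient states
grid <- expand.grid(bases = bases, outs = outs, stringsAsFactors = FALSE)

# Append the absorbing "3 outs" state
states <- c(paste0(grid$bases, " / ", grid$outs, " out"), "3 outs")
length(states)
```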

We expand these 25 states to incorporate "fast" players. We make the following assumptions concerning fast players:

Based on these assumptions, we add base situations that designate runners on first and second base as either typical runners (R) or fast runners (F). The entirety of these base situations can be viewed using plot.chain with fast = TRUE. Aside from these fast player assumptions, runners advance bases as expected (a single advances each runner one base, a double advances each runner two bases, etc.).

Each at bat results in a change to the base situation and/or the number of outs. The outcomes of an at-bat are limited to:

The transitions resulting from these outcomes are stored in "transition matrices." We utilize separate transition matrices for typical batters and fast batters (in order to keep fast runners designated separately). We additionally incorporate stolen bases. Steals are handled separately from the six at-bat outcomes because they do not result in changes to the batter. Following softball norms, we only entertain steals of second base. Steals are considered in cases when there is a runner on first and no runner on second. In this situation, steal possibilities are limited to:

Steal possibilities are implemented in separate transition matrices. All transition matrices are stored as internal RData files.
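To give a feel for what a transition matrix and its powers look like, here is a deliberately simplified toy chain that tracks only the number of outs and ignores base runners. It is not the package's transition matrix; the names P, p_out, and p_done and the out probability are assumptions made for illustration. It also shows why max_at_bats must be large enough: the probability that the inning has ended approaches one only as the number of at-bats grows.

```r
# Toy chain: states are 0, 1, 2, or 3 outs; base runners are ignored.
# p_out is a made-up out probability, not taken from any data set.
p_out <- 0.65
P <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(paste(0:3, "outs"), paste(0:3, "outs")))
for (i in 1:3) {
  P[i, i]     <- 1 - p_out  # batter is not out, so the out count is unchanged
  P[i, i + 1] <- p_out      # batter is out
}
P[4, 4] <- 1                # three outs is an absorbing state

# Probability the inning is over within n at-bats, starting from 0 outs,
# read from the n-th matrix power
p_done <- function(n) {
  Pn <- diag(4)
  for (k in seq_len(n)) Pn <- Pn %*% P
  Pn[1, 4]
}

sapply(c(3, 9, 18), p_done)
```

In this toy setting the result for n at-bats equals the binomial probability of at least three outs in n trials, so it can be cross-checked with pbinom(2, n, p_out, lower.tail = FALSE).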

The stats input must be a data frame containing player probabilities. It must contain columns "O", "S", "D", "TR", "HR", and "W" whose entries are probabilities summing to one, corresponding to the probability of a player's at-bat resulting in each outcome. The data frame must contain either a "NAME" or "NUMBER" column to identify players (these must correspond to the lineup). Extra rows for players not in the lineup will be ignored. This data frame may be generated from player statistics using prob_calc.
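A minimal sketch of a stats data frame in the layout described above, with a basic check that each player's outcome probabilities sum to one. The players and probabilities are made up, and the check is an illustration rather than the validation chain itself performs.

```r
# Hypothetical players with made-up probabilities (each row sums to one)
stats <- data.frame(
  NAME = c("A", "B", "C"),
  O    = c(0.62, 0.70, 0.66),
  S    = c(0.20, 0.15, 0.18),
  D    = c(0.06, 0.04, 0.05),
  TR   = c(0.01, 0.01, 0.01),
  HR   = c(0.03, 0.02, 0.02),
  W    = c(0.08, 0.08, 0.08)
)

# Required outcome columns must be present
outcomes <- c("O", "S", "D", "TR", "HR", "W")
stopifnot(all(outcomes %in% names(stats)))

# Each player's outcome probabilities should sum to one (within tolerance)
row_totals <- rowSums(stats[, outcomes])
stopifnot(all(abs(row_totals - 1) < 1e-8))
```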

The stats data frame may optionally include an "SBA" (stolen base attempt) column that provides the probability a given player will attempt a steal (provided they are on first base with no runner on second). If "SBA" is specified, the data frame must also include an "SB" (stolen base) column that provides the probability of a given player successfully stealing a base (conditional on them attempting a steal). If these probabilities are not specified, calculations will not involve any steals.
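Because "SB" is conditional on an attempt, the two columns combine by multiplication. The sketch below assumes a runner on first either does not attempt a steal, steals second successfully, or is caught; the values and the name steal_probs are made up for illustration.

```r
# Made-up steal inputs for one hypothetical runner
sba <- 0.30  # P(attempt | on first base, second base empty)
sb  <- 0.80  # P(success | attempt)

# Unconditional probabilities of the three assumed possibilities
steal_probs <- c(
  no_attempt = 1 - sba,
  stolen     = sba * sb,
  caught     = sba * (1 - sb)
)
steal_probs
```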

The stats data frame may also include a logical "FAST" column that indicates whether a player is fast. If this column is not specified, the "FAST" designation will be assigned based on each player's "SBA" probability. Generally, players who are more likely to attempt steals are the fast players.

The cycle parameter is a useful tool for evaluating an entire lineup. Through the course of a game, any of the nine players may lead off an inning. A weighted or unweighted average of these nine expected scores provides a more holistic representation of the lineup than the expected score based on a single lead-off batter.
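The two kinds of average can be computed as in the sketch below. The nine scores and the weights are made-up numbers; in practice the scores would come from the score element of a chain fitted with cycle = TRUE, and the weights might reflect how often each lineup slot leads off an inning.

```r
# Made-up expected scores, one per lead-off batter (hypothetical values)
scores <- c(0.52, 0.48, 0.45, 0.41, 0.38, 0.36, 0.40, 0.44, 0.50)

# Hypothetical weights: share of innings in which each slot leads off
weights <- c(0.20, 0.10, 0.10, 0.12, 0.10, 0.10, 0.10, 0.09, 0.09)
stopifnot(abs(sum(weights) - 1) < 1e-8)

unweighted <- mean(scores)                   # every lead-off counted equally
weighted   <- weighted.mean(scores, weights) # lead-offs counted by frequency
c(unweighted = unweighted, weighted = weighted)
```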

Value

A list of the S3 class "chain" with the following elements:

References

B. Bukiet, E. R. Harold, and J. L. Palacios, “A Markov Chain Approach to Baseball,” Operations Research 45, 14–23 (1997).

Examples

# Expected score for single batter (termed "offensive potential")
chain1 <- chain("B", wku_probs)
plot(chain1)

# Expected score without cycling
lineup <- wku_probs$name[1:9]
chain2 <- chain(lineup, wku_probs)
plot(chain2)

# Expected score with cycling
chain3 <- chain(lineup, wku_probs, cycle = TRUE)
plot(chain3, type = 1:3)


# GAME SITUATION COMPARISON OF CHAIN AND SIMULATOR

# Select lineup made up of the nine "starters"
lineup <- sample(wku_probs$name[1:9], 9)

# Average chain across lead-off batters
chain_avg <- mean(chain(lineup, wku_probs, cycle = TRUE)$score)

# Simulate full 7 inning game (recommended to increase cores)
sim_score <- sim(lineup, wku_probs, inn = 7, reps = 50000, cores = 1)

# Split into bins in order to plot averages
sim_grouped <- split(sim_score$score, rep(1:100, times = 50000 / 100))

# Plot results
boxplot(sapply(sim_grouped, mean), ylab = 'Expected Score for Game')
points(1, sim_score$score_avg_game, pch = 16, cex = 2, col = 2)
points(1, chain_avg * 7, pch = 18, cex = 2, col = 3)

       

[Package runexp version 0.2.1 Index]