stan_foot {footBayes}R Documentation

Fit football models with Stan

Description

Stan football modelling for the most famous models: double Poisson, bivariate Poisson, Skellam, student t, diagonal-inflated bivariate Poisson and zero-inflated Skellam.

Usage

stan_foot(
  data,
  model,
  predict,
  ranking,
  dynamic_type,
  prior,
  prior_sd,
  ind_home = "TRUE",
  ...
)

Arguments

data

A data frame, or a matrix containing the following mandatory items: season, home team, away team, home goals, away goals.

model

The type of Stan model used to fit the data. One among the following: "double_pois", "biv_pois", "skellam", "student_t", "diag_infl_biv_pois", "zero_infl_skellam".

predict

The number of out-of-sample matches. If missing, the function returns the fit for the training set only.

ranking

Eventual numeric ranking provided for the teams in the dataset (e.g., the Coca-Cola Fifa ranking)

dynamic_type

One among "weekly" or "seasonal" for weekly dynamic parameters or seasonal dynamic parameters.

prior

The prior distribution for the team-specific abilities. Possible choices: normal, student_t, cauchy, laplace. See the rstanarm for a deep overview and read the vignette Prior Distributions for rstanarm Models

prior_sd

The prior distribution for the team-specific standard deviations. See the prior argument for more details.

ind_home

Home effect (default is TRUE).

...

Optional parameters passed to the function in the rstan package. It is possibly to specify iter, chains, cores, refresh, etc.

Details

Let (ynH,ynA)(y^{H}_{n}, y^{A}_{n}) denote the observed number of goals scored by the home and the away team in the nn-th game, respectively. A general bivariate Poisson model allowing for goals' correlation (Karlis & Ntzoufras, 2003) is the following:

YnH,YnAλ1n,λ2n,λ3nBivPoisson(λ1n,λ2n,λ3n) Y^H_n, Y^A_n| \lambda_{1n}, \lambda_{2n}, \lambda_{3n} \sim \mathsf{BivPoisson}(\lambda_{1n}, \lambda_{2n}, \lambda_{3n})

log(λ1n)=μ+atthn+defan\log(\lambda_{1n}) = \mu+att_{h_n} + def_{a_n}

log(λ2n)=attan+defhn\log(\lambda_{2n}) = att_{a_n} + def_{h_n}

log(λ3n)=β0,\log(\lambda_{3n}) =\beta_0,

where the case λ3n=0\lambda_{3n}=0 reduces to the double Poisson model (Baio & Blangiardo, 2010). λ1n,λ2n\lambda_{1n}, \lambda_{2n} represent the scoring rates for the home and the away team, respectively, where: μ\mu is the home effect; the parameters attTatt_T and defTdef_T represent the attack and the defence abilities, respectively, for each team TT, T=1,,NTT=1,\ldots,N_T; the nested indexes hn,an=1,,NTh_{n}, a_{n}=1,\ldots,N_T denote the home and the away team playing in the nn-th game, respectively. Attack/defence parameters are imposed a sum-to-zero constraint to achieve identifiability and assigned some weakly-informative prior distributions:

attTN(μatt,σatt)att_T \sim \mathcal{N}(\mu_{att}, \sigma_{att})

defTN(μdef,σdef),def_T \sim \mathcal{N}(\mu_{def}, \sigma_{def}),

with hyperparameters μatt,σatt,μdef,σdef\mu_{att}, \sigma_{att}, \mu_{def}, \sigma_{def}.

Instead of using the marginal number of goals, another alternative is to modelling directly the score difference (ynHynA)(y^{H}_{n}- y^{A}_{n}). We can use the Poisson-difference distribution (or Skellam distribution) to model goal difference in the nn-th match (Karlis & Ntzoufras, 2009):

ynHynAλ1n,λ2nPD(λ1n,λ2n),y^{H}_{n}- y^{A}_{n}| \lambda_{1n}, \lambda_{2n} \sim PD(\lambda_{1n}, \lambda_{2n}),

and the scoring rates λ1n,λ2n\lambda_{1n}, \lambda_{2n} are unchanged with respect to the bivariate/double Poisson model. If we want to use a continue distribution, we can use a student t distribution with 7 degrees of freedom (Gelman, 2014):

ynHynAt(7,abhnaba(n),σy)y^{H}_{n}- y^{A}_{n} \sim t(7, ab_{h_{n}}-ab_{a(n)}, \sigma_y)

abtN(μ+b×prior_scoret,sigmaab),ab_t \sim \mathcal{N}(\mu + b \times {prior\_score}_t, sigma_{ab}),

where abtab_t is the overall ability for the tt-th team, whereas prior_scoretprior\_score_t is a prior measure of team's strength (for instance a ranking).

These model rely on the assumption of static parameters. However, we could assume dynamics in the attach/defence abilities (Owen, 2011; Egidi et al., 2018) in terms of weeks or seasons through the argument dynamic_type. In such a framework, for a given number of times 1,,T1, \ldots, \mathcal{T}, the models above would be unchanged, but the priors for the abilities parameters at each time τ,τ=2,,T,\tau, \tau=2,\ldots, \mathcal{T}, would be:

attT,τN(attT,τ1,σatt)att_{T, \tau} \sim \mathcal{N}({att}_{T, \tau-1}, \sigma_{att})

defT,τN(defT,τ1,σdef),def_{T, \tau} \sim \mathcal{N}({def}_{T, \tau-1}, \sigma_{def}),

whereas for τ=1\tau=1 we have:

attT,1N(μatt,σatt)att_{T, 1} \sim \mathcal{N}(\mu_{att}, \sigma_{att})

defT,1N(μdef,σdef).def_{T, 1} \sim \mathcal{N}(\mu_{def}, \sigma_{def}).

Of course, the identifiability constraint must be imposed for each time τ\tau.

The current version of the package allows for the fit of a diagonal-inflated bivariate Poisson and a zero-inflated Skellam model in the spirit of (Karlis & Ntzoufras, 2003) to better capture draw occurrences. See the vignette for further details.

Value

An object of S4 class, stanfit-class.

Author(s)

Leonardo Egidi legidi@units.it, Vasilis Palaskas vasilis.palaskas94@gmail.com.

References

Baio, G. and Blangiardo, M. (2010). Bayesian hierarchical model for the prediction of football results. Journal of Applied Statistics 37(2), 253-264.

Egidi, L., Pauli, F., and Torelli, N. (2018). Combining historical data and bookmakers' odds in modelling football scores. Statistical Modelling, 18(5-6), 436-459.

Gelman, A. (2014). Stan goes to the World Cup. From "Statistical Modeling, Causal Inference, and Social Science" blog.

Karlis, D. and Ntzoufras, I. (2003). Analysis of sports data by using bivariate poisson models. Journal of the Royal Statistical Society: Series D (The Statistician) 52(3), 381-393.

Karlis, D. and Ntzoufras,I. (2009). Bayesian modelling of football outcomes: Using the Skellam's distribution for the goal difference. IMA Journal of Management Mathematics 20(2), 133-145.

Owen, A. (2011). Dynamic Bayesian forecasting models of football match outcomes with estimation of the evolution variance parameter. IMA Journal of Management Mathematics, 22(2), 99-113.

Examples

## Not run: 
require(tidyverse)
require(dplyr)

### Use Italian Serie A from 2000 to 2002

data("italy")
italy <- as_tibble(italy)
italy_2000_2002<- italy %>%
 dplyr::select(Season, home, visitor, hgoal,vgoal) %>%
 dplyr::filter(Season=="2000" |  Season=="2001"| Season=="2002")


### Fit Stan models
## no dynamics, no predictions

fit1 <- stan_foot(data = italy_2000_2002,
                model="double_pois") # double poisson
print(fit1, pars =c("home", "sigma_att",
                    "sigma_def"))

fit2 <- stan_foot(data = italy_2000_2002,
                model="biv_pois")    # bivariate poisson
print(fit2, pars =c("home", "rho",
                    "sigma_att", "sigma_def"))

fit3 <- stan_foot(data = italy_2000_2002,
                model="skellam")     # skellam
print(fit3, pars =c("home", "sigma_att",
                    "sigma_def"))

fit4 <- stan_foot(data = italy_2000_2002,
                model="student_t")   # student_t
print(fit4, pars =c("home", "beta"))

## seasonal dynamics, no prediction

fit5 <- stan_foot(data = italy_2000_2002,
                model="double_pois",
                dynamic_type ="seasonal") # double poisson
print(fit5, pars =c("home", "Sigma_att",
                    "Sigma_def"))

## seasonal dynamics, prediction for the last season

fit6 <- stan_foot(data = italy_2000_2002,
                model="double_pois",
                dynamic_type ="seasonal",
                predict = 306) # double poisson
print(fit6, pars =c("home", "Sigma_att",
                    "Sigma_def"))

## other priors' options

fit_p <- stan_foot(data = italy_2000_2002,
                   model="double_pois",
                   priors = student_t (4, 0, NULL),
                   prior_sd = laplace(0,1)) # double poisson with
                                            # student_t priors for teams abilities
                                            # and laplace prior for the hyper sds
print(fit_p,  pars = c("home", "sigma_att",
                    "sigma_def"))

## End(Not run)

[Package footBayes version 0.2.0 Index]