welofit {welo} | R Documentation |
Calculates the WElo and Elo rates
Description
Calculates the WElo and Elo rates according to Angelini et al. (2022). In particular, the Elo updating system defines the rates (for player $i$) as:
$$E_i(t+1) = E_i(t) + K_i(t)\left[W_i(t) - \hat{p}_{i,j}(t)\right],$$
where $E_i(t)$ is the Elo rate at time $t$, $W_i(t)$ is the outcome (1 or 0) for player $i$ in the match at time $t$, $K_i(t)$ is a scale factor, and $\hat{p}_{i,j}(t)$ is the probability that player $i$ wins the match against player $j$ at time $t$, calculated using tennis_prob.
The scale factor $K_i(t)$ determines how much the rates change over time. By default, according to Kovalchik (2016), it is defined as
$$K_i(t) = \frac{250}{\left(N_i(t) + 5\right)^{0.4}},$$
where $N_i(t)$ is the number of matches played by player $i$ up to time $t$. Alternatively, $K_i(t)$ can be multiplied by 1.1 if the match at time $t$ is a Grand Slam match or is played on a given surface. Finally, it can be fixed to a constant value.
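The Elo update and the default scale factor described above can be sketched in a few lines of R. This is a minimal illustration, not the package's implementation: the helper names elo_prob, k_kovalchik and elo_update are hypothetical, and elo_prob assumes the standard Elo logistic formula with a 400-point scale, standing in for tennis_prob.

```r
# Hypothetical helper: standard Elo logistic win probability for player i
# against player j (assumed form; the package uses its own tennis_prob).
elo_prob <- function(rate_i, rate_j) {
  1 / (1 + 10^((rate_j - rate_i) / 400))
}

# Default scale factor of Kovalchik (2016); n_matches is the number of
# matches the player has played so far.
k_kovalchik <- function(n_matches) {
  250 / (n_matches + 5)^0.4
}

# One Elo update for player i; outcome is 1 (win) or 0 (loss).
elo_update <- function(rate_i, rate_j, outcome, n_matches) {
  rate_i + k_kovalchik(n_matches) * (outcome - elo_prob(rate_i, rate_j))
}

# Two players at the default starting points (SP = 1500), no matches played:
# the winner gains half of the scale factor.
new_rate <- elo_update(1500, 1500, outcome = 1, n_matches = 0)
```

Because the scale factor shrinks as the number of matches grows, the rates of experienced players move less after each match than those of newcomers.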
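The WElo rating system is defined as:
$$E_i^*(t+1) = E_i^*(t) + K_i(t)\left[W_i(t) - \hat{p}^*_{i,j}(t)\right] f(W_{i,j}(t)),$$
where $E_i^*(t)$ denotes the WElo rate for player $i$ at time $t$, $\hat{p}^*_{i,j}(t)$ the probability of winning using tennis_prob and the WElo rates, and $f(W_{i,j}(t))$ represents a function whose values depend on the games (by default) or sets won in the previous match, that is, the match at time $t$.
In particular, when parameter 'W' is set to "GAMES", $f(W_{i,j}(t))$ is defined as:
$$f(W_{i,j}(t)) \equiv f(G_{i,j}(t)) = \begin{cases} \dfrac{NG_i(t)}{NG_i(t) + NG_j(t)} & \text{if player } i \text{ has won match } t, \\ \dfrac{NG_j(t)}{NG_i(t) + NG_j(t)} & \text{if player } i \text{ has lost match } t, \end{cases}$$
where $NG_i(t)$ and $NG_j(t)$ represent the number of games won by player $i$ and player $j$ in match $t$, respectively.
When parameter 'W' is set to "SETS", $f(W_{i,j}(t))$ is:
$$f(W_{i,j}(t)) \equiv f(S_{i,j}(t)) = \begin{cases} \dfrac{NS_i(t)}{NS_i(t) + NS_j(t)} & \text{if player } i \text{ has won match } t, \\ \dfrac{NS_j(t)}{NS_i(t) + NS_j(t)} & \text{if player } i \text{ has lost match } t, \end{cases}$$
where $NS_i(t)$ and $NS_j(t)$ represent the number of sets won by player $i$ and player $j$ in match $t$, respectively.
The scale factor $K_i(t)$ is the same as in the Elo model.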
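As a rough sketch of the weighting idea, the R code below scales a single rating update by the winner's share of the games (or sets) played. The names welo_weight and welo_update are hypothetical, the win probability assumes the standard Elo logistic form rather than calling tennis_prob, and the scale factor is fixed to an arbitrary constant (32) purely for illustration.

```r
# Hypothetical helper: the weight is the winner's share of games (or sets).
# won_i and won_j are the games (or sets) won by player i and player j;
# outcome is 1 if player i won the match, 0 otherwise.
welo_weight <- function(won_i, won_j, outcome) {
  if (outcome == 1) won_i / (won_i + won_j) else won_j / (won_i + won_j)
}

# One weighted update for player i with a constant scale factor k.
welo_update <- function(rate_i, rate_j, outcome, k, won_i, won_j) {
  # Assumed standard Elo logistic form for the win probability.
  p_hat <- 1 / (1 + 10^((rate_j - rate_i) / 400))
  rate_i + k * (outcome - p_hat) * welo_weight(won_i, won_j, outcome)
}

# Player i beats player j 6-4 6-4, i.e. 12 games to 8.
w <- welo_weight(12, 8, outcome = 1)
winner <- welo_update(1500, 1500, outcome = 1, k = 32, won_i = 12, won_j = 8)
loser  <- welo_update(1500, 1500, outcome = 0, k = 32, won_i = 8, won_j = 12)
```

In this sketch a close win moves both rates less than a one-sided win would, since the weight approaches 0.5 as the match gets tighter and 1 as it gets more lopsided.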
Usage
welofit(
x,
W = "GAMES",
SP = 1500,
K = "Kovalchik",
CI = FALSE,
alpha = 0.05,
B = 1000,
new_data = NULL
)
Arguments
x |
Data cleaned through the function clean |
W |
optional Weights to use for the WElo rating system. Valid choices are: "GAMES" (by default) and "SETS" |
SP |
optional Starting points for calculating the rates. 1500 by default |
K |
optional Scale factor determining how much the WElo and Elo rates change over time. Valid choices are:
"Kovalchik" (by default), "Grand_Slam", "Surface_Hard", "Surface_Grass", "Surface_Clay" and, finally, a constant value |
CI |
optional Confidence intervals for the WElo and Elo rates. Defaults to FALSE. If 'CI' is set to TRUE, the confidence intervals are calculated according to the procedure explained by Angelini et al. (2022) |
alpha |
optional Significance level of the confidence interval. Default to 0.05 |
B |
optional Number of bootstrap samples used to calculate the confidence intervals. Default to 1000 |
new_data |
optional New data, cleaned through the function clean |
Value
welofit returns an object of class 'welo', which is a list containing the following components:
results: The data.frame including a variety of variables, among which are the estimated WElo and Elo rates, before and after the match $t$, for players $i$ and $j$, the lower and upper bounds of the confidence intervals (if CI=TRUE) for the WElo and Elo rates, labelled as '_lb' and '_ub', respectively, and the probability of winning the match for player $i$ (labelled as 'WElo_pi_hat' and 'Elo_pi_hat', respectively, for the WElo and Elo models).
matches: The number of matches analyzed.
period: The sample period considered.
loss: The Brier score (Brier 1950) and log-loss (used by Kovalchik (2016), among others) averages, calculated considering the distance with respect to the outcome of the match.
highest_welo: The player with the highest WElo rate and the date on which it was reached.
highest_elo: The player with the highest Elo rate and the date on which it was reached.
dataset: The dataset used for the estimation of the WElo and Elo rates.
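For readers unfamiliar with the two loss measures reported in the loss component, the sketch below computes them from a vector of predicted probabilities and observed outcomes using their standard textbook definitions. The function names brier_score and log_loss and the toy data are hypothetical, and the package may organise the calculation differently.

```r
# Brier score: mean squared distance between predicted probability and outcome.
brier_score <- function(p_hat, outcome) {
  mean((p_hat - outcome)^2)
}

# Log-loss: negative mean log-likelihood of the observed outcomes.
log_loss <- function(p_hat, outcome) {
  -mean(outcome * log(p_hat) + (1 - outcome) * log(1 - p_hat))
}

# Toy data: predicted probabilities that player i wins, and the actual
# outcomes (1 = player i won, 0 = player i lost).
p_hat   <- c(0.8, 0.6, 0.3)
outcome <- c(1, 1, 0)

bs <- brier_score(p_hat, outcome)
ll <- log_loss(p_hat, outcome)
```

Lower values are better for both measures; the log-loss penalises confident but wrong forecasts more heavily than the Brier score does.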
References
Angelini G, Candila V, De Angelis L (2022). “Weighted Elo rating for tennis match predictions.” European Journal of Operational Research, 297(1), 120–132.
Brier GW (1950). “Verification of forecasts expressed in terms of probability.” Monthly Weather Review, 78(1), 1–3.
Kovalchik SA (2016). “Searching for the GOAT of tennis win prediction.” Journal of Quantitative Analysis in Sports, 12(3), 127–138.
Examples
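library(welo)
data(atp_2019)
# clean the raw data, then estimate the WElo and Elo rates
db_clean <- clean(atp_2019)
res <- welofit(db_clean)
# append new data: fit on the first 500 matches, then update with the next 700
db_clean_1 <- db_clean[1:500, ]
db_clean_2 <- db_clean[501:1200, ]
res_1 <- welofit(db_clean_1)
res_2 <- welofit(res_1, new_data = db_clean_2)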