BTdecayLasso {BTdecayLasso} | R Documentation |
Bradley-Terry Model with Exponential Decayed weighted likelihood and Adaptive Lasso
Description
Bradley-Terry model is applied for paired comparison data. Teams' ability score is estimated by maximizing log-likelihood function.
To achieve a better track of current abilities, we apply an exponential decay rate to weight the log-likelihood function.
The most current matches will weight more than previous matches. Parameter "decay.rate" in most functions of this package is used
to set the amount of exponential decay rate. decay.rate should be non-negative and the appropriate range of it depends on time scale in original dataframe.
(see BTdataframe
and parameter "dataframe"'s definition of fifth column) For example,
a unit of week with a "decay.rate" 0.007 is equivalent to the unit of day with "decay.rate" 0.001. Usually, for sports matches,
if we take the unit of day, it's ranging from 0 to 0.01. The higher choice of "decay.rate", the better track of current teams' ability
with a side effect of higher variance.
If "decay.rate" is too large, for example "0.1" with a unit of day, \exp(-0.7)
= 0.50. Only half weight will be add to the likelihood for matches played
one week ago and \exp(-3.1)
= 0.05 suggests that previous matches took place one month ago will have little effect. Therefore, Only a few matches are
accounted for ability's estimation. It will lead to a very high variance and uncertainty. Since standard Bradley-Terry model
can not handle the case where there is a team who wins or loses all matches, such estimation may not provide convergent results.
Thus, if our estimation provides divergent result, an error will be returned and we suggest user to chose a smaller "decay.rate"
or adding more match results into the same modeling period.
By default, the Adaptive Lasso is implemented for variance reduction and team's grouping. Adaptive Lasso is proved to have good grouping property.
Apart from adaptive lasso, user can define own weight for different
Lasso constraint \left|\mu_{i}-\mu_{j}\right|
where \mu_{i}
is team i's ability.
Also by default, the whole Lasso path will be run. Similar to package "glmnet", user can provide their own choice of Lasso penalty "lambda" and determine whether the
whole Lasso path will be run (since such run is time-consuming). However, we suggest that if user is not familiar with the actual relationship among
lambda, the amount of penalty, the amount of shrinkage and grouping effect, a whole Lasso path should be run and selection of an
appropriate lambda is done by AIC or BIC criteria using BTdecayLassoC
(since this model is time related, cross-validation method cannot be applied). Also, users can
use BTdecayLassoF
to run with a specific Lasso penalty ranging from 0 to 1 (1 penalty means all estimators will shrink to 0).
Two sets of estimated abilities will be given, the biased Lasso estimation and the HYBRID Lasso's estimation. HYBRID Lasso estimation solves the restricted Maximum Likelihood optimization based on the group determined by Lasso's estimation (Different team's ability will converges to the same value if Lasso penalty is added and these teams' ability is setting to be equal as a restriction).
In addition, summary() using S3 method can be applied to view the outputs.
Usage
BTdecayLasso(
dataframe,
ability,
lambda = NULL,
weight = NULL,
path = TRUE,
decay.rate = 0,
fixed = 1,
thersh = 1e-05,
max = 100,
iter = 100
)
Arguments
dataframe |
Generated using |
ability |
A column vector of teams ability, the last row is the home parameter.
The row number is consistent with the team's index shown in dataframe. It can be generated using |
lambda |
The amount of Lasso penalty induced. The input should be a positive scalar or a sequence. |
weight |
Weight for Lasso penalty on different abilities. |
path |
whether the whole Lasso path will be run (plot.BTdecayLasso is enabled only if path = TRUE) |
decay.rate |
A non-negative exponential decay rate. Usually ranging from (0, 0.01), A larger decay rate weights more importance to most recent matches and the estimated parameters reflect more on recent behaviour. |
fixed |
A teams index whose ability will be fixed as 0. The worstTeam's index
can be generated using |
thersh |
Threshold for convergence used for Augmented Lagrangian Method. |
max |
Maximum weight for |
iter |
Number of iterations used in L-BFGS-B algorithm. |
Details
According to BTdecay
, the objective likelihood function to be optimized is,
\sum_{k=1}^{n}\sum_{i<j}\exp(-\alpha t_{k})\cdot(y_{ij}(\tau h_{ij}^{t_{k}}+\mu_{i}-\mu_{j})-\log(1+\exp(\tau h_{ij}^{t_{k}}+\mu_{i}-\mu_{j})))
The Lasso constraint is given as,
\sum_{i<j}w_{ij}\left|\mu_{i}-\mu_{j}\right|\leq s
where w_{ij}
are predefined weight. For Adaptive Lasso, \left|w_{ij}=1/(\mu_{i}^{MLE}-\mu_{j}^{MLE})\right|
.
Maximize this constraint objective function is equivalent to minimizing the following equation,
-l(\mu,\tau)+\lambda\sum_{i<j}w_{ij}|\mu_{i}-\mu_{j}|
Where -l(\mu,\tau)
is taking negative value of objective function above. Increase "lambda" will decrease "s", their relationship is
monotone. Here, we define "penalty" as 1-s/\max(s)
. Thus, "lambda" and "penalty" has a positive correlation.
Value
ability |
Estimated ability scores with user given lambda |
likelihood |
Negative likelihood of objective function with user given lambda |
df |
Degree of freedom with user given lambda(number of distinct |
penalty |
|
Lambda |
User given lambda |
ability.path |
if path = TRUE, estimated ability scores on whole Lasso path |
likelihood.path |
if path = TRUE, negative likelihood of objective function on whole Lasso path |
df.path |
if path = TRUE, degree of freedom on whole Lasso path(number of distinct |
penalty.path |
if path = TRUE, |
Lambda.path |
if path = TRUE, Whole Lasso path |
path |
Whether whole Lasso path will be run |
HYBRID.ability.path |
If path = TRUE, the whole path of evolving of HYBRID ability |
HYBRID.likelihood.path |
if path = TRUE, the whole path of HYBRID likelihood |
References
Masarotto, G. and Varin, C.(2012) The Ranking Lasso and its Application to Sport Tournaments. *The Annals of Applied Statistics* **6** 1949–1970.
Zou, H. (2006) The adaptive lasso and its oracle properties. *J.Amer.Statist.Assoc* **101** 1418–1429.
See Also
BTdataframe
for dataframe initialization,
plot.swlasso
, plot.wlasso
are used for Lasso path plot if path = TRUE in this function's run
Examples
##Initializing Dataframe
x <- BTdataframe(NFL2010)
##The following code runs the main results
##Usually a single lambda's run will take 1-20 s
##The whole Adaptive Lasso run will take 5-20 min
##BTdecayLasso run with exponential decay rate 0.005 and
##lambda 0.1, use path = TRUE if you want to run whole LASSO path
y1 <- BTdecayLasso(x$dataframe, x$ability, lambda = 0.1, path = FALSE,
decay.rate = 0.005, fixed = x$worstTeam)
summary(y1)
##Defining equal weight
##Note that comparing to Adaptive weight, the user defined weight may not be
##efficient in groupiing. Therefore, to run the whole Lasso path
##(evolving of distinct ability scores), it may take a much longer time.
##We recommend the user to apply the default setting,
##where Adaptive Lasso will be run.
n <- nrow(x$ability) - 1
w2 <- matrix(1, nrow = n, ncol = n)
w2[lower.tri(w2, diag = TRUE)] <- 0
##BTdecayLasso run with exponential decay rate 0.005 and with a specific lambda 0.1
y2 <- BTdecayLasso(x$dataframe, x$ability, lambda = 0.1, weight = w2,
path = FALSE, decay.rate = 0.005, fixed = x$worstTeam)
summary(y2)