Ensemble_ranking_IW {adabag}                                R Documentation

Ensemble Methods for Ranking Data: Item-Weighted Boosting and Bagging Algorithms

Description

The Ensemble_ranking_IW function applies the item-weighted Boosting and Bagging algorithms of Albano et al. (2023) to ranking data. These algorithms use classification trees as base classifiers to build item-weighted ensembles for rankings.

Usage

Ensemble_ranking_IW(formula, data, iw, algo = "boosting", 
  mfinal = 100, coeflearn = "Breiman", control, bin = FALSE, 
  trace = TRUE, ...)

Arguments

formula

a formula specifying the ranking response and the predictors, as in the lm function. The response must be the "Label" column of the object generated by the prep_data function.

data

an N by (K+1) data frame containing the prepared item-weighted ranking data, as returned by the prep_data function (see the sketch below). The "Label" column must contain the transformed ranking responses, and the remaining K columns the predictors. Continuous predictors are allowed; categorical predictors must be dummy coded.
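
A minimal sketch of the expected layout, assuming (as in the Examples below) that prep_data accepts a matrix of rankings y, with one full ranking of the M items per row, and a predictor data frame x; all values here are illustrative:

  set.seed(1)
  y <- t(replicate(10, sample(1:4)))           # 10 subjects rank M = 4 items
  x <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
  dati <- prep_data(y, x, iw = c(2, 5, 5, 2))  # same iw passed to the ensemble
  head(dati)                                   # "Label" column plus predictors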

iw

a vector or matrix of item weights or dissimilarities for the ranking data. If a vector, it must have length M, where M is the number of items. If a matrix, it must be a symmetric M by M matrix of item dissimilarities. For coherence, iw must be the same vector/matrix passed to prep_data, as sketched below.
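
Both accepted forms, sketched for M = 4 items (the numeric values are illustrative only):

  ## Vector form: one weight per item
  iw_vec <- c(2, 5, 5, 2)
  ## Matrix form: symmetric M x M item dissimilarities
  iw_mat <- matrix(c(0, 1, 2, 3,
                     1, 0, 1, 2,
                     2, 1, 0, 1,
                     3, 2, 1, 0), nrow = 4, byrow = TRUE)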

algo

the ensemble method to use. Possible values are "bagging" or "boosting". Defaults to "boosting".

mfinal

the number of trees to use for boosting or bagging. Defaults to 100 iterations.

coeflearn

the coefficient learning method to use. Possible values are "Breiman", "Freund", or "Zhu". Defaults to "Breiman".
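
Assuming the same conventions as the boosting function in adabag (Alfaro et al., 2013), the three options set the coefficient alpha of each tree from its weighted error err roughly as sketched below:

  ## Sketch of the per-tree coefficient, assuming adabag's conventions
  alpha_breiman <- function(err) 0.5 * log((1 - err) / err)
  alpha_freund  <- function(err) log((1 - err) / err)
  alpha_zhu     <- function(err, nclasses) log((1 - err) / err) + log(nclasses - 1)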

control

an optional argument to control details of the classification tree algorithm. See rpart.control for more information.

bin

a logical value indicating whether the binary logarithm is used when updating the weights at each iteration. Defaults to FALSE. When TRUE, this corresponds to the AdaBoost.R.M2 algorithm of Albano et al. (2023).
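
A hedged sketch of the bin = TRUE variant, reusing data_sub1 and the item weights from the Examples below:

  boosting_bin <- Ensemble_ranking_IW(Label ~ ., data = data_sub1,
    iw = c(2, 5, 5, 2), mfinal = 3, algo = "boosting", bin = TRUE,
    control = rpart.control(maxdepth = 4, cp = -1))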

trace

a logical value controlling whether additional information (the number of trees and the average weighted tau_x) is displayed during execution. Defaults to TRUE.

...

additional arguments passed to or from other methods.

Details

The Ensemble_ranking_IW function extends the Boosting and Bagging algorithms to item-weighted ranking data, using classification trees as base classifiers to improve ranking prediction performance.
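
For example, a bagged ensemble is obtained simply by setting algo = "bagging" (a minimal sketch; data_sub1 and the item weights are as in the Examples below):

  bagging_1 <- Ensemble_ranking_IW(Label ~ ., data = data_sub1,
    iw = c(2, 5, 5, 2), mfinal = 3, algo = "bagging",
    control = rpart.control(maxdepth = 4, cp = -1))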

Value

An object of class boosting or bagging, which is a list with the following components:

formula

the formula used.

trees

the trees grown during the iterations.

weights

a vector with the weight assigned to each tree across the iterations.

importance

a measure of the relative importance of each predictor in the ranking task, taking into account the weighted gain of the variable's contribution in each tree.
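
A short sketch of how these components can be inspected, assuming the boosting_1 object fitted in the Examples below:

  boosting_1$formula        # the formula used
  length(boosting_1$trees)  # one tree per iteration (mfinal)
  boosting_1$weights        # per-tree weights
  boosting_1$importance     # relative importance of each predictor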

Author(s)

Alessandro Albano alessandro.albano@unipa.it, Mariangela Sciandra mariangela.sciandra@unipa.it, and Antonella Plaia antonella.plaia@unipa.it

References

Albano, A., Sciandra, M., and Plaia, A. (2023): "A weighted distance-based approach with boosted decision trees for label ranking." Expert Systems with Applications.

Alfaro, E., Gamez, M., and Garcia, N. (2013): "adabag: An R Package for Classification with Boosting and Bagging." Journal of Statistical Software, Vol. 54, 2, pp. 1–35.

Breiman, L. (1998): "Arcing classifiers." The Annals of Statistics, Vol. 26, 3, pp. 801–849.

D'Ambrosio, A., Amodio, S., Mazzeo, G., Albano, A., and Plaia, A. (2023): ConsRank: Compute the Median Ranking(s) According to the Kemeny's Axiomatic Approach. R package version 2.1.3, https://cran.r-project.org/package=ConsRank.

Freund, Y., and Schapire, R.E. (1996): "Experiments with a new boosting algorithm." In Proceedings of the Thirteenth International Conference on Machine Learning, pp. 148–156, Morgan Kaufmann.

Plaia, A., Buscemi, S., Fürnkranz, J., and Mencía, E.L. (2021): "Comparing boosting and bagging for decision trees of rankings." Journal of Classification, pp. 1–22.

Zhu, J., Zou, H., Rosset, S., and Hastie, T. (2009): "Multi-class AdaBoost." Statistics and Its Interface, 2, pp. 349–360.

Examples

## Not run: 
  # Load simulated ranking data
  data(simulatedRankingData)
  x <- simulatedRankingData$x
  y <- simulatedRankingData$y

  # Prepare the data with item weights
  dati <- prep_data(y, x, iw = c(2, 5, 5, 2))

  # Divide the data into training and test sets
  set.seed(12345)
  n <- nrow(dati)
  sub <- sample(1:n, floor(2 * n / 3))
  data_sub1 <- dati[sub, ]
  data_test1 <- dati[-sub, ]

  # Apply ensemble ranking with AdaBoost.M1
  boosting_1 <- Ensemble_ranking_IW(
    Label ~ .,
    data = data_sub1,
    iw = c(2, 5, 5, 2),
    mfinal = 3,
    coeflearn = "Breiman",
    control = rpart.control(maxdepth = 4, cp = -1),
    algo = "boosting",
    bin = FALSE
  )

  # Evaluate the performance
  test_boosting1 <- errorevol_ranking_vector_IW(boosting_1,
    newdata = data_test1, iw = c(2, 5, 5, 2), squared = FALSE)
  test_boosting1.1 <- errorevol_ranking_vector_IW(boosting_1,
    newdata = data_sub1, iw = c(2, 5, 5, 2), squared = FALSE)

  # Plot the error evolution
  plot.errorevol(test_boosting1, test_boosting1.1)
  
## End(Not run)
