| mlr_learners_regr.xgboost {mlr3learners} | R Documentation | 
Extreme Gradient Boosting Regression Learner
Description
eXtreme Gradient Boosting regression.
Calls xgboost::xgb.train() from package xgboost.
To compute on GPUs, you first need to compile xgboost yourself and link against CUDA. See https://xgboost.readthedocs.io/en/stable/build.html#building-with-gpu-support.
Note that using the watchlist parameter directly will lead to problems when wrapping this mlr3::Learner in an
mlr3pipelines GraphLearner, as the preprocessing steps will not be applied to the data in the watchlist.
See the section Early Stopping and Validation for the recommended way to supply validation data.
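As a rough sketch of the GraphLearner case described above: the snippet below assumes a recent mlr3pipelines release in which set_validate() is available for GraphLearners, and uses po("scale") purely as a stand-in preprocessing step. The validation data is configured on the GraphLearner rather than through watchlist, so it should pass through the same preprocessing as the training data.

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

task = tsk("mtcars")

# Preprocessing step (illustrative choice) followed by the xgboost learner
graph = po("scale") %>>%
  lrn("regr.xgboost", nrounds = 100, early_stopping_rounds = 10)
glrn = as_learner(graph)

# Hold out 30 percent of the training data for validation
# (assumes set_validate() supports GraphLearner in the installed version)
set_validate(glrn, validate = 0.3)

glrn$train(task)

# Validation scores and the number of rounds selected by early stopping
glrn$internal_valid_scores
glrn$internal_tuned_values
```

The names in the returned lists are expected to be prefixed with the id of the wrapped learner; the exact naming may differ between package versions.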
Dictionary
This mlr3::Learner can be instantiated via the dictionary mlr3::mlr_learners or with the associated sugar function mlr3::lrn():
mlr_learners$get("regr.xgboost")
lrn("regr.xgboost")
Meta Information
- Task type: “regr” 
- Predict Types: “response” 
- Feature Types: “logical”, “integer”, “numeric” 
- Required Packages: mlr3, mlr3learners, xgboost 
Parameters
| Id | Type | Default | Levels | Range | 
| alpha | numeric | 0 | | [0, ∞) | 
| approxcontrib | logical | FALSE | TRUE, FALSE | - | 
| base_score | numeric | 0.5 | | (-∞, ∞) | 
| booster | character | gbtree | gbtree, gblinear, dart | - | 
| callbacks | untyped | list() | | - | 
| colsample_bylevel | numeric | 1 | | [0, 1] | 
| colsample_bynode | numeric | 1 | | [0, 1] | 
| colsample_bytree | numeric | 1 | | [0, 1] | 
| device | untyped | "cpu" | | - | 
| disable_default_eval_metric | logical | FALSE | TRUE, FALSE | - | 
| early_stopping_rounds | integer | NULL | | [1, ∞) | 
| eta | numeric | 0.3 | | [0, 1] | 
| eval_metric | untyped | "rmse" | | - | 
| feature_selector | character | cyclic | cyclic, shuffle, random, greedy, thrifty | - | 
| feval | untyped | NULL | | - | 
| gamma | numeric | 0 | | [0, ∞) | 
| grow_policy | character | depthwise | depthwise, lossguide | - | 
| interaction_constraints | untyped | - | | - | 
| iterationrange | untyped | - | | - | 
| lambda | numeric | 1 | | [0, ∞) | 
| lambda_bias | numeric | 0 | | [0, ∞) | 
| max_bin | integer | 256 | | [2, ∞) | 
| max_delta_step | numeric | 0 | | [0, ∞) | 
| max_depth | integer | 6 | | [0, ∞) | 
| max_leaves | integer | 0 | | [0, ∞) | 
| maximize | logical | NULL | TRUE, FALSE | - | 
| min_child_weight | numeric | 1 | | [0, ∞) | 
| missing | numeric | NA | | (-∞, ∞) | 
| monotone_constraints | untyped | 0 | | - | 
| normalize_type | character | tree | tree, forest | - | 
| nrounds | integer | - | | [1, ∞) | 
| nthread | integer | 1 | | [1, ∞) | 
| ntreelimit | integer | NULL | | [1, ∞) | 
| num_parallel_tree | integer | 1 | | [1, ∞) | 
| objective | untyped | "reg:squarederror" | | - | 
| one_drop | logical | FALSE | TRUE, FALSE | - | 
| outputmargin | logical | FALSE | TRUE, FALSE | - | 
| predcontrib | logical | FALSE | TRUE, FALSE | - | 
| predinteraction | logical | FALSE | TRUE, FALSE | - | 
| predleaf | logical | FALSE | TRUE, FALSE | - | 
| print_every_n | integer | 1 | | [1, ∞) | 
| process_type | character | default | default, update | - | 
| rate_drop | numeric | 0 | | [0, 1] | 
| refresh_leaf | logical | TRUE | TRUE, FALSE | - | 
| reshape | logical | FALSE | TRUE, FALSE | - | 
| sampling_method | character | uniform | uniform, gradient_based | - | 
| sample_type | character | uniform | uniform, weighted | - | 
| save_name | untyped | NULL | | - | 
| save_period | integer | NULL | | [0, ∞) | 
| scale_pos_weight | numeric | 1 | | (-∞, ∞) | 
| seed_per_iteration | logical | FALSE | TRUE, FALSE | - | 
| skip_drop | numeric | 0 | | [0, 1] | 
| strict_shape | logical | FALSE | TRUE, FALSE | - | 
| subsample | numeric | 1 | | [0, 1] | 
| top_k | integer | 0 | | [0, ∞) | 
| training | logical | FALSE | TRUE, FALSE | - | 
| tree_method | character | auto | auto, exact, approx, hist, gpu_hist | - | 
| tweedie_variance_power | numeric | 1.5 | | [1, 2] | 
| updater | untyped | - | | - | 
| verbose | integer | 1 | | [0, 2] | 
| watchlist | untyped | NULL | | - | 
| xgb_model | untyped | NULL | | - | 
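The table above can also be inspected programmatically. The following is a minimal sketch, assuming the learner's param_set behaves like a standard paradox parameter set: it looks up the row for eta, sets two values inside their documented ranges, and shows that a value outside the range is rejected. The chosen values are illustrative only.

```r
library(mlr3)
library(mlr3learners)

learner = lrn("regr.xgboost")

# Tabular view of the parameter set; pick out the row for eta
tab = as.data.table(learner$param_set)
tab[tab$id == "eta", ]

# Values inside the documented ranges are accepted
learner$param_set$set_values(eta = 0.1, max_depth = 4)

# A value outside the range [0, 1] is rejected with an error
res = tryCatch(
  learner$param_set$set_values(eta = 2),
  error = function(e) "rejected"
)
res
```

Because the failed assignment raises an error before anything is stored, eta keeps the previously set value.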
Early Stopping and Validation
To monitor validation performance during training, set the $validate field of the Learner.
For information on how to configure the validation set, see the Validation section of mlr3::Learner.
This validation data can also be used for early stopping, which is enabled by setting the early_stopping_rounds parameter.
The final (or, in the case of early stopping, the best) validation scores can be accessed via $internal_valid_scores, and the
optimal nrounds via $internal_tuned_values.
Initial parameter values
- nrounds:
  - Actual default: no default.
  - Adjusted default: 1.
  - Reason for change: Without a default, construction of the learner would error. The value 1 is only a placeholder to work around this; nrounds needs to be tuned by the user.
- nthread:
  - Actual value: Undefined, triggering auto-detection of the number of CPUs.
  - Adjusted value: 1.
  - Reason for change: Conflicting with parallelization via future.
- verbose:
  - Actual default: 1.
  - Adjusted default: 0.
  - Reason for change: Reduce verbosity.
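The adjusted values listed above are stored in the parameter set of a freshly constructed learner and can be overridden like any other hyperparameter. A minimal sketch follows; the replacement values (200 rounds, 2 threads) are arbitrary illustrations, and the initial nrounds actually reported may depend on the installed mlr3learners version.

```r
library(mlr3)
library(mlr3learners)

learner = lrn("regr.xgboost")

# Values that mlr3learners sets at construction
initial = learner$param_set$values[c("nrounds", "nthread", "verbose")]
initial

# Override them explicitly; nrounds in particular should be chosen or tuned
learner$param_set$set_values(nrounds = 200, nthread = 2, verbose = 1)
learner$param_set$values[c("nrounds", "nthread", "verbose")]
```

Raising nthread while also parallelizing via future can oversubscribe the CPU, which is the conflict the adjusted value of 1 is meant to avoid.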
 
Super classes
mlr3::Learner -> mlr3::LearnerRegr -> LearnerRegrXgboost
Active bindings
- internal_valid_scores
- (named list() or NULL) The validation scores extracted from model$evaluation_log. If early stopping is activated, this contains the validation scores of the model for the optimal nrounds, otherwise the scores for the nrounds of the final model.
- internal_tuned_values
- (named list() or NULL) If early stopping is activated, this returns a list with nrounds, which is extracted from $best_iteration of the model, and otherwise NULL.
- validate
- (numeric(1) or character(1) or NULL) How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined". Returns the $best_iteration when early stopping is activated.
Methods
Public methods
Inherited methods
Method new()
Creates a new instance of this R6 class.
Usage
LearnerRegrXgboost$new()
Method importance()
The importance scores are calculated with xgboost::xgb.importance().
Usage
LearnerRegrXgboost$importance()
Returns
Named numeric().
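A minimal sketch of calling the method on a trained model; the task and the nrounds value are illustrative choices. Features that the fitted trees never split on may be missing from the result.

```r
library(mlr3)
library(mlr3learners)

task = tsk("mtcars")
learner = lrn("regr.xgboost", nrounds = 20)
learner$train(task)

# Named numeric vector: names are feature names, values are importance scores
imp = learner$importance()
head(imp, 3)
```

Note the parentheses: importance is a method, so learner$importance without them returns the function itself rather than the scores.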
Method clone()
The objects of this class are cloneable with this method.
Usage
LearnerRegrXgboost$clone(deep = FALSE)
Arguments
- deep
- Whether to make a deep clone. 
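As a small illustration of why deep = TRUE matters for this R6 class: the sketch below assumes only standard R6 cloning semantics, under which a deep clone gets its own copy of nested objects such as the parameter set. The eta values are arbitrary.

```r
library(mlr3)
library(mlr3learners)

learner = lrn("regr.xgboost", eta = 0.3)

# Deep clone: the copy owns an independent parameter set
copy = learner$clone(deep = TRUE)
copy$param_set$set_values(eta = 0.05)

# The original learner keeps its own value
learner$param_set$values$eta
copy$param_set$values$eta
```

A deep clone is the safer choice whenever the copy's hyperparameters or state will be modified independently of the original.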
References
Chen, Tianqi, Guestrin, Carlos (2016). “Xgboost: A scalable tree boosting system.” In Proceedings of the 22nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 785–794. ACM. doi:10.1145/2939672.2939785.
See Also
- Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-learners 
- Package mlr3extralearners for more learners. 
- as.data.table(mlr_learners) for a table of available Learners in the running session (depending on the loaded packages).
- mlr3pipelines to combine learners with pre- and postprocessing steps.
- Extension packages for additional task types:
  - mlr3proba for probabilistic supervised regression and survival analysis.
  - mlr3cluster for unsupervised clustering.
- mlr3tuning for tuning of hyperparameters, mlr3tuningspaces for established default tuning spaces.
Other Learner: 
mlr_learners_classif.cv_glmnet,
mlr_learners_classif.glmnet,
mlr_learners_classif.kknn,
mlr_learners_classif.lda,
mlr_learners_classif.log_reg,
mlr_learners_classif.multinom,
mlr_learners_classif.naive_bayes,
mlr_learners_classif.nnet,
mlr_learners_classif.qda,
mlr_learners_classif.ranger,
mlr_learners_classif.svm,
mlr_learners_classif.xgboost,
mlr_learners_regr.cv_glmnet,
mlr_learners_regr.glmnet,
mlr_learners_regr.kknn,
mlr_learners_regr.km,
mlr_learners_regr.lm,
mlr_learners_regr.nnet,
mlr_learners_regr.ranger,
mlr_learners_regr.svm
Examples
## Not run: 
library(mlr3)
library(mlr3learners)

if (requireNamespace("xgboost", quietly = TRUE)) {
  # Define the Learner (hyperparameters can also be passed to lrn())
  learner = lrn("regr.xgboost")
  print(learner)

  # Define a Task
  task = tsk("mtcars")

  # Create train and test set
  ids = partition(task)

  # Train the learner on the training ids
  learner$train(task, row_ids = ids$train)

  # Print the model
  print(learner$model)

  # Importance method (called with parentheses to obtain the scores)
  if ("importance" %in% learner$properties) print(learner$importance())

  # Make predictions for the test rows
  predictions = learner$predict(task, row_ids = ids$test)

  # Score the predictions
  predictions$score()
}
## End(Not run)
## Not run: 
library(mlr3)
library(mlr3learners)

# Train learner with early stopping on the mtcars data set
task = tsk("mtcars")

# Set the early stopping parameter and use 30 percent of the data for validation
learner = lrn("regr.xgboost",
  nrounds = 100,
  early_stopping_rounds = 10,
  validate = 0.3
)

# Train learner with early stopping
learner$train(task)

# Inspect optimal nrounds and validation performance
learner$internal_tuned_values
learner$internal_valid_scores
## End(Not run)