xgb.train {xgboost}    R Documentation
eXtreme Gradient Boosting Training
Description
xgb.train is an advanced interface for training an xgboost model.
The xgboost function is a simpler wrapper for xgb.train.
Usage
xgb.train(
  params = list(),
  data,
  nrounds,
  watchlist = list(),
  obj = NULL,
  feval = NULL,
  verbose = 1,
  print_every_n = 1L,
  early_stopping_rounds = NULL,
  maximize = NULL,
  save_period = NULL,
  save_name = "xgboost.model",
  xgb_model = NULL,
  callbacks = list(),
  ...
)
xgboost(
  data = NULL,
  label = NULL,
  missing = NA,
  weight = NULL,
  params = list(),
  nrounds,
  verbose = 1,
  print_every_n = 1L,
  early_stopping_rounds = NULL,
  maximize = NULL,
  save_period = NULL,
  save_name = "xgboost.model",
  xgb_model = NULL,
  callbacks = list(),
  ...
)
Arguments
| params | the list of parameters. The complete list of parameters is available in the online documentation. In brief, they fall into the following groups: 1. General Parameters (e.g. booster, nthread), 2. Booster Parameters, 2.1. Parameters for Tree Booster (e.g. eta, max_depth), 2.2. Parameters for Linear Booster, 3. Task Parameters (e.g. objective, eval_metric). |
| data | training dataset. xgb.train accepts only an xgb.DMatrix as the input. xgboost, in addition, also accepts matrix, dgCMatrix, or name of a local data file. |
| nrounds | max number of boosting iterations. | 
| watchlist | named list of xgb.DMatrix datasets to use for evaluating model performance. Metrics specified in either eval_metric or feval will be computed for each of these datasets during each boosting iteration, and stored in the end as a field named evaluation_log in the resulting object. When either verbose >= 1 or the cb.print.evaluation callback is engaged, the performance results are continuously printed out during the training. E.g., specifying watchlist = list(validation1 = mat1, validation2 = mat2) allows tracking the performance of each round's model on mat1 and mat2. |
| obj | customized objective function. Returns gradient and second order gradient with given prediction and dtrain. | 
| feval | customized evaluation function. Returns list(metric = 'metric-name', value = 'metric-value') with given prediction and dtrain. |
| verbose | If 0, xgboost will stay silent. If 1, it will print information about performance. If 2, some additional information will be printed out. Note that setting verbose > 0 automatically engages the cb.print.evaluation(period = 1) callback function. |
| print_every_n | Print each n-th iteration evaluation messages when verbose > 0. Default is 1, which means all messages are printed. This parameter is passed to the cb.print.evaluation callback. |
| early_stopping_rounds | If NULL, the early stopping function is not triggered. If set to an integer k, training with a validation set will stop if the performance doesn't improve for k rounds. Setting this parameter engages the cb.early.stop callback. |
| maximize | If feval and early_stopping_rounds are set, then this parameter must be set as well. When it is TRUE, it means the larger the evaluation score the better. This parameter is passed to the cb.early.stop callback. |
| save_period | when it is non-NULL, model is saved to disk after every save_period rounds; 0 means save at the end. The saving is handled by the cb.save.model callback. |
| save_name | the name or path for periodically saved model file. | 
| xgb_model | a previously built model to continue the training from. Could be either an object of class xgb.Booster, or its raw data, or the name of a file with a previously saved model. |
| callbacks | a list of callback functions to perform various tasks during boosting. See callbacks. Some of the callbacks are automatically created depending on the parameters' values; the user can provide either existing or their own callback methods in order to customize the training process. |
| ... | other parameters to pass to params. |
| label | vector of response values. Should not be provided when data is a local data file name or an xgb.DMatrix. |
| missing | by default is set to NA, which means that NA values should be considered as 'missing' by the algorithm. Sometimes, 0 or other extreme value might be used to represent missing values. This parameter is only used when input is a dense matrix. | 
| weight | a vector indicating the weight for each row of the input. | 
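The following is a small, self-contained sketch (not part of the package's own examples) illustrating the label, weight and missing arguments of the simpler xgboost() interface on synthetic data; the variable names X, y and w are arbitrary placeholders.
## synthetic dense matrix in which -999 encodes missing values
set.seed(1)
X <- matrix(rnorm(100 * 4), ncol = 4)
y <- as.numeric(X[, 1] + rnorm(100) > 0)    # binary 0/1 response
w <- runif(100)                             # per-row observation weights
X[sample(length(X), 20)] <- -999            # mark some entries as missing

bst <- xgboost(data = X, label = y, weight = w,
               missing = -999,              # treat -999 as 'missing'
               max_depth = 2, eta = 1, nthread = 1, nrounds = 2,
               objective = "binary:logistic", verbose = 0)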
Details
These are the training functions for xgboost.
The xgb.train interface supports advanced features such as watchlist and
customized objective and evaluation metric functions, and is therefore more flexible
than the xgboost interface.
Parallelization is automatically enabled if OpenMP is present.
The number of threads can also be specified manually via the nthread
parameter.
The evaluation metric is chosen automatically by XGBoost (according to the objective)
when the eval_metric parameter is not provided.
The user may set one or several eval_metric parameters (a short sketch follows the list of built-in metrics below).
Note that when a customized metric is used, only that single metric can be used.
The following is the list of built-in metrics for which XGBoost provides optimized implementation:
- rmse: root mean square error. https://en.wikipedia.org/wiki/Root_mean_square_error
- logloss: negative log-likelihood. https://en.wikipedia.org/wiki/Log-likelihood
- mlogloss: multiclass logloss. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html
- error: Binary classification error rate. It is calculated as (# wrong cases) / (# all cases). By default, it uses the 0.5 threshold for predicted values to define negative and positive instances. A different threshold (e.g., 0.6) could be specified as "error@0.6".
- merror: Multiclass classification error rate. It is calculated as (# wrong cases) / (# all cases).
- mae: Mean absolute error
- mape: Mean absolute percentage error
- auc: Area under the curve. https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_curve for ranking evaluation.
- aucpr: Area under the PR curve. https://en.wikipedia.org/wiki/Precision_and_recall for ranking evaluation.
- ndcg: Normalized Discounted Cumulative Gain (for ranking task). https://en.wikipedia.org/wiki/NDCG
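As a sketch of the note above (assumed usage, not taken from the package's examples), several eval_metric entries can be supplied by repeating the name in the params list, and a non-default classification threshold can be requested with the "error@t" form:
params <- list(objective = "binary:logistic", nthread = 1,
               eval_metric = "auc",
               eval_metric = "logloss",
               eval_metric = "error@0.6")   # error rate at a 0.6 threshold
# bst <- xgb.train(params, dtrain, nrounds = 10,
#                  watchlist = list(eval = dtest))  # dtrain/dtest as in the Examples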
The following callbacks are automatically created when certain parameters are set:
- cb.print.evaluation is turned on when verbose > 0, and the print_every_n parameter is passed to it.
- cb.evaluation.log is on when watchlist is present.
- cb.early.stop: when early_stopping_rounds is set.
- cb.save.model: when save_period > 0 is set.
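Below is a hedged sketch of supplying such callbacks explicitly instead of relying on the shortcut parameters; the data preparation mirrors the Examples section, and the argument names of cb.early.stop() and cb.evaluation.log() are assumed to match the callbacks documentation.
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label, nthread = 1)
params <- list(objective = "binary:logistic", max_depth = 2, eta = 1, nthread = 1)

## roughly equivalent to setting early_stopping_rounds = 3
bst <- xgb.train(params, dtrain, nrounds = 10,
                 watchlist = list(train = dtrain), verbose = 0,
                 callbacks = list(cb.evaluation.log(),
                                  cb.early.stop(stopping_rounds = 3,
                                                maximize = FALSE, verbose = FALSE)))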
Value
An object of class xgb.Booster with the following elements:
- handle: a handle (pointer) to the xgboost model in memory.
- raw: a cached memory dump of the xgboost model saved as R's raw type.
- niter: number of boosting iterations.
- evaluation_log: evaluation history stored as a data.table with the first column corresponding to iteration number and the rest corresponding to evaluation metrics' values. It is created by the cb.evaluation.log callback.
- call: a function call.
- params: parameters that were passed to the xgboost library. Note that it does not capture parameters changed by the cb.reset.parameters callback.
- callbacks: callback functions that were either automatically assigned or explicitly passed.
- best_iteration: iteration number with the best evaluation metric value (only available with early stopping).
- best_score: the best evaluation metric value during early stopping (only available with early stopping).
- feature_names: names of the training dataset features (only when column names were defined in training data).
- nfeatures: number of features in training data.
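A brief sketch of inspecting some of these elements; it assumes bst is any fitted model, e.g. one produced in the Examples section below:
str(bst$params)            # parameters as passed to the xgboost library
bst$niter                  # number of boosting iterations
head(bst$evaluation_log)   # per-iteration metric values (requires a watchlist)
bst$nfeatures              # number of features in the training data
bst$best_iteration         # present only when early stopping was used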
References
Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, https://arxiv.org/abs/1603.02754
See Also
callbacks,
predict.xgb.Booster,
xgb.cv
Examples
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
## Keep the number of threads to 1 for examples
nthread <- 1
data.table::setDTthreads(nthread)
dtrain <- with(
  agaricus.train, xgb.DMatrix(data, label = label, nthread = nthread)
)
dtest <- with(
  agaricus.test, xgb.DMatrix(data, label = label, nthread = nthread)
)
watchlist <- list(train = dtrain, eval = dtest)
## A simple xgb.train example:
param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = nthread,
              objective = "binary:logistic", eval_metric = "auc")
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist)
## An xgb.train example where custom objective and evaluation metric are
## used:
logregobj <- function(preds, dtrain) {
   labels <- getinfo(dtrain, "label")
   preds <- 1/(1 + exp(-preds))
   grad <- preds - labels
   hess <- preds * (1 - preds)
   return(list(grad = grad, hess = hess))
}
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- as.numeric(sum(labels != (preds > 0)))/length(labels)
  return(list(metric = "error", value = err))
}
# These functions could be used by passing them either:
#  as 'objective' and 'eval_metric' parameters in the params list:
param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = nthread,
              objective = logregobj, eval_metric = evalerror)
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist)
#  or through the ... arguments:
param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = nthread)
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
                 objective = logregobj, eval_metric = evalerror)
#  or as dedicated 'obj' and 'feval' parameters of xgb.train:
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
                 obj = logregobj, feval = evalerror)
## An xgb.train example of using variable learning rates at each iteration:
param <- list(max_depth = 2, eta = 1, verbose = 0, nthread = nthread,
              objective = "binary:logistic", eval_metric = "auc")
my_etas <- list(eta = c(0.5, 0.1))
bst <- xgb.train(param, dtrain, nrounds = 2, watchlist,
                 callbacks = list(cb.reset.parameters(my_etas)))
## Early stopping:
bst <- xgb.train(param, dtrain, nrounds = 25, watchlist,
                 early_stopping_rounds = 3)
## An 'xgboost' interface example:
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max_depth = 2, eta = 1, nthread = nthread, nrounds = 2,
               objective = "binary:logistic")
pred <- predict(bst, agaricus.test$data)
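## A short follow-up sketch (not part of the original example): since
## objective = "binary:logistic" returns predicted probabilities, the test
## error can be computed by thresholding them at 0.5.
err <- mean(as.numeric(pred > 0.5) != agaricus.test$label)
print(paste("test-error =", err))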