| xgb.cv {xgboost} | R Documentation |
Cross Validation
Description
The cross-validation function of xgboost.
Usage
xgb.cv(
params = list(),
data,
nrounds,
nfold,
label = NULL,
missing = NA,
prediction = FALSE,
showsd = TRUE,
metrics = list(),
obj = NULL,
feval = NULL,
stratified = TRUE,
folds = NULL,
train_folds = NULL,
verbose = TRUE,
print_every_n = 1L,
early_stopping_rounds = NULL,
maximize = NULL,
callbacks = list(),
...
)
Arguments
params
the list of parameters. The complete list of parameters is available in the online documentation. Below is a shorter summary:
- objective: objective function; common ones are reg:squarederror (regression with squared loss) and binary:logistic (logistic regression for classification)
- eta: step size of each boosting step
- max_depth: maximum depth of the tree
- nthread: number of threads used in training; if not set, all threads are used
See xgb.train for further details. See also demo/ for a walkthrough example in R.
data
takes an xgb.DMatrix, matrix, or dgCMatrix as the input.
nrounds
the maximum number of boosting iterations.
nfold
the original dataset is randomly partitioned into nfold equal-size subsamples.
label
vector of response values. Should be provided only when data is an R matrix.
missing
only used when the input is a dense matrix. By default set to NA, meaning that NA values are treated as 'missing' by the algorithm. Sometimes 0 or another extreme value might be used to represent missing values.
prediction
A logical value indicating whether to return the test fold predictions from each CV model. This parameter engages the cb.cv.predict callback (the resulting pred element is shown in the sketch after the Value section).
showsd
boolean, whether to show the standard deviation of the cross-validation results.
metrics
list of evaluation metrics to be used in cross validation; when not specified, the evaluation metric is chosen according to the objective function. Possible options are:
- error: binary classification error rate
- rmse: root mean square error
- logloss: negative log-likelihood function
- mae: mean absolute error
- mape: mean absolute percentage error
- auc: area under the ROC curve
- aucpr: area under the PR curve
- merror: exact matching error, used to evaluate multi-class classification
obj
customized objective function. Returns the gradient and second-order gradient for a given prediction and dtrain (see the sketch following this list of arguments).
feval
customized evaluation function. Returns list(metric = 'metric-name', value = 'metric-value') for a given prediction and dtrain (exercised in the same sketch).
stratified
a boolean indicating whether the sampling of folds should be stratified by the values of the outcome labels.
folds
list providing the possibility to use pre-defined CV folds (each element must be a vector of test fold indices). When folds are supplied, the nfold and stratified parameters are ignored (a sketch follows the Details section).
train_folds
list specifying which indices to use for training. If NULL (the default), all indices not specified in folds will be used for training.
verbose
boolean, print the statistics during the process.
print_every_n
Print evaluation messages at every n-th iteration when verbose > 0. Default is 1, which means all messages are printed. This parameter is passed to the cb.print.evaluation callback.
early_stopping_rounds
If NULL, early stopping is not triggered. If set to an integer k, training will stop when the performance on the validation set does not improve for k rounds. Setting this parameter engages the cb.early.stop callback (a sketch follows the Examples section).
maximize
If feval and early_stopping_rounds are set, this parameter must be set as well. When TRUE, a larger evaluation score is better. This parameter is passed to the cb.early.stop callback.
callbacks
a list of callback functions to perform various tasks during boosting. See callbacks. Some of the callbacks are created automatically depending on the parameters' values. Users can provide either existing or their own callback methods in order to customize the training process.
...
other parameters to pass to params.
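As a minimal sketch of the obj/feval contract described above (not part of the original reference; logreg_obj and error_eval are illustrative names, and the log-loss gradients are one possible choice):

## Minimal sketch of the custom obj/feval contract.
## logreg_obj and error_eval are illustrative names, not library functions.
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

logreg_obj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))       # raw margins -> probabilities
  grad <- preds - labels               # first-order gradient of log-loss
  hess <- preds * (1 - preds)          # second-order gradient
  list(grad = grad, hess = hess)
}

error_eval <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(as.numeric(preds > 0) != labels)  # error rate on raw margins
  list(metric = "custom-error", value = err)
}

cv_custom <- xgb.cv(data = dtrain, nrounds = 3, nfold = 5,
                    obj = logreg_obj, feval = error_eval,
                    max_depth = 3, eta = 1)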
Details
The original sample is randomly partitioned into nfold equal-size subsamples.
Of the nfold subsamples, a single subsample is retained as the validation data for testing the model, and the remaining nfold - 1 subsamples are used as training data.
The cross-validation process is then repeated nfold times, with each of the nfold subsamples used exactly once as the validation data.
All observations are used for both training and validation.
Adapted from https://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
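Since pre-defined partitions can replace this random splitting, a hedged sketch of the folds argument in use (my_folds is an illustrative name; any partition of the row indices into index vectors works):

## Sketch: pre-defined CV folds via the folds argument; nfold and
## stratified are then ignored, as noted under Arguments.
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
n <- nrow(agaricus.train$data)
my_folds <- split(sample(n), rep_len(1:5, n))  # five random test-index sets
cv_manual <- xgb.cv(data = dtrain, nrounds = 3, folds = my_folds,
                    objective = "binary:logistic")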
Value
An object of class xgb.cv.synchronous with the following elements:
- call: a function call.
- params: parameters that were passed to the xgboost library. Note that it does not capture parameters changed by the cb.reset.parameters callback.
- callbacks: callback functions that were either automatically assigned or explicitly passed.
- evaluation_log: evaluation history stored as a data.table, with the first column corresponding to the iteration number and the rest corresponding to the CV-based evaluation means and standard deviations for the training and test CV-sets. It is created by the cb.evaluation.log callback.
- niter: number of boosting iterations.
- nfeatures: number of features in the training data.
- folds: the list of CV folds' indices - either those passed through the folds parameter or randomly generated.
- best_iteration: iteration number with the best evaluation metric value (only available with early stopping).
- best_ntreelimit and the ntreelimit attribute: deprecated, use best_iteration instead.
- pred: CV prediction values, available when prediction is set. It is either a vector or a matrix (see cb.cv.predict).
- models: a list of the CV folds' models. It is only available with the explicit setting of the cb.cv.predict(save_models = TRUE) callback.
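A brief sketch (not from the original page) of inspecting these elements; the CV run itself is only illustrative:

## Sketch: inspecting the returned xgb.cv.synchronous object.
data(agaricus.train, package = 'xgboost')
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
cv <- xgb.cv(data = dtrain, nrounds = 3, nfold = 5, prediction = TRUE,
             objective = "binary:logistic")
head(cv$evaluation_log)  # per-iteration CV means and standard deviations
str(cv$folds)            # the test-fold index vectors that were used
head(cv$pred)            # out-of-fold predictions (since prediction = TRUE)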
Examples
## 5-fold cross-validation over 3 boosting rounds, tracking RMSE and AUC
data(agaricus.train, package = 'xgboost')
dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
cv <- xgb.cv(data = dtrain, nrounds = 3, nthread = 2, nfold = 5,
             metrics = list("rmse", "auc"),
             max_depth = 3, eta = 1, objective = "binary:logistic")
print(cv)                  # summary of the CV results
print(cv, verbose = TRUE)  # also prints call, params, and callbacks
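Beyond the original examples, a hedged sketch of early stopping (cv_es is an illustrative name; dtrain is the DMatrix created above):

## Sketch: early stopping. Training halts when test AUC fails to improve
## for 3 consecutive rounds; the best round is recorded in best_iteration.
cv_es <- xgb.cv(data = dtrain, nrounds = 50, nfold = 5,
                metrics = list("auc"), maximize = TRUE,
                early_stopping_rounds = 3,
                max_depth = 3, eta = 1, objective = "binary:logistic")
cv_es$best_iteration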