mixgb_cv {mixgb}R Documentation

Use cross-validation to find the optimal nrounds

Description

Use cross-validation to find the optimal nrounds for an Mixgb imputer. Note that this method relies on the complete cases of a dataset to obtain the optimal nrounds.

Usage

mixgb_cv(
  data,
  nfold = 5,
  nrounds = 100,
  early_stopping_rounds = 10,
  response = NULL,
  select_features = NULL,
  xgb.params = list(max_depth = 3, gamma = 0, eta = 0.3, min_child_weight = 1,
    subsample = 0.7, colsample_bytree = 1, colsample_bylevel = 1, colsample_bynode = 1,
    tree_method = "auto", gpu_id = 0, predictor = "auto"),
  stringsAsFactors = FALSE,
  verbose = TRUE,
  ...
)

Arguments

data

A data.frame or a data.table with missing values.

nfold

The number of subsamples which are randomly partitioned and of equal size. Default: 5

nrounds

The max number of iterations in XGBoost training. Default: 100

early_stopping_rounds

An integer value k. Training will stop if the validation performance has not improved for k rounds.

response

The name or the column index of a response variable. Default: NULL (Randomly select an incomplete variable).

select_features

The names or the indices of selected features. Default: NULL (Select all the other variables in the dataset).

xgb.params

A list of XGBoost parameters. For more details, please check XGBoost documentation on parameters.

stringsAsFactors

A logical value indicating whether all character vectors in the dataset should be converted to factors.

verbose

A logical value. Whether to print out cross-validation results during the process.

...

Extra arguments to be passed to XGBoost.

Value

A list of the optimal nrounds, evaluation.log and the chosen response.

Examples

params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
cv.results <- mixgb_cv(data = nhanes3, xgb.params = params)
cv.results$best.nrounds

imputed.data <- mixgb(data = nhanes3, m = 3, xgb.params = params, nrounds = cv.results$best.nrounds)

[Package mixgb version 1.0.2 Index]