auto_tune_xgboost {autostats} | R Documentation |
auto_tune_xgboost
Description
Automatically tunes an xgboost model using grid or Bayesian optimization.
Usage
auto_tune_xgboost(
  .data,
  formula,
  tune_method = c("grid", "bayes"),
  event_level = c("first", "second"),
  n_fold = 5L,
  n_iter = 100L,
  seed = 1,
  save_output = FALSE,
  parallel = TRUE,
  trees = tune::tune(),
  min_n = tune::tune(),
  mtry = tune::tune(),
  tree_depth = tune::tune(),
  learn_rate = tune::tune(),
  loss_reduction = tune::tune(),
  sample_size = tune::tune(),
  stop_iter = tune::tune(),
  counts = FALSE,
  tree_method = c("auto", "exact", "approx", "hist", "gpu_hist"),
  monotone_constraints = 0L,
  num_parallel_tree = 1L,
  lambda = 1,
  alpha = 0,
  scale_pos_weight = 1,
  verbosity = 0L
)
Arguments
.data: dataframe
formula: formula
tune_method: method of tuning. Defaults to "grid"
event_level: for binary classification, which factor level is the positive class. Specify "second" for the second level
n_fold: integer. Number of folds in the resamples
n_iter: number of iterations for tuning (bayes); parameter grid size (grid)
seed: seed
save_output: default FALSE. If set to TRUE, writes the output as an rds file
parallel: default TRUE. If TRUE, enables parallel processing over the resamples for grid tuning
trees: # Trees (xgboost: nrounds) (type: integer, default: 500L)
min_n: Minimal Node Size (xgboost: min_child_weight) (type: integer, default: 2L); [typical range: 2-10] Keep the value small for highly imbalanced class data where leaf nodes can have smaller groups. Otherwise increase the value to prevent overfitting to outliers.
mtry: # Randomly Selected Predictors; defaults to .75 (xgboost: colsample_bynode) (type: numeric, range 0 - 1; or type: integer if counts = TRUE)
tree_depth: Tree Depth (xgboost: max_depth) (type: integer, default: 7L); typical values: 3-10
learn_rate: Learning Rate (xgboost: eta) (type: double, default: 0.05); typical values: 0.01-0.3
loss_reduction: Minimum Loss Reduction (xgboost: gamma) (type: double, default: 1.0); range: 0 to Inf; typical values: 0-20 assuming low to mid tree depth
sample_size: Proportion Observations Sampled (xgboost: subsample) (type: double, default: .75); typical values: 0.5-1
stop_iter: # Iterations Before Stopping (xgboost: early_stop) (type: integer, default: 15L); only enabled if a validation set is provided
counts: if TRUE, mtry is interpreted as an integer count of predictor columns; if FALSE (the default), mtry is interpreted as a proportion of predictor columns
tree_method: xgboost tree_method. Default is "auto"
monotone_constraints: an integer vector with the length of the predictor columns, with each element set to -1, 0, or 1 to constrain the corresponding predictor to a decreasing, unconstrained, or increasing relationship with the outcome. Default 0L (no constraints)
num_parallel_tree: should be set to the size of the forest being trained. Default 1L
lambda: [default=1] L2 regularization term on weights. Increasing this value makes the model more conservative.
alpha: [default=0] L1 regularization term on weights. Increasing this value makes the model more conservative.
scale_pos_weight: [default=1] Controls the balance of positive and negative weights; useful for unbalanced classes. If set to TRUE, it is calculated as sum(negative instances) / sum(positive instances). If the first level is the majority class, use values < 1; otherwise values > 1 are normally used to balance the class distribution.
verbosity: [default=0L] Verbosity of printed messages. Valid values are 0 (silent), 1 (warning), 2 (info), 3 (debug).
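As an illustration of how the classification-related arguments fit together, here is a hedged sketch (not a tested example): the data frame titanic_train and the formula surv_form are hypothetical placeholders, and the outcome is assumed to be a two-level factor whose second level is the event of interest.

# sketch only: binary classification with an imbalanced outcome
# `titanic_train` and `surv_form` are hypothetical placeholders
titanic_train %>%
  auto_tune_xgboost(
    formula = surv_form,
    tune_method = "grid",
    n_iter = 15,
    event_level = "second",    # the second factor level is the positive class
    scale_pos_weight = TRUE,   # computed as sum(negative instances) / sum(positive instances)
    parallel = FALSE
  ) -> xgb_class_tuned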
Details
The default is to tune all 7 xgboost hyperparameters. Individual parameter values can optionally be fixed to reduce tuning complexity.
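For example, a minimal sketch (reusing the iris_train data and petal_form formula from the Examples below; illustrative only) that fixes trees and tree_depth while leaving the remaining parameters to be tuned:

# sketch: fix two hyperparameters, tune the rest
iris_train %>%
  auto_tune_xgboost(
    formula = petal_form,
    tune_method = "grid",
    n_iter = 20,
    trees = 300L,       # fixed instead of tune::tune()
    tree_depth = 5L,    # fixed instead of tune::tune()
    parallel = FALSE
  ) -> xgb_partial_tune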
Value
workflow object
Examples
# create dummy columns from the categorical variables
iris %>%
  framecleaner::create_dummies() -> iris1

# build a formula with Petal.Length as the target
iris1 %>%
  tidy_formula(target = Petal.Length) -> petal_form

# split into training and validation sets
iris1 %>%
  rsample::initial_split() -> iris_split

iris_split %>%
  rsample::analysis() -> iris_train

iris_split %>%
  rsample::assessment() -> iris_val

## Not run:
# tune a small grid with mtry fixed at .5
iris_train %>%
  auto_tune_xgboost(formula = petal_form, n_iter = 10,
    parallel = FALSE, tune_method = "grid", mtry = .5) -> xgb_tuned

# fit the tuned workflow and extract the underlying xgboost model
xgb_tuned %>%
  parsnip::fit(iris_train) %>%
  parsnip::extract_fit_engine() -> xgb_tuned_fit

# predict on the validation set
xgb_tuned_fit %>%
  tidy_predict(newdata = iris_val, form = petal_form) -> iris_val1
## End(Not run)
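A Bayesian-optimization variant of the call above is sketched here; it reuses iris_train and petal_form, swaps the tuning method, and enables save_output (illustrative, not a tested example).

## Not run:
# sketch: Bayesian optimization over 25 iterations, writing the result to an rds file
iris_train %>%
  auto_tune_xgboost(formula = petal_form,
                    tune_method = "bayes",
                    n_iter = 25,
                    save_output = TRUE,
                    parallel = FALSE) -> xgb_bayes_tuned
## End(Not run)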