R: Fit a model to the numerically optimal configuration

fit_best.workflow_set {workflowsets}

R Documentation

Fit a model to the numerically optimal configuration

Description

fit_best() takes results from tuning many models and fits the workflow configuration associated with the best performance to the training set.

Usage

## S3 method for class 'workflow_set'
fit_best(x, metric = NULL, eval_time = NULL, ...)

Arguments

`x`	A `workflow_set` object that has been evaluated with `workflow_map()`. Note that the workflow set must have been fitted with the control option `save_workflow = TRUE`.
`metric`	A character string giving the metric to rank results by.
`eval_time`	A single numeric time point where dynamic event time metrics should be chosen (e.g., the time-dependent ROC curve, etc). The values should be consistent with the values used to create `x`. The `NULL` default will automatically use the first evaluation time used by `x`.
`...`	Additional options to pass to tune::fit_best.

Details

This function is a shortcut for the steps needed to fit the numerically optimal configuration in a fitted workflow set. The function ranks results, extracts the tuning result pertaining to the best result, and then again calls fit_best() (itself a wrapper) on the tuning result containing the best result.

In pseudocode:

rankings <- rank_results(wf_set, metric, select_best = TRUE)
tune_res <- extract_workflow_set_result(wf_set, rankings$wflow_id[1])
fit_best(tune_res, metric)

Note

The package supplies two pre-generated workflow sets, two_class_set and chi_features_set, and associated sets of model fits two_class_res and chi_features_res.

The ⁠two_class_*⁠ objects are based on a binary classification problem using the two_class_dat data from the modeldata package. The six models utilize either a bare formula or a basic recipe utilizing recipes::step_YeoJohnson() as a preprocessor, and a decision tree, logistic regression, or MARS model specification. See ?two_class_set for source code.

The ⁠chi_features_*⁠ objects are based on a regression problem using the Chicago data from the modeldata package. Each of the three models utilize a linear regression model specification, with three different recipes of varying complexity. The objects are meant to approximate the sequence of models built in Section 1.3 of Kuhn and Johnson (2019). See ?chi_features_set for source code.

Examples


library(tune)
library(modeldata)
library(rsample)

data(Chicago)
Chicago <- Chicago[1:1195,]

time_val_split <-
   sliding_period(
      Chicago,
      date,
      "month",
      lookback = 38,
      assess_stop = 1
   )

chi_features_set

chi_features_res_new <-
   chi_features_set %>%
   # note: must set `save_workflow = TRUE` to use `fit_best()`
   option_add(control = control_grid(save_workflow = TRUE)) %>%
   # evaluate with resamples
   workflow_map(resamples = time_val_split, grid = 21, seed = 1, verbose = TRUE)

chi_features_res_new

# sort models by performance metrics
rank_results(chi_features_res_new)

# fit the numerically optimal configuration to the training set
chi_features_wf <- fit_best(chi_features_res_new)

chi_features_wf

# to select optimal value based on a specific metric:
fit_best(chi_features_res_new, metric = "rmse")

[Package workflowsets version 1.1.0 Index]