collect_metrics.workflow_set {workflowsets}

R Documentation

Obtain and format results produced by tuning functions for workflow sets

Description

Return a tibble of performance metrics for all models or submodels.

Usage

## S3 method for class 'workflow_set'
collect_metrics(x, ..., summarize = TRUE)

## S3 method for class 'workflow_set'
collect_predictions(
  x,
  ...,
  summarize = TRUE,
  parameters = NULL,
  select_best = FALSE,
  metric = NULL
)

## S3 method for class 'workflow_set'
collect_notes(x, ...)

Arguments

`x`	A `workflow_set` object that has been evaluated with `workflow_map()`.
`...`	Not currently used.
`summarize`	A single logical. If `TRUE`, the performance estimates are summarized via the mean over resamples. If `FALSE`, the raw per-resample performance values are returned along with the resampling identifiers. When collecting predictions with `summarize = TRUE`, predictions are averaged for any row that appears in more than one assessment set.
`parameters`	An optional tibble of tuning parameter values that can be used to filter the predicted values before processing. This tibble should only have columns for each tuning parameter identifier (e.g. `"my_param"` if `tune("my_param")` was used).
`select_best`	A single logical for whether the numerically best results are retained. If `TRUE`, the `parameters` argument is ignored.
`metric`	A character string for the metric that is used for `select_best`.
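
As an illustration of how `select_best` and `metric` interact, here is a minimal sketch. It does not use the pre-generated objects; instead it builds a small, hypothetical workflow set (`demo_set`) and asks the tuning step to save predictions via `control_grid(save_pred = TRUE)`, since predictions can only be collected if they were saved. The formula, the choice of a single tunable decision tree, the number of folds, and the grid size are all assumptions made for the example.

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

# Assumption: the modeldata package is installed; it supplies two_class_dat.
data(two_class_dat, package = "modeldata")

set.seed(1)
folds <- vfold_cv(two_class_dat, v = 3)

# Hypothetical model: a decision tree with one tuning parameter.
tree_spec <- decision_tree(
  mode = "classification",
  engine = "rpart",
  min_n = tune()
)

# Hypothetical workflow set with a single workflow, "none_cart".
demo_set <- workflow_set(
  preproc = list(none = Class ~ A + B),
  models = list(cart = tree_spec)
)

# Tune over a small grid and keep the assessment set predictions.
demo_res <- workflow_map(
  demo_set,
  resamples = folds,
  grid = 3,
  seed = 2,
  control = control_grid(save_pred = TRUE)
)

# Keep only the predictions for the candidate with the best ROC AUC.
best_preds <- collect_predictions(
  demo_res,
  select_best = TRUE,
  metric = "roc_auc"
)
best_preds
```

With `summarize = TRUE` (the default) and v-fold cross-validation, each row of the data lands in exactly one assessment set, so the result should hold one prediction per original row for the selected candidate.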

Details

When applied to a workflow set, the metrics and predictions that are returned do not contain the actual tuning parameter columns and values (unlike when these collect functions are run on other objects). The reason is that workflow sets can contain different types of models or models with different tuning parameters.

If the columns are needed, there are two options. First, the .config column can be used as a key to merge the tuning parameter columns into the collected results. Alternatively, the purrr::map() function can be used to get the metrics from the original objects (see the example below).
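
The first option can be sketched as follows, using the pre-generated `two_class_res` object. This is one possible approach rather than a prescribed one: it pulls the tuning result for a single workflow with `extract_workflow_set_result()`, reduces it to one row per `.config`, and joins that key back onto the workflow set metrics. The names `cart_id`, `set_metrics`, `param_key`, and `merged` are hypothetical, and the list of columns dropped to isolate the tuning parameters is an assumption about the columns that `collect_metrics()` returns for a tuning result.

```r
library(workflowsets)
library(dplyr)

# Pick one workflow whose id contains "cart".
cart_id <- grep("cart", two_class_res$wflow_id, value = TRUE)[1]

# Metrics from the workflow set: no tuning parameter columns.
set_metrics <- collect_metrics(two_class_res) %>%
  dplyr::filter(wflow_id == cart_id)

# Metrics from the underlying tuning result include the parameter columns.
# Drop the metric columns (assumed names) and keep one row per .config.
param_key <- two_class_res %>%
  extract_workflow_set_result(cart_id) %>%
  collect_metrics() %>%
  dplyr::select(
    -any_of(c(".metric", ".estimator", "mean", "n", "std_err", ".iter"))
  ) %>%
  distinct()

# Join the tuning parameter values onto the workflow set metrics.
merged <- left_join(set_metrics, param_key, by = ".config")
merged
```

The join is done one workflow at a time because `.config` labels are only meaningful within a single workflow; different workflows in the set can reuse the same label for unrelated parameter values.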

Value

A tibble.

Note

The package supplies two pre-generated workflow sets, two_class_set and chi_features_set, and associated sets of model fits two_class_res and chi_features_res.

The two_class_* objects are based on a binary classification problem using the two_class_dat data from the modeldata package. The six models utilize either a bare formula or a basic recipe utilizing recipes::step_YeoJohnson() as a preprocessor, and a decision tree, logistic regression, or MARS model specification. See ?two_class_set for source code.

The chi_features_* objects are based on a regression problem using the Chicago data from the modeldata package. Each of the three models utilizes a linear regression model specification, with three different recipes of varying complexity. The objects are meant to approximate the sequence of models built in Section 1.3 of Kuhn and Johnson (2019). See ?chi_features_set for source code.

Examples

library(workflowsets)
library(dplyr)
library(purrr)
library(tidyr)

two_class_res

# ------------------------------------------------------------------------------

collect_metrics(two_class_res)

# Alternatively, if the tuning parameter values are needed:
two_class_res %>%
  dplyr::filter(grepl("cart", wflow_id)) %>%
  mutate(metrics = map(result, collect_metrics)) %>%
  dplyr::select(wflow_id, metrics) %>%
  tidyr::unnest(cols = metrics)


collect_metrics(two_class_res, summarize = FALSE)

[Package workflowsets version 1.1.0 Index]