R: Benchmark Multiple Learners on Multiple Tasks

benchmark {mlr3}

R Documentation

Benchmark Multiple Learners on Multiple Tasks

Description

Runs a benchmark on arbitrary combinations of tasks (Task), learners (Learner), and resampling strategies (Resampling), possibly in parallel.

Usage

benchmark(
  design,
  store_models = FALSE,
  store_backends = TRUE,
  encapsulate = NA_character_,
  allow_hotstart = FALSE,
  clone = c("task", "learner", "resampling"),
  unmarshal = TRUE
)

Arguments

`design`	(`data.frame()`) Data frame (or `data.table::data.table()`) with three columns: "task", "learner", and "resampling". Each row defines a resampling by providing a Task, a Learner and an instantiated Resampling strategy. The helper function `benchmark_grid()` can assist in generating an exhaustive design (see examples) and instantiates the Resamplings per Task. Optionally, an additional column "param_values" can be set; see `benchmark_grid()`.
`store_models`	(`logical(1)`) Store the fitted models in the resulting object? Set to `TRUE` if you want to further analyse the models or want to extract information like variable importance.
`store_backends`	(`logical(1)`) Keep the DataBackend of the Task in the ResampleResult? Set to `TRUE` if your performance measures require a Task, or to analyse results more conveniently. Set to `FALSE` to reduce the file size and memory footprint after serialization. The current default is `TRUE`, but this will change in a future release.
`encapsulate`	(`character(1)`) If not `NA`, enables encapsulation by setting the field `Learner$encapsulate` to one of the supported values: `"none"` (disable encapsulation), `"try"` (captures errors, but output is printed to the console and not logged), `"evaluate"` (execute via the evaluate package) and `"callr"` (start in an external R session via the callr package). If `NA`, encapsulation is not changed, i.e. the settings of the individual learners are active. Additionally, if encapsulation is set to `"evaluate"` or `"callr"`, the fallback learner is set to the featureless learner if the learner does not already have a fallback configured.
`allow_hotstart`	(`logical(1)`) Determines whether learners are hot-started with trained models stored in `$hotstart_stack`. See also HotstartStack.
`clone`	(`character()`) Select the input objects to be cloned before proceeding by providing a set with possible values `"task"`, `"learner"` and `"resampling"` for Task, Learner and Resampling, respectively. By default, all input objects are cloned.
`unmarshal`	(`logical(1)`) Whether to unmarshal learners that were marshaled during the execution. If `TRUE`, all models are stored in unmarshaled form. If `FALSE`, all learners (that need marshaling) are stored in marshaled form.
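The effect of `encapsulate` can be illustrated with a learner that always fails during training. This is a minimal sketch, assuming mlr3 0.20.x with the evaluate package installed; it uses the `classif.debug` learner shipped with mlr3, whose `error_train` parameter is the probability of raising an error during training. Because encapsulation is set to `"evaluate"`, `benchmark()` also attaches a featureless fallback learner, so every iteration still produces predictions.

```r
library(mlr3)

# assumption: classif.debug with error_train = 1 fails in every training call
learner = lrn("classif.debug", error_train = 1)
design = benchmark_grid(tsk("sonar"), learner, rsmp("cv", folds = 3))

# "evaluate" catches and logs the errors instead of aborting the benchmark;
# the featureless fallback then predicts for the failed iterations
bmr = benchmark(design, encapsulate = "evaluate")

aggr = bmr$aggregate()
rr = aggr$resample_result[[1]]
print(rr$errors)  # errors logged during training
print(aggr)       # scores stem from the fallback learner
```

Without encapsulation, the first training error would abort the whole call; with it, failures are recorded per iteration and can be inspected afterwards.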

Value

BenchmarkResult.

Predict Sets

If you want to compare the performance of a learner on the training set with the performance on the test set, you have to configure the Learner to predict on multiple sets by setting the field predict_sets to c("train", "test") (default is "test"). Each set yields a separate Prediction object during resampling. In the next step, you have to configure the measures to operate on the respective Prediction object:

m1 = msr("classif.ce", id = "ce.train", predict_sets = "train")
m2 = msr("classif.ce", id = "ce.test", predict_sets = "test")

The (list of) created measures can finally be passed to $aggregate() or $score().
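The steps above can be combined into one complete run. A sketch, assuming the sonar task and the rpart learner from mlr3; the measure ids "ce.train" and "ce.test" become column names of the aggregated table.

```r
library(mlr3)

# predict on both the training and the test set of each fold
learner = lrn("classif.rpart")
learner$predict_sets = c("train", "test")

design = benchmark_grid(tsk("sonar"), learner, rsmp("cv", folds = 3))
bmr = benchmark(design)

# each measure picks its own Prediction object via predict_sets
measures = list(
  msr("classif.ce", id = "ce.train", predict_sets = "train"),
  msr("classif.ce", id = "ce.test", predict_sets = "test")
)
aggr = bmr$aggregate(measures)
print(aggr[, c("learner_id", "ce.train", "ce.test")])
```

A large gap between the two columns is a common hint at overfitting.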

Parallelization

This function can be parallelized with the future package. One job is one resampling iteration, and all jobs are sent to an apply function from future.apply in a single batch. To select a parallel backend, use future::plan().
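A minimal sketch of selecting a backend, assuming the future package is installed and two local worker sessions are acceptable (`workers = 2` is an illustrative choice, not a recommendation).

```r
library(mlr3)
library(future)

# run the resampling iterations in two background R sessions
plan(multisession, workers = 2)

design = benchmark_grid(
  tsks(c("penguins", "sonar")),
  lrns(c("classif.featureless", "classif.rpart")),
  rsmp("cv", folds = 3)
)

# 2 tasks x 2 learners x 3 folds = 12 jobs, dispatched in a single batch
bmr = benchmark(design)

# switch back to sequential execution
plan(sequential)
print(bmr$aggregate())
```

The benchmark code itself is unchanged; only the plan decides where the jobs run.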

Progress Bars

This function supports progress bars via the package progressr. Simply wrap the function call in progressr::with_progress() to enable them. Alternatively, call progressr::handlers() with global = TRUE to enable progress bars globally. We recommend the progress package as backend which can be enabled with progressr::handlers("progress").
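A short sketch of wrapping a call in progressr::with_progress(). To avoid an extra dependency it assumes the base-R text progress bar handler "txtprogressbar"; the "progress" handler recommended above works the same way once the progress package is installed.

```r
library(mlr3)
library(progressr)

# assumption: txtprogressbar handler instead of the recommended "progress"
handlers("txtprogressbar")

design = benchmark_grid(tsk("sonar"), lrn("classif.rpart"), rsmp("cv", folds = 5))

# progress is reported while the resampling iterations run
bmr = with_progress(benchmark(design))
```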

Logging

mlr3 uses the lgr package for logging. lgr supports multiple log levels which can be queried with getOption("lgr.log_levels").

To suppress output and reduce verbosity, you can lower the log threshold from the default level "info" to "warn":

lgr::get_logger("mlr3")$set_threshold("warn")

To get additional log output for debugging, increase the log level to "debug" or "trace":

lgr::get_logger("mlr3")$set_threshold("debug")

To log to a file or a database, see the documentation of lgr::lgr-package.
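The threshold set above stays active for the rest of the session. A sketch of silencing a single benchmark and restoring the previous verbosity afterwards, using the Logger's `threshold` field and `set_threshold()` method from lgr.

```r
library(mlr3)

logger = lgr::get_logger("mlr3")
old_threshold = logger$threshold  # remember the current threshold

# hide the "info" messages for this run only
logger$set_threshold("warn")
design = benchmark_grid(tsk("sonar"), lrn("classif.rpart"), rsmp("holdout"))
bmr = benchmark(design)

# restore the previous verbosity
logger$set_threshold(old_threshold)
```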

Note

The fitted models are discarded after the predictions have been scored in order to reduce memory consumption. If you need access to the models for later analysis, set store_models to TRUE.
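A sketch of keeping the models, assuming the rpart learner (which supports variable importance in mlr3). With store_models = TRUE, each ResampleResult holds one trained learner per iteration.

```r
library(mlr3)

design = benchmark_grid(tsk("sonar"), lrn("classif.rpart"), rsmp("cv", folds = 3))
bmr = benchmark(design, store_models = TRUE)

# one trained learner per resampling iteration
rr = bmr$aggregate()$resample_result[[1]]
learner = rr$learners[[1]]

print(class(learner$model))  # the fitted rpart tree of the first fold
print(learner$importance())  # variable importance computed from that tree
```

With the default store_models = FALSE, `learner$model` would be NULL for every stored learner.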

Examples

library(mlr3)

# benchmarking with benchmark_grid()
tasks = lapply(c("penguins", "sonar"), tsk)
learners = lapply(c("classif.featureless", "classif.rpart"), lrn)
resamplings = rsmp("cv", folds = 3)

# seed before benchmark_grid(), which instantiates the CV splits
set.seed(123)
design = benchmark_grid(tasks, learners, resamplings)
print(design)

bmr = benchmark(design)

## Data of all resamplings
head(as.data.table(bmr))

## Aggregated performance values
aggr = bmr$aggregate()
print(aggr)

## Extract predictions of first resampling result
rr = aggr$resample_result[[1]]
as.data.table(rr$prediction())

# Benchmarking with a custom design:
# - fit classif.featureless on penguins with a 3-fold CV
# - fit classif.rpart on sonar using a holdout
tasks = list(tsk("penguins"), tsk("sonar"))
learners = list(lrn("classif.featureless"), lrn("classif.rpart"))
resamplings = list(rsmp("cv", folds = 3), rsmp("holdout"))

design = data.table::data.table(
  task = tasks,
  learner = learners,
  resampling = resamplings
)

## Instantiate resamplings
design$resampling = Map(
  function(task, resampling) resampling$clone()$instantiate(task),
  task = design$task, resampling = design$resampling
)

## Run benchmark
bmr = benchmark(design)
print(bmr)

## Get the training set of the 2nd iteration of the featureless learner on penguins
rr = bmr$aggregate()[learner_id == "classif.featureless"]$resample_result[[1]]
rr$resampling$train_set(2)

[Package mlr3 version 0.20.2 Index]