BenchmarkResult {mlr3} | R Documentation |
Container for Benchmarking Results
Description
This is the result container object returned by benchmark().
A BenchmarkResult consists of the data of multiple ResampleResults.
BenchmarkResults can be visualized via mlr3viz's autoplot() function.
For statistical analysis of benchmark results and more advanced plots, see mlr3benchmark.
S3 Methods
- as.data.table(rr, ..., reassemble_learners = TRUE, convert_predictions = TRUE, predict_sets = "test")
BenchmarkResult -> data.table::data.table()
Returns a tabular view of the internal data.
- c(...)
(BenchmarkResult, ...) -> BenchmarkResult
Combines multiple objects convertible to BenchmarkResult into a new BenchmarkResult.
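The two S3 methods above can be sketched as follows. This is a minimal illustration, assuming mlr3 and rpart are installed; the task ("iris"), the learners, the resamplings, and the seed are illustrative choices that do not come from this page.

```r
library(mlr3)

set.seed(1)  # illustrative seed

# Two learners on one task with 3-fold CV:
# 2 ResampleResults, 6 resampling iterations in total.
design = benchmark_grid(
  tasks = tsk("iris"),
  learners = lrns(c("classif.featureless", "classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# Tabular view of the internal data: one row per resampling iteration.
tab = as.data.table(bmr)
nrow(tab)

# A second, smaller benchmark to merge in.
bmr2 = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = lrn("classif.featureless"),
  resamplings = rsmp("holdout")
))

# c() returns a new BenchmarkResult holding the ResampleResults of both inputs.
combined = c(bmr, bmr2)
combined$n_resample_results
```

Unlike $combine(), c() is used here without cloning first, because it builds a new object rather than modifying its first argument.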
Active bindings
task_type
(character(1))
Task type of objects in the BenchmarkResult. All stored objects (Task, Learner, Prediction) in a single BenchmarkResult are required to have the same task type, e.g., "classif" or "regr". This is NA for empty BenchmarkResults.
tasks
(data.table::data.table())
Table of included Tasks with three columns:
- "task_hash" (character(1)),
- "task_id" (character(1)), and
- "task" (Task).
learners
(data.table::data.table())
Table of included Learners with three columns:
- "learner_hash" (character(1)),
- "learner_id" (character(1)), and
- "learner" (Learner).
Note that it is not feasible to access learned models via this field, as the training task would be ambiguous. For this reason, the returned learners are reset before they are returned. To access a trained model, select a row from the table returned by $score() instead.
resamplings
(data.table::data.table())
Table of included Resamplings with three columns:
- "resampling_hash" (character(1)),
- "resampling_id" (character(1)), and
- "resampling" (Resampling).
resample_results
(data.table::data.table())
Returns a table with the following columns:
- uhash (character()).
- resample_result (ResampleResult).
n_resample_results
(integer(1))
Returns the total number of stored ResampleResults.
uhashes
(character())
Set of (unique) hashes of all included ResampleResults.
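As a hedged illustration of the active bindings listed above, the following sketch builds a small benchmark and inspects it. It assumes mlr3 and rpart are installed; the task, learners, resampling, and seed are illustrative and not part of this page.

```r
library(mlr3)

set.seed(1)  # illustrative seed

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = lrns(c("classif.featureless", "classif.rpart")),
  resamplings = rsmp("holdout")
))

# Task type shared by all stored objects.
bmr$task_type

# Ids of the included tasks, learners, and resamplings.
bmr$tasks$task_id
bmr$learners$learner_id
bmr$resamplings$resampling_id

# One unique hash per stored ResampleResult.
bmr$n_resample_results
length(bmr$uhashes)
```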
Methods
Public methods
Method new()
Creates a new instance of this R6 class.
Usage
BenchmarkResult$new(data = NULL)
Arguments
data
(ResultData)
An object of type ResultData, either extracted from another ResampleResult, another BenchmarkResult, or manually constructed with as_result_data().
Method help()
Opens the help page for this object.
Usage
BenchmarkResult$help()
Method format()
Helper for print outputs.
Usage
BenchmarkResult$format(...)
Arguments
...
(ignored).
Method print()
Printer.
Usage
BenchmarkResult$print()
Method combine()
Fuses a second BenchmarkResult into itself, mutating the BenchmarkResult in-place.
If the second BenchmarkResult bmr is NULL, simply returns self.
Note that you can alternatively use the combine function c(), which calls this method internally.
Usage
BenchmarkResult$combine(bmr)
Arguments
bmr
(BenchmarkResult)
A second BenchmarkResult object.
Returns
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
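The clone-before-combine pattern described above might look like the following sketch. It assumes mlr3 is installed; the task, learner, resamplings, seed, and the variable name snapshot are illustrative. A deep clone is used here as a conservative choice so that the copy shares no internal data with the original.

```r
library(mlr3)

set.seed(1)  # illustrative seed

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = lrn("classif.featureless"),
  resamplings = rsmp("cv", folds = 3)
))
bmr2 = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = lrn("classif.featureless"),
  resamplings = rsmp("holdout")
))

# Keep a copy of the previous state before mutating in-place.
snapshot = bmr$clone(deep = TRUE)

# combine() modifies bmr by reference.
bmr$combine(bmr2)

bmr$n_resample_results       # now includes the result from bmr2
snapshot$n_resample_results  # unchanged
```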
Method marshal()
Marshals all stored models.
Usage
BenchmarkResult$marshal(...)
Arguments
...
(any)
Additional arguments passed to marshal_model().
Method unmarshal()
Unmarshals all stored models.
Usage
BenchmarkResult$unmarshal(...)
Arguments
...
(any)
Additional arguments passed to unmarshal_model().
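A hedged sketch of a marshal and unmarshal round trip follows. It assumes mlr3 and rpart are installed and that models were stored via store_models = TRUE; the task, learner, and seed are illustrative. For a learner whose model needs no special marshaling, such as the rpart model used here, the round trip is expected to leave the model usable as before.

```r
library(mlr3)

set.seed(1)  # illustrative seed

# Models are only kept if store_models = TRUE.
bmr = benchmark(
  benchmark_grid(
    tasks = tsk("iris"),
    learners = lrn("classif.rpart"),
    resamplings = rsmp("holdout")
  ),
  store_models = TRUE
)

# Marshal all stored models, e.g. before serializing the object,
# then restore them again.
bmr$marshal()
bmr$unmarshal()

# The fitted model is available again after unmarshaling.
model = bmr$resample_result(1)$learners[[1]]$model
class(model)
```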
Method score()
Returns a table with one row for each resampling iteration, including all involved objects: Task, Learner, Resampling, iteration number (integer(1)), and Prediction. If ids is set to TRUE, character columns of extracted ids are added to the table for convenient filtering: "task_id", "learner_id", and "resampling_id".
Additionally calculates the provided performance measures and binds the performance scores as extra columns. These columns are named using the id of the respective Measure.
Usage
BenchmarkResult$score( measures = NULL, ids = TRUE, conditions = FALSE, predict_sets = "test" )
Arguments
measures
ids
(logical(1))
Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns to the returned table.
conditions
(logical(1))
Adds condition messages ("warnings", "errors") as extra list columns of character vectors to the returned table.
predict_sets
(character())
Prediction sets to operate on, used in aggregate() to extract the matching predict_sets from the ResampleResult. Multiple predict sets are calculated by the respective Learner during resample()/benchmark(). Must be a non-empty subset of {"train", "test", "internal_valid"}. If multiple sets are provided, these are first combined to a single prediction object. Default is "test".
Returns
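As a hedged illustration of $score(), the sketch below computes accuracy for every resampling iteration. It assumes mlr3 and rpart are installed; the task, learners, resampling, measure, and seed are illustrative choices.

```r
library(mlr3)

set.seed(1)  # illustrative seed

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = lrns(c("classif.featureless", "classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
))

# One row per resampling iteration; the score column is named after
# the id of the measure ("classif.acc").
scores = bmr$score(msr("classif.acc"))
scores[, c("task_id", "learner_id", "iteration", "classif.acc")]
```

With 2 learners and 3 folds, the table has 6 rows.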
Method aggregate()
Returns a result table where resampling iterations are combined into ResampleResults. A column with the aggregated performance score is added for each Measure, named with the id of the respective measure.
The method for aggregation is controlled by the Measure, e.g. micro aggregation, macro aggregation or custom aggregation. Most measures default to macro aggregation.
Note that the aggregated performances only give a quick impression of which approaches work well and which are probably underperforming. The aggregates do not account for variance and cannot replace a statistical test. See mlr3viz to get a better impression via boxplots or mlr3benchmark for critical difference plots and significance tests.
For convenience, different flags can be set to extract more information from the returned ResampleResult.
Usage
BenchmarkResult$aggregate( measures = NULL, ids = TRUE, uhashes = FALSE, params = FALSE, conditions = FALSE )
Arguments
measures
ids
(logical(1))
Adds object ids ("task_id", "learner_id", "resampling_id") as extra character columns for convenient subsetting.
uhashes
(logical(1))
Adds the uhash values of the ResampleResult as extra character column "uhash".
params
(logical(1))
Adds the hyperparameter values as extra list column "params". You can unnest them with mlr3misc::unnest().
conditions
(logical(1))
Adds the number of resampling iterations with at least one warning as extra integer column "warnings", and the number of resampling iterations with errors as extra integer column "errors".
Returns
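A minimal sketch of $aggregate() with two measures follows. It assumes mlr3 and rpart are installed; the task, learners, resampling, measures, and seed are illustrative choices.

```r
library(mlr3)

set.seed(1)  # illustrative seed

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = lrns(c("classif.featureless", "classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
))

# One row per ResampleResult; each measure adds a column named by its id.
agg = bmr$aggregate(msrs(c("classif.acc", "classif.ce")))
agg[, c("task_id", "learner_id", "classif.acc", "classif.ce")]
```

Accuracy and classification error are complementary in every fold, so their macro-averaged aggregates sum to 1 in each row, which makes for a quick sanity check.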
Method filter()
Subsets the benchmark result. If task_ids is not NULL, keeps all tasks with the provided task ids and discards all other tasks.
Same procedure for learner_ids and resampling_ids.
Usage
BenchmarkResult$filter( task_ids = NULL, task_hashes = NULL, learner_ids = NULL, learner_hashes = NULL, resampling_ids = NULL, resampling_hashes = NULL )
Arguments
task_ids
(character())
Ids of Tasks to keep.
task_hashes
(character())
Hashes of Tasks to keep.
learner_ids
(character())
Ids of Learners to keep.
learner_hashes
(character())
Hashes of Learners to keep.
resampling_ids
(character())
Ids of Resamplings to keep.
resampling_hashes
(character())
Hashes of Resamplings to keep.
Returns
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
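Because $filter() modifies the object by reference, a filtered copy can be obtained by cloning first. The following is a sketch under stated assumptions: mlr3 is installed, and the tasks, learner, resampling, seed, and the variable name kept are illustrative. A deep clone is used as a conservative choice.

```r
library(mlr3)

set.seed(1)  # illustrative seed

bmr = benchmark(benchmark_grid(
  tasks = tsks(c("iris", "sonar")),
  learners = lrn("classif.featureless"),
  resamplings = rsmp("holdout")
))

# Filter a copy so that the full benchmark result stays available.
kept = bmr$clone(deep = TRUE)
kept$filter(task_ids = "iris")

kept$tasks$task_id          # only "iris" remains
bmr$n_resample_results      # the original still holds both results
```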
Method resample_result()
Retrieve the i-th ResampleResult, by position or by unique hash uhash.
i and uhash are mutually exclusive.
Usage
BenchmarkResult$resample_result(i = NULL, uhash = NULL)
Arguments
i
(integer(1))
Position of the ResampleResult to retrieve.
uhash
(character(1))
The uhash value of the ResampleResult to retrieve.
Returns
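The two lookup modes of $resample_result() can be sketched as follows. This assumes mlr3 and rpart are installed; the task, learners, resampling, and seed are illustrative. The comparison at the end relies on the position i indexing into $uhashes.

```r
library(mlr3)

set.seed(1)  # illustrative seed

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = lrns(c("classif.featureless", "classif.rpart")),
  resamplings = rsmp("holdout")
))

# Retrieve by position ...
rr_by_pos = bmr$resample_result(1)

# ... or by unique hash; supply only one of the two arguments.
rr_by_hash = bmr$resample_result(uhash = bmr$uhashes[1])

# Both calls refer to the same ResampleResult.
identical(rr_by_pos$uhash, rr_by_hash$uhash)
```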
Method discard()
Shrinks the BenchmarkResult by discarding parts of the internally stored data. Note that certain operations might stop working afterwards, e.g. extracting importance values from learners or calculating measures that require the task's data.
Usage
BenchmarkResult$discard(backends = FALSE, models = FALSE)
Arguments
backends
(logical(1))
If TRUE, the DataBackend is removed from all stored Tasks.
models
(logical(1))
If TRUE, the stored model is removed from all Learners.
Returns
Returns the object itself, but modified by reference.
You need to explicitly $clone() the object beforehand if you want to keep the object in its previous state.
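As a hedged illustration of $discard(), the sketch below drops the stored models from a copy while the original keeps them. It assumes mlr3 is installed and that models were stored via store_models = TRUE; the task, learner, resampling, seed, and variable names are illustrative. A deep clone is used as a conservative choice.

```r
library(mlr3)

set.seed(1)  # illustrative seed

bmr = benchmark(
  benchmark_grid(
    tasks = tsk("iris"),
    learners = lrn("classif.featureless"),
    resamplings = rsmp("holdout")
  ),
  store_models = TRUE
)

# Shrink a copy; the original keeps its models.
small = bmr$clone(deep = TRUE)
small$discard(models = TRUE)

full_model = bmr$resample_result(1)$learners[[1]]$model
small_model = small$resample_result(1)$learners[[1]]$model

is.null(full_model)   # model still stored in the original
is.null(small_model)  # model removed from the shrunken copy
```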
Method clone()
The objects of this class are cloneable with this method.
Usage
BenchmarkResult$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Note
All stored objects are accessed by reference. Do not modify any extracted object without cloning it first.
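A minimal sketch of the cloning advice in this note follows. It assumes mlr3 is installed; the task, learner, resampling, and seed are illustrative. The extracted Task is cloned before it is modified, so the Task stored in the BenchmarkResult is left untouched.

```r
library(mlr3)

set.seed(1)  # illustrative seed

bmr = benchmark(benchmark_grid(
  tasks = tsk("iris"),
  learners = lrn("classif.featureless"),
  resamplings = rsmp("holdout")
))

# Extract a stored Task and clone it before changing anything.
task = bmr$tasks$task[[1]]$clone(deep = TRUE)

# Restrict the clone to the first 10 rows; this mutates only the clone.
task$filter(1:10)

task$nrow                    # rows in the modified clone
bmr$tasks$task[[1]]$nrow     # rows in the stored Task
```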
See Also
Chapter in the mlr3book: https://mlr3book.mlr-org.com/chapters/chapter3/evaluation_and_benchmarking.html#sec-benchmarking
Package mlr3viz for some generic visualizations.
mlr3benchmark for post-hoc analysis of benchmark results.
Other benchmark: benchmark(), benchmark_grid()
Examples
library(mlr3)
set.seed(123)
learners = list(
lrn("classif.featureless", predict_type = "prob"),
lrn("classif.rpart", predict_type = "prob")
)
design = benchmark_grid(
tasks = list(tsk("sonar"), tsk("penguins")),
learners = learners,
resamplings = rsmp("cv", folds = 3)
)
print(design)
bmr = benchmark(design)
print(bmr)
bmr$tasks
bmr$learners
# first 5 resampling iterations
head(as.data.table(bmr, measures = c("classif.acc", "classif.auc")), 5)
# aggregate results
bmr$aggregate()
# aggregate results with hyperparameters as separate columns
mlr3misc::unnest(bmr$aggregate(params = TRUE), "params")
# extract resample result for classif.rpart
rr = bmr$aggregate()[learner_id == "classif.rpart", resample_result][[1]]
print(rr)
# access the confusion matrix of the first resampling iteration
rr$predictions()[[1]]$confusion
# reduce to subset with task id "sonar"
bmr$filter(task_ids = "sonar")
print(bmr)