ResamplingVariableSizeTrainCV {mlr3resampling} | R Documentation
Resampling for comparing training on same or other groups
Description
ResamplingVariableSizeTrainCV defines how a task is partitioned for resampling, for example in resample() or benchmark().
Resampling objects can be instantiated on a Task.
After instantiation, sets can be accessed via $train_set(i) and $test_set(i), respectively.
Details
A supervised learning algorithm inputs a train set, and outputs a prediction function, which can be used on a test set. How many train samples are required to get accurate predictions on a test set? Cross-validation can be used to answer this question, with variable size train sets.
Stratification
ResamplingVariableSizeTrainCV supports stratified sampling.
The stratification variables are assumed to be discrete, and must be stored in the Task with column role "stratum".
In case of multiple stratification variables, each combination of the values of the stratification variables forms a stratum.
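To illustrate how combinations of values form strata, here is a minimal base R sketch. The toy data frame and its column names (label, site) are hypothetical, and interaction() is used only for illustration; this is not necessarily how the package computes strata internally.

```r
# Hypothetical toy data with two discrete stratification variables.
toy <- data.frame(
  label = c("a", "a", "b", "b", "a", "b"),
  site  = c("x", "y", "x", "y", "x", "y")
)

# Each observed combination of values forms one stratum.
stratum <- interaction(toy$label, toy$site, drop = TRUE)

# Number of observations in each stratum.
print(table(stratum))
```

In this toy data there are four strata, one for each observed (label, site) pair.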
Grouping
ResamplingVariableSizeTrainCV does not support grouping of observations.
Hyper-parameters
The number of cross-validation folds should be defined as the folds parameter.
For each fold ID, the corresponding observations are considered the test set, and a variable number of other observations are considered the train set.
The random_seeds parameter controls the number of random orderings of the train set that are considered.
For each random order of the train set, the min_train_data parameter controls the size of the smallest stratum in the smallest train set considered.
To determine the other train set sizes, we use an equally spaced grid on the log scale, from min_train_data to the largest train set size (all data not in test set). The number of train set sizes in this grid is determined by the train_sizes parameter.
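The interaction of these hyper-parameters can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: all numeric values are hypothetical, stratification is ignored, and rounding the grid to the nearest integer is an assumption.

```r
# Hypothetical values, chosen only for illustration.
n_data         <- 150L  # total number of observations
folds          <- 3L
min_train_data <- 10L
train_sizes    <- 5L
random_seeds   <- 2L

set.seed(1)
# Assign each observation a fold ID; fold 1 serves as the test set here.
fold_id   <- sample(rep(seq_len(folds), length.out = n_data))
test_ids  <- which(fold_id == 1L)
train_all <- which(fold_id != 1L)

# Equally spaced grid on the log scale, from min_train_data to the
# largest train set size (all data not in the test set).
# Rounding to the nearest integer is an assumption of this sketch.
size_grid <- unique(as.integer(round(exp(seq(
  log(min_train_data), log(length(train_all)), length.out = train_sizes
)))))

# For each random ordering, the train set of size n is the first n rows.
train_sets <- lapply(seq_len(random_seeds), function(seed) {
  set.seed(seed)
  ordering <- sample(train_all)
  lapply(size_grid, function(n) ordering[seq_len(n)])
})

print(size_grid)
```

With these values, the largest train set has 100 rows and the grid is 10, 18, 32, 56, 100, so consecutive sizes differ by a roughly constant ratio. Within one random ordering the train sets are nested: each smaller set is a subset of the next larger one.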
Methods
Public methods
Method new()
Creates a new instance of this R6 class.
Usage
Resampling$new( id, param_set = ps(), duplicated_ids = FALSE, label = NA_character_, man = NA_character_ )
Arguments
id
(character(1)) Identifier for the new instance.
param_set
(paradox::ParamSet) Set of hyperparameters.
duplicated_ids
(logical(1)) Set to TRUE if this resampling strategy may have duplicated row ids in a single training set or test set.
label
(character(1)) Label for the new instance.
man
(character(1)) String in the format [pkg]::[topic] pointing to a manual page for this object. The referenced help package can be opened via method $help().
Method train_set()
Returns the row ids of the i-th training set.
Usage
Resampling$train_set(i)
Arguments
i
(integer(1)) Iteration.
Returns
(integer()) of row ids.
Method test_set()
Returns the row ids of the i-th test set.
Usage
Resampling$test_set(i)
Arguments
i
(integer(1)) Iteration.
Returns
(integer()) of row ids.
Examples
(var_sizes <- mlr3resampling::ResamplingVariableSizeTrainCV$new())
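A slightly longer sketch of typical usage follows. It assumes the mlr3 and mlr3resampling packages are installed, that hyper-parameters are set via $param_set$values as in other mlr3 resamplings, and that the built-in iris task is a suitable toy input; the value chosen for train_sizes is illustrative only.

```r
# Sketch only: runs when mlr3 and mlr3resampling are both available.
if (requireNamespace("mlr3", quietly = TRUE) &&
    requireNamespace("mlr3resampling", quietly = TRUE)) {
  var_sizes <- mlr3resampling::ResamplingVariableSizeTrainCV$new()

  # Hyper-parameters are stored in the param_set (illustrative value).
  var_sizes$param_set$values$train_sizes <- 4L

  # Instantiate on a task, then access the sets by iteration number.
  task <- mlr3::tsk("iris")
  var_sizes$instantiate(task)

  print(var_sizes$iters)                 # total number of iterations
  print(length(var_sizes$train_set(1)))  # train set size, iteration 1
  print(length(var_sizes$test_set(1)))   # test set size, iteration 1
}
```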