| MLNestedCV {mlexperiments} | R Documentation |
R6 Class to perform nested cross-validation experiments
Description
The MLNestedCV class is used to construct a nested cross-validation object
and to perform a nested cross-validation for a specified machine learning
algorithm: the hyperparameters are optimized with the in-sample
observations of each of the k outer folds, and the resulting models are
validated directly on the out-of-sample observations of the respective fold.
Details
The MLNestedCV class requires a named list of predefined
row indices for the outer cross-validation folds, e.g., created with the
function splitTools::create_folds(). This list also defines the k of
the k-fold cross-validation. Furthermore, a strategy needs to be chosen
("grid" or "bayesian") for the hyperparameter optimization, as well as the
parameter k_tuning to define the number of inner cross-validation folds.
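For illustration, the required fold list can be sketched as follows (a minimal example assuming the splitTools package is installed; the fold names shown in the comment follow splitTools' usual "Fold1", "Fold2", ... naming):

```r
# Sketch: create a named list of in-sample row indices that defines
# k = 3 outer folds for the nested cross-validation.
y <- sample(0:1, 100, replace = TRUE)
fold_list <- splitTools::create_folds(
  y = y,
  k = 3,
  type = "stratified",
  seed = 123
)
# fold_list is a named list (e.g. $Fold1, $Fold2, $Fold3), each element
# holding the in-sample row indices of the respective outer fold.
str(fold_list, max.level = 1)
```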
Super classes
mlexperiments::MLBase -> mlexperiments::MLExperimentsBase -> mlexperiments::MLCrossValidation -> MLNestedCV
Public fields
strategy: A character. The strategy to optimize the hyperparameters (either
"grid" or "bayesian").

parameter_bounds: A named list of tuples to define the parameter bounds of
the Bayesian hyperparameter optimization. For further details please see the
documentation of the ParBayesianOptimization package.

parameter_grid: A matrix with named columns in which each column represents a
parameter that should be optimized and each row represents a specific
hyperparameter setting that should be tested throughout the procedure. For
strategy = "grid", each row of the parameter_grid is considered as a setting
that is evaluated. For strategy = "bayesian", the parameter_grid is passed
further on to the initGrid argument of the function
ParBayesianOptimization::bayesOpt() in order to initialize the Bayesian
process. The maximum number of rows considered for initializing the Bayesian
process can be specified with the R option
"mlexperiments.bayesian.max_init", which is set to 50L by default.

optim_args: A named list of tuples to define the parameter bounds of the
Bayesian hyperparameter optimization. For further details please see the
documentation of the ParBayesianOptimization package.

split_type: A character. The splitting strategy to construct the k
cross-validation folds. This parameter is passed further on to the function
splitTools::create_folds() and defaults to "stratified".

split_vector: A vector. If a criterion other than the provided y should be
considered for generating the cross-validation folds, it can be defined here.
It is important that a vector of the same length as x is provided here.

k_tuning: An integer to define the number of cross-validation folds used to
tune the hyperparameters.
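As a hedged sketch of how these fields interact: the snippet below configures a Bayesian tuning run. LearnerKnn, the constructor arguments, and the field names are taken from this page; the concrete bounds, grid values, and option value are illustrative only, and fold_list is assumed to have been created with splitTools::create_folds().

```r
# Sketch: configuring the tuning-related fields of an MLNestedCV object
# for strategy = "bayesian". All numeric values are illustrative.
cv <- MLNestedCV$new(
  learner = LearnerKnn$new(),
  strategy = "bayesian",
  fold_list = fold_list,  # created with splitTools::create_folds()
  k_tuning = 3L,
  seed = 123,
  ncores = 2
)
# Parameter bounds for the Bayesian optimization (see the
# ParBayesianOptimization package for the expected format):
cv$parameter_bounds <- list(k = c(2L, 39L))
# Initialization grid, passed on to the initGrid argument of
# ParBayesianOptimization::bayesOpt():
cv$parameter_grid <- expand.grid(k = seq(4, 36, 8))
# Cap the number of grid rows used to initialize the Bayesian process:
options("mlexperiments.bayesian.max_init" = 10L)
```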
Methods
Public methods
Inherited methods
Method new()
Create a new MLNestedCV object.
Usage
MLNestedCV$new(
learner,
strategy = c("grid", "bayesian"),
k_tuning,
fold_list,
seed,
ncores = -1L,
return_models = FALSE
)
Arguments
learner: An initialized learner object that inherits from class
"MLLearnerBase".

strategy: A character. The strategy to optimize the hyperparameters (either
"grid" or "bayesian").

k_tuning: An integer to define the number of cross-validation folds used to
tune the hyperparameters.

fold_list: A named list of predefined row indices for the cross-validation
folds, e.g., created with the function splitTools::create_folds().

seed: An integer. Needs to be set for reproducibility purposes.

ncores: An integer to specify the number of cores used for parallelization
(default: -1L).

return_models: A logical. If the fitted models should be returned with the
results (default: FALSE).
Details
The MLNestedCV class requires a named list of predefined
row indices for the outer cross-validation folds, e.g., created with
the function splitTools::create_folds(). This list also defines the
k of the k-fold cross-validation. Furthermore, a strategy needs to
be chosen ("grid" or "bayesian") for the hyperparameter optimization,
as well as the parameter k_tuning to define the number of inner
cross-validation folds.
Examples
dataset <- do.call(
cbind,
c(sapply(paste0("col", 1:6), function(x) {
rnorm(n = 500)
},
USE.NAMES = TRUE,
simplify = FALSE
),
list(target = sample(0:1, 500, TRUE))
))
fold_list <- splitTools::create_folds(
y = dataset[, 7],
k = 3,
type = "stratified",
seed = 123
)
cv <- MLNestedCV$new(
learner = LearnerKnn$new(),
strategy = "grid",
fold_list = fold_list,
k_tuning = 3L,
seed = 123,
ncores = 2
)
Method execute()
Execute the nested cross validation.
Usage
MLNestedCV$execute()
Details
All results of the cross validation are saved in the field $results of
the MLNestedCV class. After successful execution of the nested cross
validation, $results contains a list with the items:
"results.optimization" A list with the results of the hyperparameter optimization.
"fold" A list of folds containing the following items for each cross validation fold:
"fold_ids" A vector with the utilized in-sample row indices.
"ground_truth" A vector with the ground truth.
"predictions" A vector with the predictions.
"learner.args" A list with the arguments provided to the learner.
"model" If
return_models = TRUE, the fitted model.
"summary" A data.table with the summarized results (same as the returned value of the
executemethod)."performance" A list with the value of the performance metric calculated for each of the cross validation folds.
Returns
The function returns a data.table with the results of the nested
cross validation. More results are accessible from the field $results
of the MLNestedCV class.
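Assuming cv is a fully configured MLNestedCV object (as in the examples below), the stored results can be inspected as sketched here. The item names follow the list documented above; the fold names depend on the provided fold_list.

```r
# Sketch: inspecting the stored results after the nested CV has run.
res <- cv$execute()              # data.table with the summarized results

cv$results$results.optimization  # hyperparameter-optimization results
cv$results$summary               # same as the returned data.table `res`
cv$results$performance           # metric value per outer fold
# Per-fold details (fold_ids, ground_truth, predictions, learner.args,
# and, if return_models = TRUE, the fitted model):
str(cv$results$fold, max.level = 2)
```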
Examples
dataset <- do.call(
cbind,
c(sapply(paste0("col", 1:6), function(x) {
rnorm(n = 500)
},
USE.NAMES = TRUE,
simplify = FALSE
),
list(target = sample(0:1, 500, TRUE))
))
fold_list <- splitTools::create_folds(
y = dataset[, 7],
k = 3,
type = "stratified",
seed = 123
)
cv <- MLNestedCV$new(
learner = LearnerKnn$new(),
strategy = "grid",
fold_list = fold_list,
k_tuning = 3L,
seed = 123,
ncores = 2
)
# learner args (not optimized)
cv$learner_args <- list(
l = 0,
test = parse(text = "fold_test$x")
)
# parameters for hyperparameter tuning
cv$parameter_grid <- expand.grid(
k = seq(4, 68, 8)
)
cv$split_type <- "stratified"
# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")
# set data
cv$set_data(
x = data.matrix(dataset[, -7]),
y = dataset[, 7]
)
cv$execute()
Method clone()
The objects of this class are cloneable with this method.
Usage
MLNestedCV$clone(deep = FALSE)
Arguments
deepWhether to make a deep clone.
Examples
dataset <- do.call(
cbind,
c(sapply(paste0("col", 1:6), function(x) {
rnorm(n = 500)
},
USE.NAMES = TRUE,
simplify = FALSE
),
list(target = sample(0:1, 500, TRUE))
))
fold_list <- splitTools::create_folds(
y = dataset[, 7],
k = 3,
type = "stratified",
seed = 123
)
cv <- MLNestedCV$new(
learner = LearnerKnn$new(),
strategy = "grid",
fold_list = fold_list,
k_tuning = 3L,
seed = 123,
ncores = 2
)
# learner args (not optimized)
cv$learner_args <- list(
l = 0,
test = parse(text = "fold_test$x")
)
# parameters for hyperparameter tuning
cv$parameter_grid <- expand.grid(
k = seq(4, 16, 8)
)
cv$split_type <- "stratified"
# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")
# set data
cv$set_data(
x = data.matrix(dataset[, -7]),
y = dataset[, 7]
)
cv$execute()
## ------------------------------------------------
## Method `MLNestedCV$new`
## ------------------------------------------------
dataset <- do.call(
cbind,
c(sapply(paste0("col", 1:6), function(x) {
rnorm(n = 500)
},
USE.NAMES = TRUE,
simplify = FALSE
),
list(target = sample(0:1, 500, TRUE))
))
fold_list <- splitTools::create_folds(
y = dataset[, 7],
k = 3,
type = "stratified",
seed = 123
)
cv <- MLNestedCV$new(
learner = LearnerKnn$new(),
strategy = "grid",
fold_list = fold_list,
k_tuning = 3L,
seed = 123,
ncores = 2
)
## ------------------------------------------------
## Method `MLNestedCV$execute`
## ------------------------------------------------
dataset <- do.call(
cbind,
c(sapply(paste0("col", 1:6), function(x) {
rnorm(n = 500)
},
USE.NAMES = TRUE,
simplify = FALSE
),
list(target = sample(0:1, 500, TRUE))
))
fold_list <- splitTools::create_folds(
y = dataset[, 7],
k = 3,
type = "stratified",
seed = 123
)
cv <- MLNestedCV$new(
learner = LearnerKnn$new(),
strategy = "grid",
fold_list = fold_list,
k_tuning = 3L,
seed = 123,
ncores = 2
)
# learner args (not optimized)
cv$learner_args <- list(
l = 0,
test = parse(text = "fold_test$x")
)
# parameters for hyperparameter tuning
cv$parameter_grid <- expand.grid(
k = seq(4, 68, 8)
)
cv$split_type <- "stratified"
# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")
# set data
cv$set_data(
x = data.matrix(dataset[, -7]),
y = dataset[, 7]
)
cv$execute()