MLNestedCV {mlexperiments}		R Documentation

R6 Class to perform nested cross-validation experiments

Description

The MLNestedCV class is used to construct a nested cross-validation object and to perform nested cross-validation for a specified machine learning algorithm: a hyperparameter optimization is carried out on the in-sample observations of each of the k outer folds, and the resulting settings are validated directly on the out-of-sample observations of the respective fold.

Details

The MLNestedCV class requires a named list of predefined row indices for the outer cross-validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. Furthermore, a strategy ("grid" or "bayesian") needs to be chosen for the hyperparameter optimization, and the parameter k_tuning defines the number of inner cross-validation folds.
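The structure of such a fold list can be sketched in base R (a minimal illustration only; splitTools::create_folds() is the recommended way to create it): a named list with one element per outer fold, each holding that fold's in-sample row indices.

```r
# Minimal base-R sketch of the expected fold-list structure: a named
# list with one element per outer fold, each containing the in-sample
# (training) row indices of that fold.
set.seed(123)
n <- 30L  # number of observations
k <- 3L   # number of outer folds
fold_id <- sample(rep(seq_len(k), length.out = n))
fold_list <- lapply(seq_len(k), function(i) which(fold_id != i))
names(fold_list) <- paste0("Fold", seq_len(k))
length(fold_list)    # the k of the k-fold cross-validation
lengths(fold_list)   # each fold keeps (k - 1) / k of the rows in-sample
```

The length of the list determines k; the out-of-sample rows of each fold are simply the indices not contained in the respective element.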

Super classes

mlexperiments::MLBase -> mlexperiments::MLExperimentsBase -> mlexperiments::MLCrossValidation -> MLNestedCV

Public fields

strategy

A character. The strategy to optimize the hyperparameters (either "grid" or "bayesian").

parameter_bounds

A named list of tuples to define the parameter bounds of the Bayesian hyperparameter optimization. For further details please see the documentation of the ParBayesianOptimization package.

parameter_grid

A matrix with named columns in which each column represents a parameter that should be optimized and each row represents a specific hyperparameter setting that should be tested throughout the procedure. For strategy = "grid", each row of the parameter_grid is considered as a setting that is evaluated. For strategy = "bayesian", the parameter_grid is passed further on to the initGrid argument of the function ParBayesianOptimization::bayesOpt() in order to initialize the Bayesian process. The maximum number of rows considered for initializing the Bayesian process can be specified with the R option "mlexperiments.bayesian.max_init", which is set to 50L by default.
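For illustration, a hypothetical grid for a kNN learner (the parameter names k and l are assumptions for this sketch, not prescribed by the class), together with the option that caps the Bayesian initialization:

```r
# Hypothetical parameter grid for a kNN learner: each row is one
# setting that a grid search evaluates (or that seeds the Bayesian
# optimization via initGrid).
parameter_grid <- expand.grid(
  k = seq(4L, 20L, 4L),  # number of neighbours: 4, 8, 12, 16, 20
  l = c(0L, 1L)          # minimum vote margin
)
nrow(parameter_grid)  # 5 values of k x 2 values of l = 10 settings

# Cap the number of rows used to initialize the Bayesian process
# (default: 50L).
options("mlexperiments.bayesian.max_init" = 20L)
```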

optim_args

A named list of further arguments that are passed on to the Bayesian hyperparameter optimization. For further details please see the documentation of the ParBayesianOptimization package.

split_type

A character. The splitting strategy to construct the k cross-validation folds. This parameter is passed further on to the function splitTools::create_folds() and defaults to "stratified".

split_vector

A vector. If a criterion other than the provided y should be considered when generating the cross-validation folds, it can be defined here. It is important that a vector of the same length as x is provided.
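A sketch of a typical use of split_vector: stratifying the folds by a grouping variable (here a made-up site label) instead of y. The length check mirrors the requirement stated above; the assignment to the field is shown as a comment because it requires an initialized MLNestedCV object.

```r
# Hypothetical use of split_vector: stratify the folds by a site label
# rather than by y. It must have one entry per row of x.
set.seed(42)
x <- matrix(rnorm(120 * 3), nrow = 120)
site <- sample(c("A", "B", "C"), size = nrow(x), replace = TRUE)
stopifnot(length(site) == nrow(x))  # required: same length as x
# cv$split_vector <- site  # used when the cross-validation folds are built
```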

k_tuning

An integer to define the number of cross-validation folds used to tune the hyperparameters.

Methods

Public methods

Inherited methods

Method new()

Create a new MLNestedCV object.

Usage
MLNestedCV$new(
  learner,
  strategy = c("grid", "bayesian"),
  k_tuning,
  fold_list,
  seed,
  ncores = -1L,
  return_models = FALSE
)
Arguments
learner

An initialized learner object that inherits from class "MLLearnerBase".

strategy

A character. The strategy to optimize the hyperparameters (either "grid" or "bayesian").

k_tuning

An integer to define the number of cross-validation folds used to tune the hyperparameters.

fold_list

A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().

seed

An integer. Needs to be set for reproducibility purposes.

ncores

An integer to specify the number of cores used for parallelization (default: -1L).

return_models

A logical. Whether the fitted models should be returned with the results (default: FALSE).

Details

The MLNestedCV class requires a named list of predefined row indices for the outer cross-validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. Furthermore, a strategy ("grid" or "bayesian") needs to be chosen for the hyperparameter optimization, and the parameter k_tuning defines the number of inner cross-validation folds.

Examples
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) rnorm(n = 500),
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLNestedCV$new(
  learner = LearnerKnn$new(),
  strategy = "grid",
  fold_list = fold_list,
  k_tuning = 3L,
  seed = 123,
  ncores = 2
)


Method execute()

Execute the nested cross validation.

Usage
MLNestedCV$execute()
Details

All results of the cross-validation are saved in the field $results of the MLNestedCV class. After successful execution of the nested cross-validation, $results contains a list with the detailed results of each fold.

Returns

The function returns a data.table with the results of the nested cross-validation. More results are accessible from the field $results of the MLNestedCV class.

Examples
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) rnorm(n = 500),
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLNestedCV$new(
  learner = LearnerKnn$new(),
  strategy = "grid",
  fold_list = fold_list,
  k_tuning = 3L,
  seed = 123,
  ncores = 2
)

# learner args (not optimized)
cv$learner_args <- list(
  l = 0,
  test = parse(text = "fold_test$x")
)

# parameters for hyperparameter tuning
cv$parameter_grid <- expand.grid(
  k = seq(4, 68, 8)
)
cv$split_type <- "stratified"

# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")

# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv$execute()


Method clone()

The objects of this class are cloneable with this method.

Usage
MLNestedCV$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

splitTools::create_folds()

Examples

dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) rnorm(n = 500),
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLNestedCV$new(
  learner = LearnerKnn$new(),
  strategy = "grid",
  fold_list = fold_list,
  k_tuning = 3L,
  seed = 123,
  ncores = 2
)

# learner args (not optimized)
cv$learner_args <- list(
  l = 0,
  test = parse(text = "fold_test$x")
)

# parameters for hyperparameter tuning
cv$parameter_grid <- expand.grid(
  k = seq(4, 16, 8)
)
cv$split_type <- "stratified"

# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")

# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv$execute()



[Package mlexperiments version 0.0.4 Index]