MLCrossValidation {mlexperiments}    R Documentation

R6 Class to perform cross-validation experiments

Description

The MLCrossValidation class is used to construct a cross validation object and to perform a k-fold cross validation for a specified machine learning algorithm using one distinct hyperparameter setting.

Details

The MLCrossValidation class requires a named list of predefined row indices for the cross-validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. To perform a repeated k-fold cross-validation, provide a list that contains all repeated fold definitions, e.g., created by specifying the argument m_rep of splitTools::create_folds().
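
To illustrate the list structure described above, the following base R sketch builds a repeated fold list by hand. The helper make_repeated_folds() and the element names are hypothetical and not part of mlexperiments or splitTools. The sketch assumes that each list element holds the in-sample (training) row indices, which is what splitTools::create_folds() returns by default.

```r
# Hypothetical helper: builds a named list of in-sample (training) row indices
# for a k-fold cross-validation that is repeated m_rep times.
make_repeated_folds <- function(n, k, m_rep, seed) {
  set.seed(seed)
  folds <- list()
  for (r in seq_len(m_rep)) {
    # randomly assign every row to one of the k folds for this repetition
    assignment <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      fold_name <- paste0("Fold", f, ".Rep", r)
      # keep all rows that are NOT in the held-out fold
      folds[[fold_name]] <- which(assignment != f)
    }
  }
  folds
}

fold_list <- make_repeated_folds(n = 500, k = 3, m_rep = 2, seed = 123)
length(fold_list)
names(fold_list)
```

With k = 3 and m_rep = 2, the list contains k * m_rep = 6 fold definitions. In practice, splitTools::create_folds() with the argument m_rep is the documented way to obtain such a list.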

Super classes

mlexperiments::MLBase -> mlexperiments::MLExperimentsBase -> MLCrossValidation

Public fields

fold_list

A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().

return_models

A logical. Whether the fitted models should be returned with the results (default: FALSE).

performance_metric

Either a named list with metric functions, a single metric function, or a character vector with metric names from the mlr3measures package. The provided functions must take two named arguments: ground_truth and predictions. For metrics from the mlr3measures package, the wrapper function metric() prepares them for use with the mlexperiments package.

performance_metric_args

A list. Further arguments required to compute the performance metric.

predict_args

A list. Further arguments required to compute the predictions.
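
As a minimal sketch of the interface that performance_metric expects, the following base R code defines two custom metric functions with the named arguments ground_truth and predictions and collects them in a named list. The function names accuracy_metric() and balanced_accuracy() and the toy vectors are hypothetical and only serve as an illustration; a binary target coded as 0/1 is assumed.

```r
# Hypothetical custom metric: share of correct predictions
accuracy_metric <- function(ground_truth, predictions) {
  mean(ground_truth == predictions)
}

# Hypothetical custom metric: balanced accuracy for a binary 0/1 target,
# i.e., the mean of sensitivity and specificity
balanced_accuracy <- function(ground_truth, predictions) {
  sensitivity <- mean(predictions[ground_truth == 1] == 1)
  specificity <- mean(predictions[ground_truth == 0] == 0)
  (sensitivity + specificity) / 2
}

# named list of metric functions
metric_list <- list(
  accuracy = accuracy_metric,
  balanced_accuracy = balanced_accuracy
)

# toy data to check the functions
truth <- c(0, 1, 1, 0, 0)
preds <- c(0, 1, 0, 0, 0)
scores <- sapply(
  metric_list,
  function(f) f(ground_truth = truth, predictions = preds)
)
scores
```

A list such as metric_list could then be assigned to the field performance_metric of an MLCrossValidation object.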

Methods

Public methods

Inherited methods

Method new()

Create a new MLCrossValidation object.

Usage
MLCrossValidation$new(
  learner,
  fold_list,
  seed,
  ncores = -1L,
  return_models = FALSE
)
Arguments
learner

An initialized learner object that inherits from class "MLLearnerBase".

fold_list

A named list of predefined row indices for the cross validation folds, e.g., created with the function splitTools::create_folds().

seed

An integer. Needs to be set for reproducibility purposes.

ncores

An integer to specify the number of cores used for parallelization (default: -1L).

return_models

A logical. Whether the fitted models should be returned with the results (default: FALSE).

Details

The MLCrossValidation class requires a named list of predefined row indices for the cross-validation folds, e.g., created with the function splitTools::create_folds(). This list also defines the k of the k-fold cross-validation. To perform a repeated k-fold cross-validation, provide a list that contains all repeated fold definitions, e.g., created by specifying the argument m_rep of splitTools::create_folds().

Examples
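library(mlexperiments)

# simulate 500 observations: six numeric features and a binary target
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) rnorm(n = 500),
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

# define three stratified folds based on the target (column 7)
fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

# construct the cross-validation object
cv <- MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123,
  ncores = 2
)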


Method execute()

Execute the cross validation.

Usage
MLCrossValidation$execute()
Details

All results of the cross validation are saved in the field $results of the MLCrossValidation class. After successful execution of the cross validation, $results contains a list with the items:

Returns

The function returns a data.table with the results of the cross validation. More results are accessible from the field $results of the MLCrossValidation class.
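
To make the general procedure behind a k-fold cross validation concrete, here is a conceptual base R sketch. It is not the implementation used by MLCrossValidation$execute(): the logistic regression via stats::glm() stands in for a learner, the accuracy() function and the simulated data are hypothetical, and the fold list is assumed to hold in-sample (training) row indices.

```r
set.seed(123)

# simulate a small binary classification problem
n <- 300
x <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))
dat <- data.frame(x, y = y)

# named list of in-sample (training) row indices for k = 3 folds
k <- 3
assignment <- sample(rep(seq_len(k), length.out = n))
fold_list <- lapply(seq_len(k), function(f) which(assignment != f))
names(fold_list) <- paste0("Fold", seq_len(k))

# hypothetical metric with the arguments ground_truth and predictions
accuracy <- function(ground_truth, predictions) {
  mean(ground_truth == predictions)
}

# fit on the training rows of each fold, evaluate on the held-out rows
cv_results <- do.call(rbind, lapply(names(fold_list), function(fold) {
  train_idx <- fold_list[[fold]]
  test_idx <- setdiff(seq_len(n), train_idx)
  fit <- glm(y ~ x1 + x2, data = dat[train_idx, ], family = binomial())
  prob <- predict(fit, newdata = dat[test_idx, ], type = "response")
  pred <- as.integer(prob > 0.5)
  data.frame(
    fold = fold,
    performance = accuracy(dat$y[test_idx], pred)
  )
}))
cv_results
```

The resulting table has one row per fold, which mirrors the idea of a per-fold performance summary.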

Examples
library(mlexperiments)

# simulate 500 observations: six numeric features and a binary target
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) rnorm(n = 500),
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

# define three stratified folds based on the target (column 7)
fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

# construct the cross-validation object
cv <- MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123,
  ncores = 2
)

# learner parameters
cv$learner_args <- list(
  k = 20,
  l = 0,
  test = parse(text = "fold_test$x")
)

# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")

# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

# run the cross validation
cv$execute()

Method clone()

The objects of this class are cloneable with this method.

Usage
MLCrossValidation$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
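
The difference between a shallow and a deep clone can be sketched with two small R6 classes. The classes Inner and Outer are hypothetical and unrelated to mlexperiments; the sketch assumes that clone() here is the standard method that the R6 package generates for cloneable classes.

```r
library(R6)

# Hypothetical classes: Outer stores another R6 object in a field
Inner <- R6Class("Inner", public = list(value = 1))
Outer <- R6Class("Outer", public = list(
  inner = NULL,
  initialize = function() {
    self$inner <- Inner$new()
  }
))

original <- Outer$new()
shallow <- original$clone()          # deep = FALSE: nested R6 objects are shared
deep <- original$clone(deep = TRUE)  # nested R6 objects are copied as well

# modify the nested object of the original
original$inner$value <- 2

shallow$inner$value  # 2, because the shallow clone shares the nested object
deep$inner$value     # 1, because the deep clone has its own copy
```

A deep clone is therefore the safer choice when a copy should be modified independently of the original object and the object contains fields that are themselves R6 objects.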

See Also

splitTools::create_folds(), mlr3measures::measures, metric()

Examples

library(mlexperiments)

# simulate 500 observations: six numeric features and a binary target
dataset <- do.call(
  cbind,
  c(
    sapply(
      paste0("col", 1:6),
      function(x) rnorm(n = 500),
      USE.NAMES = TRUE,
      simplify = FALSE
    ),
    list(target = sample(0:1, 500, TRUE))
  )
)

fold_list <- splitTools::create_folds(
  y = dataset[, 7],
  k = 3,
  type = "stratified",
  seed = 123
)

cv <- MLCrossValidation$new(
  learner = LearnerKnn$new(),
  fold_list = fold_list,
  seed = 123,
  ncores = 2
)

# learner parameters
cv$learner_args <- list(
  k = 20,
  l = 0,
  test = parse(text = "fold_test$x")
)

# performance parameters
cv$predict_args <- list(type = "response")
cv$performance_metric <- metric("bacc")

# set data
cv$set_data(
  x = data.matrix(dataset[, -7]),
  y = dataset[, 7]
)

cv$execute()


[Package mlexperiments version 0.0.3 Index]