modelSelection {autoEnsemble}

R Documentation

Selects Diverse Top-Performing Models for Stacking an Ensemble Model

Description

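Selects the top-performing models from a set of evaluated models, ranking them separately on each of several performance metrics, so that a diverse set of strong models is available for stacking an ensemble.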

Usage

modelSelection(
  eval,
  family = "binary",
  top_rank = 0.01,
  max = NULL,
  model_selection_criteria = c("auc", "aucpr", "mcc", "f2")
)

Arguments

`eval`	An object of class `"ensemble.eval"`, as returned by the `evaluate` function. It is a data.frame containing several performance metrics for the evaluated models.
`family`	Model family. Currently only `"binary"` classification models are supported.
`top_rank`	Numeric. The proportion of top-ranked models to select for each criterion. The default, `0.01`, selects the top 1% of models.
`max`	Integer. The maximum number of models to extract for each criterion. The default is `NULL`, in which case the number is determined by the `top_rank` proportion for each model selection criterion.
`model_selection_criteria`	Character vector specifying the performance metrics to consider for model selection. The default is `c("auc", "aucpr", "mcc", "f2")`. Other possible criteria are `"f1point5"`, `"f3"`, `"f4"`, `"f5"`, `"kappa"`, `"mean_per_class_error"`, `"gini"`, and `"accuracy"`, which are also provided by the `evaluate` function.
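
The interplay of `top_rank`, `max`, and `model_selection_criteria` can be illustrated with a small base R sketch. This is not the package's implementation: `select_top()` and its `max_models` argument are hypothetical stand-ins, the metric values are made up, and the sketch assumes that higher values are better for every criterion (which would not hold for `"mean_per_class_error"`).

```r
# Hypothetical evaluation table; columns mirror the default criteria,
# but the values are invented for illustration only.
evals_demo <- data.frame(
  id    = paste0("model_", 1:10),
  auc   = c(0.81, 0.79, 0.85, 0.70, 0.77, 0.83, 0.68, 0.74, 0.80, 0.72),
  aucpr = c(0.60, 0.72, 0.65, 0.55, 0.70, 0.58, 0.50, 0.62, 0.66, 0.57),
  mcc   = c(0.40, 0.45, 0.42, 0.30, 0.52, 0.38, 0.25, 0.35, 0.44, 0.33),
  f2    = c(0.70, 0.68, 0.71, 0.60, 0.66, 0.75, 0.58, 0.64, 0.69, 0.62),
  stringsAsFactors = FALSE
)

# select_top() is a hypothetical helper, NOT part of autoEnsemble.
# It keeps the top `top_rank` proportion of models per criterion,
# optionally capped at `max_models`, and returns the union of their IDs.
select_top <- function(eval, criteria, top_rank = 0.01, max_models = NULL) {
  # number of models kept per criterion (assumed to be at least one)
  n <- max(1L, ceiling(top_rank * nrow(eval)))
  if (!is.null(max_models)) n <- min(n, max_models)

  picked <- lapply(criteria, function(crit) {
    # rank models from best to worst, assuming higher is better
    ord <- order(eval[[crit]], decreasing = TRUE)
    eval$id[head(ord, n)]
  })

  # a model that ranks highly on several criteria is listed only once
  unique(unlist(picked))
}

selected <- select_top(evals_demo,
                       criteria = c("auc", "aucpr", "mcc", "f2"),
                       top_rank = 0.2)
print(selected)
```

Because each criterion contributes its own top-ranked models, the union tends to contain models that are strong in different respects, which is the kind of diversity the selection is meant to provide for stacking.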

Value

the top-performing models selected from `eval` according to the specified model selection criteria

Author(s)

E. F. Haghish

Examples


## Not run: 
library(h2o)
library(h2otools) #for h2o.get_ids() function
library(autoEnsemble) #for evaluate() and modelSelection() functions

# initiate the H2O server to train a grid of models
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)

# Run a grid search or AutoML search
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)
y <- "CAPSULE"
prostate[,y] <- as.factor(prostate[,y])  #convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 30,
                  seed = 2023, nfolds = 10, keep_cross_validation_predictions = TRUE)

# get the model IDs from the H2O Grid search or H2O AutoML Grid
ids <- h2otools::h2o.get_ids(aml)

# evaluate all the models and return a dataframe
evals <- evaluate(id = ids)

# perform model selection (up to the top 10% of models for each criterion)
select <- modelSelection(eval = evals, top_rank = 0.1)

## End(Not run)

[Package autoEnsemble version 0.2 Index]