modelSelection {autoEnsemble}R Documentation

Selects Diverse Top-Performing Models for Stacking an Ensemble Model

Description

Multiple model performance metrics are computed

Usage

modelSelection(
  eval,
  family = "binary",
  top_rank = 0.01,
  max = NULL,
  model_selection_criteria = c("auc", "aucpr", "mcc", "f2")
)

Arguments

eval

an object of class "ensemble.eval" which is provided by 'evaluate' function. this object is a data.frame, including several performance metrics for the evaluated models.

family

model family. currently only "binary" classification models are supported.

top_rank

numeric. what percentage of the top model should be selected? the default value is top 1% models.

max

integer. specifies maximum number of models for each criteria to be extracted. the default value is the "top_rank" percentage for each model selection criteria.

model_selection_criteria

character, specifying the performance metrics that should be taken into consideration for model selection. the default are "c('auc', 'aucpr', 'mcc', 'f2')". other possible criteria are "'f1point5', 'f3', 'f4', 'f5', 'kappa', 'mean_per_class_error', 'gini', 'accuracy'", which are also provided by the "evaluate" function.

Value

a matrix of F-Measures for different thresholds or the highest F-Measure value

Author(s)

E. F. Haghish

Examples


## Not run: 
library(h2o)
library(h2otools) #for h2o.get_ids() function
library(h2oEnsemble)

# initiate the H2O server to train a grid of models
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)

# Run a grid search or AutoML search
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)
y <- "CAPSULE"
prostate[,y] <- as.factor(prostate[,y])  #convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 30,
                  seed = 2023, nfolds = 10, keep_cross_validation_predictions = TRUE)

# get the model IDs from the H2O Grid search or H2O AutoML Grid
ids <- h2otools::h2o.get_ids(aml)

# evaluate all the models and return a dataframe
evals <- evaluate(id = ids)

# perform model selection (up to top 10% of each criteria)
select <- modelSelection(eval = evals, top_rank = 0.1))

## End(Not run)

[Package autoEnsemble version 0.2 Index]