modelSelection {autoEnsemble}    R Documentation
Selects Diverse Top-Performing Models for Stacking an Ensemble Model
Description
Computes multiple model performance metrics for a set of trained models and selects a diverse subset of top-performing models to be used as base learners for stacking an ensemble model.
Usage
modelSelection(
eval,
family = "binary",
top_rank = 0.01,
max = NULL,
model_selection_criteria = c("auc", "aucpr", "mcc", "f2")
)
Arguments
eval
the object returned by the evaluate() function, holding the performance metrics of the candidate models (see the Examples section).
family
model family. Currently only "binary" classification models are supported.
top_rank
numeric. What percentage of the top models should be selected? The default value is 0.01, i.e. the top 1% of models.
max
integer. Specifies the maximum number of models to be extracted for each criterion. The default value is NULL.
model_selection_criteria
character vector specifying the performance metrics that should be taken into consideration for model selection. The defaults are "auc", "aucpr", "mcc", and "f2". A fully spelled-out call is sketched below.
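For reference, the call below spells out every argument. This is only an illustrative sketch: the argument values are placeholders, and evals is assumed to be the object returned by evaluate(), as in the Examples section.
# illustrative sketch only: 'evals' is assumed to come from evaluate(),
# and the values below are placeholders, not recommendations
selected <- modelSelection(
  eval     = evals,                    # evaluation results from evaluate()
  family   = "binary",                 # currently only binary classification
  top_rank = 0.05,                     # keep the top 5% of models per criterion
  max      = 20,                       # extract at most 20 models per criterion
  model_selection_criteria = c("auc", "aucpr", "mcc", "f2")
)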
Value
the models selected according to the specified performance criteria, intended for use as base learners when stacking an ensemble model
Author(s)
E. F. Haghish
Examples
## Not run:
library(h2o)
library(h2otools) #for h2o.get_ids() function
library(autoEnsemble) #for modelSelection() and related functions
# initiate the H2O server to train a grid of models
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
# Run a grid search or AutoML search
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)
y <- "CAPSULE"
prostate[,y] <- as.factor(prostate[,y]) #convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 30,
seed = 2023, nfolds = 10, keep_cross_validation_predictions = TRUE)
# get the model IDs from the H2O Grid search or H2O AutoML Grid
ids <- h2otools::h2o.get_ids(aml)
# evaluate all the models and return a dataframe
evals <- evaluate(id = ids)
# perform model selection (up to the top 10% of models for each criterion)
select <- modelSelection(eval = evals, top_rank = 0.1)
## End(Not run)
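As a possible next step, the selected models could serve as base learners for an H2O stacked ensemble. The sketch below assumes that select holds (or can be coerced to) a character vector of model IDs; verify the structure of the object returned by modelSelection() before use.
# sketch only: assumes 'select' provides the IDs of the chosen base models
# (check the structure of the object returned by modelSelection() first)
stack <- h2o.stackedEnsemble(y = y,
                             training_frame = prostate,
                             base_models = select,
                             metalearner_algorithm = "glm")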