h2o.grid {h2o} | R Documentation |
H2O Grid Support
Description
Provides a set of functions to launch a grid search and get its results.
Usage
h2o.grid(
algorithm,
grid_id,
x,
y,
training_frame,
...,
hyper_params = list(),
is_supervised = NULL,
do_hyper_params_check = FALSE,
search_criteria = NULL,
export_checkpoints_dir = NULL,
recovery_dir = NULL,
parallelism = 1
)
Arguments
algorithm |
Name of algorithm to use in grid search (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, pca). |
grid_id |
(Optional) ID for resulting grid search. If it is not specified then it is autogenerated. |
x |
(Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used. |
y |
The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, then a regression model will be trained, otherwise it will train a classification model. |
training_frame |
Id of the training data frame. |
... |
Arguments describing parameters to use with the algorithm (e.g., x, y, training_frame). See the specific algorithm function - h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning - for the available parameters. |
hyper_params |
List of lists of hyper parameters (e.g., list(ntrees = c(1, 2, 3)), as used in the Examples section). |
is_supervised |
[Deprecated] It is not possible to override the default behaviour. (Optional) If specified, overrides the default heuristic that decides whether the given algorithm name and parameters specify a supervised or an unsupervised algorithm. |
do_hyper_params_check |
Perform a client-side check of the specified hyperparameters. This can be time-consuming for a large hyperparameter space. |
search_criteria |
(Optional) List of control parameters for smarter hyperparameter search. The list can
include values for: strategy, max_models, max_runtime_secs, stopping_metric, stopping_tolerance, stopping_rounds and
seed. The default strategy, "Cartesian", covers the entire space of hyperparameter combinations, so for a
Cartesian grid search you can leave the search_criteria argument unspecified. Specify the "RandomDiscrete" strategy
to search a random sample of the hyperparameter combinations, with three ways of specifying when to stop the
search: maximum number of models, maximum time, and metric-based early stopping (e.g., stop if MSE has not improved
by 0.0001 over the 5 best models).
|
export_checkpoints_dir |
Directory to automatically export grid and its models to. |
recovery_dir |
When specified, the grid and all necessary data (frames, models) will be saved to this
directory (use HDFS or another distributed file system). Should the cluster crash during training, the grid
can be reloaded from this directory via |
parallelism |
Level of parallelism during grid model building. 1 = sequential building (default). Use 0 for adaptive parallelism, decided by H2O. Any number > 1 sets the exact number of models built in parallel. |
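The search_criteria description above can be illustrated with a small sketch. It builds a "RandomDiscrete" criteria list that sets all three stopping rules, mirroring the "MSE has not improved by 0.0001 over the 5 best models" wording. All numeric values, the max_depth hyperparameter, and the helper name run_random_grid are assumptions chosen for illustration, not recommended settings.

```r
# Sketch only: every value below is an illustrative assumption.
hyper_params <- list(
  ntrees    = c(10, 50, 100),
  max_depth = c(3, 5, 7)
)

# Random search with all three stopping rules set:
# model count, wall-clock time, and metric-based early stopping.
search_criteria <- list(
  strategy           = "RandomDiscrete",
  max_models         = 5,
  max_runtime_secs   = 600,
  stopping_metric    = "MSE",
  stopping_tolerance = 1e-4,  # "has not improved by 0.0001"
  stopping_rounds    = 5,     # "over the 5 best models"
  seed               = 1234
)

# Hypothetical helper; needs a running H2O cluster and an H2OFrame
# laid out like iris (predictors in columns 1-4, response in column 5).
run_random_grid <- function(training_frame) {
  h2o::h2o.grid("gbm", x = 1:4, y = 5,
                training_frame = training_frame,
                hyper_params = hyper_params,
                search_criteria = search_criteria)
}
```

With a cluster started via h2o.init(), the helper could be called as run_random_grid(as.h2o(iris)). Leaving search_criteria out of the call falls back to the default Cartesian strategy.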
Details
Launch grid search with given algorithm and parameters.
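Because the default Cartesian strategy trains one model per hyperparameter combination, it can help to count the combinations before launching a grid. The following base R sketch does that without touching H2O. The parameter names and values are illustrative assumptions for a GBM grid.

```r
# Sketch only: illustrative hyperparameter values for a GBM grid.
hyper_params <- list(
  ntrees     = c(10, 50, 100),
  max_depth  = c(3, 5, 7),
  learn_rate = c(0.05, 0.1)
)

# Enumerate every combination a Cartesian search would visit.
combos <- expand.grid(hyper_params)

# Number of models: the product of the per-parameter value counts.
n_models <- nrow(combos)
stopifnot(n_models == prod(lengths(hyper_params)))

print(n_models)
```

If the count is larger than the time budget allows, that is a reasonable point to switch to the "RandomDiscrete" strategy or to raise parallelism.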
Examples
## Not run:
library(h2o)
library(jsonlite)
h2o.init()
iris_hf <- as.h2o(iris)
grid <- h2o.grid("gbm", x = 1:4, y = 5, training_frame = iris_hf,
                 hyper_params = list(ntrees = c(1, 2, 3)))
# Get grid summary
summary(grid)
# Fetch grid models
model_ids <- grid@model_ids
models <- lapply(model_ids, function(id) h2o.getModel(id))
## End(Not run)
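As a follow-up to fetching the grid models, the sketch below shows one possible way to pick the top-ranked model by re-fetching the grid sorted on a metric with h2o.getGrid. The helper name best_model_from_grid and the default metric "logloss" are assumptions. The metric must be one that is valid for the trained model type.

```r
# Hypothetical helper; needs a running H2O cluster and a finished grid.
# Returns the model ranked first when the grid is sorted by `metric`.
best_model_from_grid <- function(grid_id, metric = "logloss",
                                 decreasing = FALSE) {
  # Re-fetch the grid with its models ordered by the chosen metric.
  sorted_grid <- h2o::h2o.getGrid(grid_id, sort_by = metric,
                                  decreasing = decreasing)
  # The first model id is the best one under that ordering.
  h2o::h2o.getModel(sorted_grid@model_ids[[1]])
}
```

For the grid built in the example above, this could be called as best_model_from_grid(grid@grid_id). For metrics where larger is better, such as AUC, pass decreasing = TRUE.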