h2o_train {agua}  R Documentation 
Model wrappers for h2o
Description
Basic model wrappers for h2o model functions that include data conversion, seed configuration, and so on.
Usage
h2o_train(
x,
y,
model,
weights = NULL,
validation = NULL,
save_data = FALSE,
...
)
h2o_train_rf(x, y, ntrees = 50, mtries = -1, min_rows = 1, ...)
h2o_train_xgboost(
x,
y,
ntrees = 50,
max_depth = 6,
min_rows = 1,
learn_rate = 0.3,
sample_rate = 1,
col_sample_rate = 1,
min_split_improvement = 0,
stopping_rounds = 0,
validation = NULL,
...
)
h2o_train_gbm(
x,
y,
ntrees = 50,
max_depth = 6,
min_rows = 1,
learn_rate = 0.3,
sample_rate = 1,
col_sample_rate = 1,
min_split_improvement = 0,
stopping_rounds = 0,
...
)
h2o_train_glm(x, y, lambda = NULL, alpha = NULL, ...)
h2o_train_nb(x, y, laplace = 0, ...)
h2o_train_mlp(
x,
y,
hidden = 200,
l2 = 0,
hidden_dropout_ratios = 0,
epochs = 10,
activation = "Rectifier",
validation = NULL,
...
)
h2o_train_rule(
x,
y,
rule_generation_ntrees = 50,
max_rule_length = 5,
lambda = NULL,
...
)
h2o_train_auto(x, y, verbosity = NULL, save_data = FALSE, ...)
Arguments
x 
A data frame of predictors. 
y 
A vector of outcomes. 
model 
A character string for the model. Current selections are

weights 
A numeric vector of case weights. 
validation 
A number between 0 and 1 specifying the proportion of the data reserved as a validation set. h2o uses this set for performance assessment and potential early stopping. Defaults to 0.
save_data 
A logical for whether training data should be saved on
the h2o server, set this to 
... 
Other options to pass to the h2o model functions (e.g.,

ntrees 
Number of trees. Defaults to 50. 
mtries 
Number of variables randomly sampled as candidates at each split. If set to -1, defaults to sqrt(p) for classification and p/3 for regression (where p is the number of predictors). Defaults to -1.
min_rows 
Fewest allowed (weighted) observations in a leaf. Defaults to 1. 
max_depth 
Maximum tree depth (0 for unlimited). Defaults to 20. 
learn_rate 
(same as eta) Learning rate (from 0.0 to 1.0). Defaults to 0.3.
sample_rate
Row sample rate per tree (from 0.0 to 1.0). Defaults to 0.632.
col_sample_rate
(same as colsample_bylevel) Column sample rate (from 0.0 to 1.0). Defaults to 1.
min_split_improvement
Minimum relative improvement in squared error reduction for a split to happen. Defaults to 1e-05.
stopping_rounds
Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k := stopping_rounds scoring events (0 to disable). Defaults to 0.
lambda 
Regularization strength.
alpha
Distribution of regularization between the L1 (Lasso) and L2 (Ridge) penalties. A value of 1 for alpha represents Lasso regression, a value of 0 produces Ridge regression, and anything in between specifies the amount of mixing between the two. Default value of alpha is 0 when SOLVER = 'LBFGS'; 0.5 otherwise.
laplace
Laplace smoothing parameter. Defaults to 0.
hidden
Hidden layer sizes (e.g. [100, 100]). Defaults to c(200, 200).
l2
L2 regularization (can add stability and improve generalization, causes many weights to be small). Defaults to 0.
hidden_dropout_ratios
Hidden layer dropout ratios (can improve generalization). Specify one value per hidden layer. Defaults to 0.5.
epochs 
How many times the dataset should be iterated (streamed), can be fractional. Defaults to 10. 
activation 
Activation function. Must be one of: "Tanh", "TanhWithDropout", "Rectifier", "RectifierWithDropout", "Maxout", "MaxoutWithDropout". Defaults to Rectifier. 
rule_generation_ntrees 
Specifies the number of trees to build in the tree model. Defaults to 50.
max_rule_length 
Maximum length of rules. Defaults to 3. 
verbosity 
Verbosity of the backend messages printed during training. Must be one of NULL (live log disabled), "debug", "info", "warn", "error". Defaults to NULL.
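Two of the argument descriptions above carry small formulas: the value that mtries = -1 resolves to, and the way alpha mixes the L1 and L2 penalties. The sketch below spells both out in plain R. The helpers default_mtries() and elastic_net_penalty() are hypothetical names used only for illustration; they are not part of agua or h2o. Rounding sqrt(p) and p/3 down to an integer of at least 1, and the factor of 1/2 on the ridge term, are assumptions that follow a common convention rather than anything stated on this page.

```r
# Hypothetical helpers for illustration only; not part of agua or h2o.

# Value that mtries = -1 is documented to resolve to, for p predictors.
default_mtries <- function(p, mode = c("classification", "regression")) {
  mode <- match.arg(mode)
  raw <- if (mode == "classification") sqrt(p) else p / 3
  # Assumption: round down and use at least one predictor per split.
  max(1L, as.integer(floor(raw)))
}

# Elastic-net penalty: alpha = 1 is pure L1 (Lasso), alpha = 0 is pure L2 (Ridge).
elastic_net_penalty <- function(beta, lambda, alpha) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  l1 <- sum(abs(beta))
  # Assumption: the ridge term carries a factor of 1/2.
  l2 <- sum(beta^2) / 2
  lambda * (alpha * l1 + (1 - alpha) * l2)
}

mtries_class <- default_mtries(10, "classification")  # sqrt(10), rounded down
mtries_reg <- default_mtries(10, "regression")        # 10 / 3, rounded down

beta <- c(1, -2, 0.5)
pen_lasso <- elastic_net_penalty(beta, lambda = 0.1, alpha = 1)
pen_ridge <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0)
pen_mixed <- elastic_net_penalty(beta, lambda = 0.1, alpha = 0.5)
```

With alpha = 0.5 the penalty is the average of the Lasso-only and Ridge-only values, which is what "mixing between the two" means here.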
Value
An h2o model object.
Examples
# start with h2o::h2o.init()
if (h2o_running()) {
  # Using the model wrappers:
  h2o_train_glm(mtcars[, -1], mtcars$mpg)

  # Using parsnip:
  spec <-
    rand_forest(mtry = 3, trees = 500) %>%
    set_engine("h2o") %>%
    set_mode("regression")

  set.seed(1)
  mod <- fit(spec, mpg ~ ., data = mtcars)
  mod

  predict(mod, head(mtcars))
}
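As a further sketch, the tree-based wrappers can be called with non-default tuning values taken from the Usage section. The example assumes agua is installed and an h2o server has already been started with h2o::h2o.init(); the specific values (100 trees, depth 3, learning rate 0.1) are arbitrary choices for illustration, not recommendations. The variable names h2o_ok and fit_gbm are introduced here only for the example.

```r
# Assumes agua is installed and an h2o server was started with h2o::h2o.init().
# The block is skipped quietly when either condition does not hold.
h2o_ok <- requireNamespace("agua", quietly = TRUE) && agua::h2o_running()

if (h2o_ok) {
  # Predictors are every column except the outcome (mpg, column 1).
  fit_gbm <- agua::h2o_train_gbm(
    x = mtcars[, -1],
    y = mtcars$mpg,
    ntrees = 100,      # illustrative value
    max_depth = 3,     # illustrative value
    learn_rate = 0.1   # illustrative value
  )
  fit_gbm
}
```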