gpboost {gpboost}    R Documentation
Train a GPBoost model
Description
Simple interface for training a GPBoost model.
Usage
gpboost(data, label = NULL, weight = NULL, params = list(),
nrounds = 100L, gp_model = NULL, line_search_step_length = FALSE,
use_gp_model_for_validation = TRUE, train_gp_model_cov_pars = TRUE,
valids = list(), obj = NULL, eval = NULL, verbose = 1L,
record = TRUE, eval_freq = 1L, early_stopping_rounds = NULL,
init_model = NULL, colnames = NULL, categorical_feature = NULL,
callbacks = list(), ...)
Arguments
data
a gpb.Dataset object or a matrix with covariate (feature) data used for training; if a matrix is provided, the response variable needs to be passed via label
label
Vector of response values / labels, used if data is not a gpb.Dataset
weight
Vector of weights. The GPBoost algorithm currently does not support weights
params
list of "tuning" parameters. See the parameter documentation for more information. A few key parameters: learning_rate, num_leaves, max_depth, min_data_in_leaf (see also the sketch after this argument list)
nrounds
number of boosting iterations (= number of trees). This is the most important tuning parameter for boosting
gp_model
A GPModel object that contains the random effects (Gaussian process and / or grouped random effects) model
line_search_step_length
Boolean. If TRUE, a line search is done to find the optimal step length for every boosting update (see, e.g., Friedman 2001). This is then multiplied by the learning_rate
use_gp_model_for_validation
Boolean. If TRUE, the gp_model (Gaussian process and / or random effects) is also used, in addition to the tree ensemble, for calculating predictions on the validation data
train_gp_model_cov_pars
Boolean. If TRUE, the covariance parameters of the gp_model are estimated in every boosting iteration; otherwise, they are held fixed and need to be either estimated beforehand or provided when creating the gp_model
valids
a list of gpb.Dataset objects, used for validation
obj
(character) The distribution of the response variable (= label) conditional on fixed and random effects. This only needs to be set when doing independent boosting without random effects / Gaussian processes
eval
Evaluation metric to be monitored when doing CV and parameter tuning. This can be a string, function, or list with a mixture of strings and functions
verbose
verbosity of output; if <= 0, printing of evaluation results during training is also disabled
record
Boolean; if TRUE, the evaluation results of every iteration are recorded in booster$record_evals
eval_freq
evaluation output frequency; only has an effect when verbose > 0
early_stopping_rounds
int. Activates early stopping. Requires at least one validation data set and one metric. When this parameter is non-null, training will stop if the evaluation of any metric on any validation set fails to improve for early_stopping_rounds consecutive boosting rounds. If training stops early, the returned model has a best_iter field with the number of the best iteration
init_model
path of a model file or a gpb.Booster object; training will be continued from this model
colnames
feature names; if not NULL, these are used to overwrite the feature names in the dataset
categorical_feature
categorical features. This can either be a character vector of feature names or an integer vector with the indices of the features (e.g. c(1L, 10L) to say "the first and tenth columns")
callbacks
List of callback functions that are applied at each iteration
...
Additional arguments passed to gpb.train; for instance, tuning parameters such as learning_rate can also be passed directly here instead of via params (as in the Examples below)
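As a brief illustration of how these arguments fit together, here is a minimal sketch that passes tuning parameters as a list via params and marks a categorical feature. The simulated data and the chosen parameter values are placeholders for illustration only and are not part of this documentation:
library(gpboost)
# Placeholder toy data (illustration only)
set.seed(1)
n <- 500
X <- cbind(sample(0L:3L, n, replace = TRUE), matrix(rnorm(2 * n), ncol = 2))
y <- 0.5 * X[, 1] + X[, 2] + rnorm(n)
# Tuning parameters are passed as a list via 'params';
# 'categorical_feature = 1L' treats the first column as categorical
params <- list(learning_rate = 0.05, max_depth = 6, min_data_in_leaf = 20)
bst <- gpboost(data = X, label = y, params = params, nrounds = 50,
               categorical_feature = 1L, verbose = 0)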
Value
a trained gpb.Booster
Early Stopping
"early stopping" refers to stopping the training process if the model's performance on a given validation set does not improve for several consecutive iterations.
If multiple arguments are given to eval
, their order will be preserved. If you enable
early stopping by setting early_stopping_rounds
in params
, by default all
metrics will be considered for early stopping.
If you want to only consider the first metric for early stopping, pass
first_metric_only = TRUE
in params
. Note that if you also specify metric
in params
, that metric will be considered the "first" one. If you omit metric
,
a default metric will be used based on your choice for the parameter obj
(keyword argument)
or objective
(passed into params
).
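For illustration, here is a minimal early-stopping sketch. The simulated data, the metric names ("l2", "l1"), and the parameter values are placeholders and not part of this documentation:
library(gpboost)
# Placeholder toy data (illustration only)
set.seed(1)
X <- matrix(rnorm(500 * 3), ncol = 3)
y <- X[, 1] + rnorm(500)
X_valid <- matrix(rnorm(200 * 3), ncol = 3)
y_valid <- X_valid[, 1] + rnorm(200)
dtrain <- gpb.Dataset(data = X, label = y)
dvalid <- gpb.Dataset.create.valid(dtrain, data = X_valid, label = y_valid)
# Stop if no metric on the validation set improves for 10 consecutive rounds;
# 'first_metric_only = TRUE' restricts early stopping to the first metric ("l2" here)
params <- list(learning_rate = 0.05, first_metric_only = TRUE)
bst <- gpboost(data = dtrain, params = params, nrounds = 1000,
               valids = list(valid = dvalid), eval = c("l2", "l1"),
               early_stopping_rounds = 10, verbose = 1)
bst$best_iter  # iteration with the best validation score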
Author(s)
Fabio Sigrist, authors of the LightGBM R package
Examples
# See https://github.com/fabsig/GPBoost/tree/master/R-package for more examples
library(gpboost)
data(GPBoost_data, package = "gpboost")
#--------------------Combine tree-boosting and grouped random effects model----------------
# Create random effects model
gp_model <- GPModel(group_data = group_data[,1], likelihood = "gaussian")
# The default optimizer for covariance parameters (hyperparameters) is
# Nesterov-accelerated gradient descent.
# This can be changed to, e.g., Nelder-Mead as follows:
# re_params <- list(optimizer_cov = "nelder_mead")
# gp_model$set_optim_params(params=re_params)
# Use trace = TRUE to monitor convergence:
# re_params <- list(trace = TRUE)
# gp_model$set_optim_params(params=re_params)
# Train model
bst <- gpboost(data = X, label = y, gp_model = gp_model, nrounds = 16,
               learning_rate = 0.05, max_depth = 6, min_data_in_leaf = 5,
               verbose = 0)
# Estimated random effects model
summary(gp_model)
# Make predictions
# Predict latent variables
pred <- predict(bst, data = X_test, group_data_pred = group_data_test[,1],
                predict_var = TRUE, pred_latent = TRUE)
pred$random_effect_mean # Predicted latent random effects mean
pred$random_effect_cov # Predicted random effects variances
pred$fixed_effect # Predicted fixed effects from tree ensemble
# Predict response variable
pred_resp <- predict(bst, data = X_test, group_data_pred = group_data_test[,1],
                     predict_var = TRUE, pred_latent = FALSE)
pred_resp$response_mean # Predicted response mean
# For Gaussian data: pred$random_effect_mean + pred$fixed_effect = pred_resp$response_mean
pred$random_effect_mean + pred$fixed_effect - pred_resp$response_mean
#--------------------Combine tree-boosting and Gaussian process model----------------
# Create Gaussian process model
gp_model <- GPModel(gp_coords = coords, cov_function = "exponential",
                    likelihood = "gaussian")
# Train model
bst <- gpboost(data = X, label = y, gp_model = gp_model, nrounds = 8,
               learning_rate = 0.1, max_depth = 6, min_data_in_leaf = 5,
               verbose = 0)
# Estimated random effects model
summary(gp_model)
# Make predictions
pred <- predict(bst, data = X_test, gp_coords_pred = coords_test,
                predict_var = TRUE, pred_latent = TRUE)
pred$random_effect_mean # Predicted latent random effects mean
pred$random_effect_cov # Predicted random effects variances
pred$fixed_effect # Predicted fixed effects from tree ensemble
# Predict response variable
pred_resp <- predict(bst, data = X_test, gp_coords_pred = coords_test,
                     predict_var = TRUE, pred_latent = FALSE)
pred_resp$response_mean # Predicted response mean