gradient_boosting_parameters {scorecardModelUtils}    R Documentation

Hyperparameter optimisation or parameter tuning for Gradient Boosting Regression Modelling by grid search

Description

The function runs a grid search with k-fold cross validation to find the best parameter combination, as judged by a chosen error measure. The parameters that can be tuned for the gradient boosting regression algorithm are ntree, depth, shrinkage, min_obs and bag_fraction. The objective function to be minimised is the error (mean absolute error, mean squared error or root mean squared error). For the grid search, the candidate values of each tuning parameter need to be passed to the function as a vector.
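The general procedure described above can be sketched in base R. This is a minimal illustration of grid search with k-fold cross validation, not the package's implementation: the names grid_search_cv, fit_predict and poly_fit_predict are hypothetical, and a polynomial lm on the built-in cars data stands in for gradient boosting so that the sketch runs without extra packages.

```r
# Illustrative sketch only: generic grid search with k-fold cross validation.
# grid_search_cv, fit_predict and poly_fit_predict are hypothetical names.
grid_search_cv <- function(data, target, grid, k, fit_predict, error_fun) {
  stopifnot(k >= 2, nrow(grid) >= 1)
  # Randomly assign every row to one of k folds of near-equal size
  fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))

  # One row per (parameter combination, fold) with the hold-out error
  detailed <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
    params <- grid[i, , drop = FALSE]
    do.call(rbind, lapply(seq_len(k), function(f) {
      train <- data[fold_id != f, , drop = FALSE]
      test  <- data[fold_id == f, , drop = FALSE]
      pred  <- fit_predict(train, test, target, params)
      data.frame(params, fold = f,
                 error = error_fun(test[[target]], pred),
                 row.names = NULL)
    }))
  }))

  # Average the fold errors for each parameter combination
  summary_tab <- aggregate(detailed["error"], by = detailed[names(grid)],
                           FUN = mean)

  list(detailed = detailed,
       summary  = summary_tab,
       best     = summary_tab[which.min(summary_tab$error), , drop = FALSE])
}

# Root mean squared error as the objective to minimise
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Stand-in learner (NOT gradient boosting): polynomial regression on 'speed',
# with the polynomial degree as the single tuning parameter
poly_fit_predict <- function(train, test, target, params) {
  f <- as.formula(sprintf("%s ~ poly(speed, %d)", target, params$degree))
  predict(lm(f, data = train), newdata = test)
}

set.seed(11)
res <- grid_search_cv(cars, "dist", expand.grid(degree = 1:3), k = 5,
                      fit_predict = poly_fit_predict, error_fun = rmse)
res$summary  # mean hold-out error per parameter combination
res$best     # combination with the smallest mean error
```

The number of model fits is the number of grid rows multiplied by the number of folds, so the cost grows quickly as more candidate values are supplied for each parameter.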

Usage

gradient_boosting_parameters(base, target, ntree, depth, shrinkage, min_obs,
  bag_fraction, error = "rmse", cv = 1)

Arguments

base

input dataframe

target

column / field name for the target variable to be passed as string (must be 0/1 type)

ntree

number of trees to be fitted

depth

maximum depth of variable interactions

shrinkage

learning rate

min_obs

minimum size of terminal nodes

bag_fraction

fraction of the training set observations randomly selected for the next tree

error

(optional) error measure as objective function to be minimised, to be chosen among "mae", "mse" and "rmse" (default value is "rmse")

cv

(optional) k value for the k-fold cross validation to be performed (default value is 1, i.e. without cross validation)
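For reference, the three error measures named for the error argument have the following standard definitions. The sketch below is written in base R under that assumption; the list name error_measures and the small actual/predicted vectors are illustrative only and are not taken from the package.

```r
# Standard definitions of the three error measures (illustrative sketch;
# 'error_measures' is a hypothetical name, not part of the package)
error_measures <- list(
  mae  = function(actual, predicted) mean(abs(actual - predicted)),
  mse  = function(actual, predicted) mean((actual - predicted)^2),
  rmse = function(actual, predicted) sqrt(mean((actual - predicted)^2))
)

# Toy 0/1 target and predicted scores, made up for illustration
actual    <- c(0, 1, 1, 0)
predicted <- c(0.2, 0.7, 0.9, 0.4)

# Evaluate every measure on the same predictions
errors <- sapply(error_measures, function(f) f(actual, predicted))
errors  # mae = 0.25, mse = 0.075, rmse = sqrt(0.075)
```

Because rmse is a monotone transform of mse, the two always rank parameter combinations identically on a single sample; mae can rank them differently since it penalises large deviations less heavily.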

Value

An object of class "gradient_boosting_parameters" is a list containing the following components:

error_tab_detailed

error for each cross validation sample of every parameter combination iterated during the grid search, as a dataframe

error_tab_summary

error summary for each combination of parameters as a dataframe

best_ntree

ntree parameter of the optimal solution

best_depth

depth parameter of the optimal solution

best_shrinkage

shrinkage parameter of the optimal solution

best_min_obs

min_obs parameter of the optimal solution

best_bag_fraction

bag_fraction parameter of the optimal solution

runtime

runtime of the entire process

Author(s)

Arya Poddar <aryapoddar290990@gmail.com>

Examples

# Toy dataset: iris with a randomly generated binary (0/1) target
data <- iris
suppressWarnings(RNGversion('3.5.0'))
set.seed(11)
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)

# Grid search over a single parameter combination (defaults: error = "rmse", cv = 1)
gbm_params_list <- gradient_boosting_parameters(base = data, target = "Y", ntree = 2,
                   depth = 2, shrinkage = 0.1, min_obs = 0.1, bag_fraction = 0.7)

# Error tables
gbm_params_list$error_tab_detailed
gbm_params_list$error_tab_summary

# Parameters of the optimal solution
gbm_params_list$best_ntree
gbm_params_list$best_depth
gbm_params_list$best_shrinkage
gbm_params_list$best_min_obs
gbm_params_list$best_bag_fraction

# Runtime of the entire process
gbm_params_list$runtime

[Package scorecardModelUtils version 0.0.1.0 Index]