random_forest_parameters {scorecardModelUtils}    R Documentation

Hyperparameter optimisation or parameter tuning for Random Forest by grid search

Description

The function runs a grid search with k-fold cross validation to find the best combination of parameters according to a chosen error measure. The random forest parameters that can be tuned with this function are ntree, mtry, maxnodes and nodesize. The objective function to be minimised is the error (mean absolute error, mean squared error or root mean squared error). For the grid search, the candidate values of each tuning parameter need to be passed to the function as a vector.
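As a rough illustration of what such a grid looks like, the sketch below uses base R's expand.grid to enumerate every combination of candidate values. The candidate values and the name param_grid are assumptions chosen for the example; this is not taken from the package's source.

```r
# Hypothetical candidate values for each tuning parameter (assumptions).
ntree_values    <- c(50, 100)
mtry_values     <- c(1, 2, 3)
nodesize_values <- c(3, 5)

# One row per parameter combination that a grid search would evaluate.
param_grid <- expand.grid(ntree = ntree_values,
                          mtry = mtry_values,
                          nodesize = nodesize_values)

nrow(param_grid)  # 2 * 3 * 2 = 12 combinations
head(param_grid)
```

The number of model fits grows multiplicatively with the number of candidate values (and again with the number of cross validation folds), so short candidate lists keep the search affordable.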

Usage

random_forest_parameters(base, target, model_type, ntree, mtry,
  maxnodes = NULL, nodesize, error = "rmse", cv = 1)

Arguments

base

input dataframe

target

name of the target column, passed as a string (the target must be of 0/1 type)

model_type

either "regression" or "classification"

ntree

number of trees to be fitted

mtry

number of variables sampled as split candidates at each node

maxnodes

(optional) maximum number of terminal nodes (default is NULL, i.e. no restriction on the size of the trees)

nodesize

minimum size of terminal nodes

error

(optional) error measure as objective function to be minimised, to be chosen among "mae", "mse" and "rmse" (default value is "rmse")

cv

(optional) k value for the k-fold cross validation to be performed (default value is 1, i.e. without cross validation)
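For reference, the three error measures named in the error argument and one common way of assigning rows to k folds can be sketched in base R as follows. These are textbook definitions and a generic fold assignment, not the package's internal code; the helper names mae, mse, rmse and fold_id, and the toy values, are assumptions made for the example.

```r
# Textbook definitions of the three error measures (not the package's code).
mae  <- function(actual, predicted) mean(abs(actual - predicted))
mse  <- function(actual, predicted) mean((actual - predicted)^2)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))

# Toy 0/1 target and predicted scores, chosen only for illustration.
actual    <- c(0, 1, 1, 0)
predicted <- c(0.2, 0.7, 0.9, 0.4)

mae(actual, predicted)   # 0.25
mse(actual, predicted)   # 0.075
rmse(actual, predicted)  # square root of the value above

# One common way to assign rows to k folds: shuffle balanced fold labels.
k <- 3   # hypothetical value for cv
n <- 10  # hypothetical number of rows
set.seed(1)
fold_id <- sample(rep(seq_len(k), length.out = n))
table(fold_id)  # fold sizes differ by at most one row
```

In a k-fold scheme of this kind, each fold in turn is held out for error calculation while the model is fitted on the remaining rows.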

Value

An object of class "random_forest_parameters" is a list containing the following components:

error_tab_detailed

dataframe with the error for each cross validation sample of every parameter combination evaluated during the grid search

error_tab_summary

error summary for each combination of parameters as a dataframe

best_ntree

ntree parameter of the optimal solution

best_mtry

mtry parameter of the optimal solution

maxnodes

maxnodes parameter of the optimal solution

best_nodesize

nodesize parameter of the optimal solution

runtime

runtime of the entire process
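To make the relationship between a summary table and the optimal parameters concrete, here is a minimal sketch of selecting the row with the lowest error. The dataframe summary_tab, its column names and its values are illustrative assumptions and may not match the actual layout of error_tab_summary.

```r
# Toy summary table; column names and values are illustrative assumptions.
summary_tab <- data.frame(
  ntree    = c(50, 50, 100, 100),
  mtry     = c(1, 2, 1, 2),
  nodesize = c(3, 3, 3, 3),
  error    = c(0.52, 0.48, 0.50, 0.49)
)

# The optimal solution is the row with the smallest error.
best <- summary_tab[which.min(summary_tab$error), ]
best$ntree  # 50
best$mtry   # 2
```

Note that which.min returns the first minimum, so ties are resolved in favour of the combination that appears earliest in the table.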

Author(s)

Arya Poddar <aryapoddar290990@gmail.com>

Aiana Goyal <aianagoel002@gmail.com>

Examples

# Use iris as the input data and add a random 0/1 target for illustration
data <- iris
suppressWarnings(RNGversion('3.5.0'))
set.seed(11)
data$Y <- sample(0:1,size=nrow(data),replace=TRUE)
# Run the search with a single value per parameter and the default cv = 1
rf_params_list <- random_forest_parameters(base = data,target = "Y",
                  model_type = "classification",ntree = 2,mtry = 1,nodesize = 3)
# Inspect the components of the returned list
rf_params_list$error_tab_detailed
rf_params_list$error_tab_summary
rf_params_list$best_ntree
rf_params_list$best_mtry
rf_params_list$maxnodes
rf_params_list$best_nodesize
rf_params_list$runtime

[Package scorecardModelUtils version 0.0.1.0 Index]