BT_call {BT}    R Documentation

(Adaptive) Boosting Trees (ABT/BT) fit.

Description

Fit an (Adaptive) Boosting Trees algorithm. This function is intended for "power" users who have a large number of variables and wish to avoid calling model.frame, which can be slow in this instance. It is in particular called by BT. The function is split into two parts: the first one performs the initialization (see BT_callInit) whereas the second one performs all the boosting iterations (see BT_callBoosting). By default, this function does not perform input checks (these are all done in BT) and all parameters should be given in the right format. The user is therefore assumed to be aware of all the choices made.

Usage

BT_call(
  training.set,
  validation.set,
  tweedie.power,
  respVar,
  w,
  explVar,
  ABT,
  tree.control,
  train.fraction,
  interaction.depth,
  bag.fraction,
  shrinkage,
  n.iter,
  colsample.bytree,
  keep.data,
  is.verbose
)

BT_callInit(training.set, validation.set, tweedie.power, respVar, w)

BT_callBoosting(
  training.set,
  validation.set,
  tweedie.power,
  ABT,
  tree.control,
  interaction.depth,
  bag.fraction,
  shrinkage,
  n.iter,
  colsample.bytree,
  train.fraction,
  keep.data,
  is.verbose,
  respVar,
  w,
  explVar
)
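
For instance, the initialization stage can be run on its own. The following lines are a minimal sketch on a simulated Poisson data set; the data and all object names are illustrative assumptions, and a full boosting run is deferred to the Examples section below.

## Minimal sketch of the initialization stage alone (illustrative data).
library(BT)

set.seed(1)
db <- data.frame(x1 = runif(50), x2 = runif(50))
db$y <- rpois(50, exp(-1 + db$x1))
db$w <- rep(1, 50)  # unit weights, assumed to match the training rows.

init <- BT_callInit(
  training.set = db[1:40, ],
  validation.set = db[41:50, ],
  tweedie.power = 1,  # currently fixed to 1 (Poisson).
  respVar = "y",
  w = db$w[1:40]
)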

Arguments

training.set

a data frame containing all the related variables on which one wants to fit the algorithm.

validation.set

a held-out data frame containing all the related variables on which one wants to assess the algorithm performance. This can be NULL.

tweedie.power

an experimental parameter that is currently not used; it is set to 1, corresponding to the Poisson distribution.

respVar

the name of the target/response variable.

w

a vector of weights.

explVar

a vector containing the name of explanatory variables.

ABT

a boolean parameter. If ABT=TRUE, an adaptive boosting tree algorithm is built, whereas if ABT=FALSE, a classical boosting tree algorithm is run.

tree.control

defines additional tree parameters that will be used at each iteration. See rpart.control for more information.

train.fraction

the first train.fraction * nrow(data) observations are used to fit the BT and the remainder are used to compute out-of-sample estimates of the loss function (also known as the validation error). It is mainly used to report this value in the BTFit object.

interaction.depth

the maximum depth of variable interactions: 1 builds an additive model, 2 builds a model with up to two-way interactions, etc. This parameter can also be interpreted as the maximum number of non-terminal nodes. By default, it is set to 4. Please note that if this parameter is NULL, all the trees in the expansion are built based on the tree.control parameter alone. This option is intended for advanced users only and allows them to benefit from the full flexibility of the implemented algorithm.

bag.fraction

the fraction of independent training observations randomly selected to propose the next tree in the expansion. This introduces randomness into the model fit. If bag.fraction < 1, running the same model twice will result in similar but different fits. BT uses the R random number generator, so calling set.seed beforehand ensures that the same model can be reconstructed. Please note that if this parameter is used, BTErrors$training.error corresponds to the normalized in-bag error. A short sketch of the subsampling mechanism is given after this argument list.

shrinkage

a shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction.

n.iter

the total number of iterations to fit. This is equivalent to the number of trees and to the number of basis functions in the additive expansion. Please note that the initialization is not counted in n.iter: a weighted average initializes the algorithm, and n.iter trees are then built on top of it. Moreover, note that bag.fraction, colsample.bytree, ... are not used during this initialization phase.

colsample.bytree

each tree will be trained on a random subset of colsample.bytree features. Each tree considers a new random subset of features drawn from the formula, adding variability to the algorithm and reducing computation time. colsample.bytree is bounded between 1 and the number of features considered (see the sketch after this argument list).

keep.data

a boolean variable indicating whether to keep the data frames. This is particularly useful if one wants to keep track of the initial data frames; the stored data are further used for prediction when no new data frame is supplied. Note that in case of cross-validation, if keep.data=TRUE the initial data frames are saved whereas the cross-validation samples are not.

is.verbose

if is.verbose=TRUE, BT will print out the algorithm progress.
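
The subsampling mechanisms behind bag.fraction and colsample.bytree, as well as the weighted-average initialization mentioned for n.iter, can be pictured as follows. This is a standalone illustration of the ideas, not the package internals; every object below is an assumption made for the sketch.

## Illustrative sketch of the row/column subsampling and of the
## weighted-average initialization; BT performs these steps internally.
set.seed(1)  # makes a bagged fit reproducible, as noted for bag.fraction.
db <- data.frame(y = rpois(100, 1),
                 x1 = runif(100), x2 = runif(100), x3 = runif(100))
explVar <- c("x1", "x2", "x3")
w <- rep(1, nrow(db))

## Initialization: a weighted average of the response, computed on the
## full training set (no subsampling at this stage).
init.fit <- weighted.mean(db$y, w)

## bag.fraction: a fraction of the training rows is drawn (without
## replacement) to propose the next tree.
bag.fraction <- 0.5
in.bag <- sample(nrow(db), floor(bag.fraction * nrow(db)))

## colsample.bytree: each tree only sees a random subset of the features.
colsample.bytree <- 2
tree.features <- sample(explVar, colsample.bytree)

## Data actually seen by the next tree in the expansion.
db.tree <- db[in.bag, c("y", tree.features)]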

Value

a BTFit object.

Author(s)

Gireg Willame gireg.willame@gmail.com

This package is inspired by the gbm3 package. For more details, see https://github.com/gbm-developers/gbm3/.

References

M. Denuit, D. Hainaut and J. Trufin (2019). Effective Statistical Learning Methods for Actuaries I: GLMs and Extensions, Springer Actuarial.

M. Denuit, D. Hainaut and J. Trufin (2020). Effective Statistical Learning Methods for Actuaries II: Tree-Based Methods and Extensions, Springer Actuarial.

M. Denuit, D. Hainaut and J. Trufin (2019). Effective Statistical Learning Methods for Actuaries III: Neural Networks and Extensions, Springer Actuarial.

M. Denuit, D. Hainaut and J. Trufin (2022). Response versus gradient boosting trees, GLMs and neural networks under Tweedie loss and log-link. Accepted for publication in Scandinavian Actuarial Journal.

M. Denuit, J. Huyghe and J. Trufin (2022). Boosting cost-complexity pruned trees on Tweedie responses: The ABT machine for insurance ratemaking. Paper submitted for publication.

M. Denuit, J. Trufin and T. Verdebout (2022). Boosting on the responses with Tweedie loss functions. Paper submitted for publication.

See Also

BTFit, BTCVFit, BT_perf, predict.BTFit, summary.BTFit, print.BTFit, .BT_cv_errors.
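
Examples

The following example is a minimal sketch of a direct BT_call run on simulated Poisson data. The simulated data set and every parameter value are illustrative assumptions rather than recommended settings; since BT_call performs no input checks, the usual entry point BT remains the safer choice.

## Hedged sketch of a direct BT_call run (illustrative data and settings).
library(BT)
library(rpart)

set.seed(4)
n <- 1000
db <- data.frame(x1 = runif(n), x2 = runif(n))
db$y <- rpois(n, exp(-1 + db$x1 + 0.5 * db$x2))
db$w <- rep(1, n)

train.db <- db[1:800, ]
valid.db <- db[801:1000, ]

fit <- BT_call(
  training.set = train.db,
  validation.set = valid.db,
  tweedie.power = 1,                       # currently fixed to 1 (Poisson).
  respVar = "y",
  w = train.db$w,                          # assumed to match the training rows.
  explVar = c("x1", "x2"),
  ABT = FALSE,                             # classical boosting trees.
  tree.control = rpart.control(xval = 0),  # additional rpart parameters.
  train.fraction = 0.8,                    # reported in the BTFit object.
  interaction.depth = 4,
  bag.fraction = 0.5,
  shrinkage = 0.01,
  n.iter = 50,
  colsample.bytree = 2,
  keep.data = TRUE,
  is.verbose = FALSE
)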

