BT_call {BT}    R Documentation

(Adaptive) Boosting Trees (ABT/BT) fit.

Description

Fit an (Adaptive) Boosting Trees algorithm. This function is intended for "power" users who have a large number of variables and wish to avoid calling model.frame, which can be slow in this instance. It is in particular called by BT. The function is split into two parts: the first one performs the initialization (see BT_callInit) whereas the second one performs all the boosting iterations (see BT_callBoosting). By default, this function does not perform input checks (these are all done in BT) and all parameters should be given in the right format. The user is therefore assumed to be aware of all the choices made.

Usage

BT_call(
  training.set,
  validation.set,
  tweedie.power,
  respVar,
  w,
  explVar,
  ABT,
  tree.control,
  train.fraction,
  interaction.depth,
  bag.fraction,
  shrinkage,
  n.iter,
  colsample.bytree,
  keep.data,
  is.verbose
)

BT_callInit(training.set, validation.set, tweedie.power, respVar, w)

BT_callBoosting(
  training.set,
  validation.set,
  tweedie.power,
  ABT,
  tree.control,
  interaction.depth,
  bag.fraction,
  shrinkage,
  n.iter,
  colsample.bytree,
  train.fraction,
  keep.data,
  is.verbose,
  respVar,
  w,
  explVar
)
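
For instance, the initialization stage can be run on its own. The following lines are a minimal sketch on a simulated Poisson data set; the data and all object names are illustrative assumptions, and a full boosting run is deferred to the Examples section below.

## Minimal sketch of the initialization stage alone (illustrative data).
library(BT)

set.seed(1)
db <- data.frame(x1 = runif(50), x2 = runif(50))
db$y <- rpois(50, exp(-1 + db$x1))
db$w <- rep(1, 50)  # unit weights, assumed to match the training rows.

init <- BT_callInit(
  training.set = db[1:40, ],
  validation.set = db[41:50, ],
  tweedie.power = 1,  # currently fixed to 1 (Poisson).
  respVar = "y",
  w = db$w[1:40]
)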

Arguments

training.set

a data frame containing all the related variables on which one wants to fit the algorithm.

validation.set

a held-out data frame containing all the related variables on which one wants to assess the algorithm performance. This can be NULL.

tweedie.power

an experimental parameter that is currently not used; it is set to 1, corresponding to the Poisson distribution.

respVar

the name of the target/response variable.

w

a vector of weights.

explVar

a vector containing the name of explanatory variables.

ABT

a boolean parameter. If ABT=TRUE, an adaptive boosting tree algorithm is built, whereas if ABT=FALSE, a classical boosting tree algorithm is run.

tree.control

defines additional tree parameters that will be used at each iteration. See rpart.control for more information.

train.fraction

the first train.fraction * nrow(data) observations are used to fit the BT and the remainder are used to compute out-of-sample estimates of the loss function (also known as the validation error). It is mainly used to report this value in the BTFit object.

interaction.depth

the maximum depth of variable interactions: 1 builds an additive model, 2 builds a model with up to two-way interactions, etc. This parameter can also be interpreted as the maximum number of non-terminal nodes. By default, it is set to 4. Please note that if this parameter is NULL, all the trees in the expansion are built based on the tree.control parameter alone. This option is intended for advanced users only and allows them to benefit from the full flexibility of the implemented algorithm.

bag.fraction

the fraction of independent training observations randomly selected to propose the next tree in the expansion. This introduces randomness into the model fit. If bag.fraction < 1, running the same model twice will result in similar but different fits. BT uses the R random number generator, so calling set.seed beforehand ensures that the same model can be reconstructed. Please note that if this parameter is used, BTErrors$training.error corresponds to the normalized in-bag error. A short sketch of the subsampling mechanism is given after this argument list.

shrinkage

a shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction.

n.iter

the total number of iterations to fit. This is equivalent to the number of trees and to the number of basis functions in the additive expansion. Please note that the initialization is not counted in n.iter: a weighted average initializes the algorithm, and n.iter trees are then built on top of it. Moreover, note that bag.fraction, colsample.bytree, ... are not used during this initialization phase.

colsample.bytree

each tree will be trained on a random subset of colsample.bytree features. Each tree considers a new random subset of features drawn from the formula, adding variability to the algorithm and reducing computation time. colsample.bytree is bounded between 1 and the number of features considered (see the sketch after this argument list).

keep.data

a boolean variable indicating whether to keep the data frames. This is particularly useful if one wants to keep track of the initial data frames; the stored data are further used for prediction when no new data frame is supplied. Note that in case of cross-validation, if keep.data=TRUE the initial data frames are saved whereas the cross-validation samples are not.

is.verbose

if is.verbose=TRUE, BT will print out the algorithm progress.
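
The subsampling mechanisms behind bag.fraction and colsample.bytree, as well as the weighted-average initialization mentioned for n.iter, can be pictured as follows. This is a standalone illustration of the ideas, not the package internals; every object below is an assumption made for the sketch.

## Illustrative sketch of the row/column subsampling and of the
## weighted-average initialization; BT performs these steps internally.
set.seed(1)  # makes a bagged fit reproducible, as noted for bag.fraction.
db <- data.frame(y = rpois(100, 1),
                 x1 = runif(100), x2 = runif(100), x3 = runif(100))
explVar <- c("x1", "x2", "x3")
w <- rep(1, nrow(db))

## Initialization: a weighted average of the response, computed on the
## full training set (no subsampling at this stage).
init.fit <- weighted.mean(db$y, w)

## bag.fraction: a fraction of the training rows is drawn (without
## replacement) to propose the next tree.
bag.fraction <- 0.5
in.bag <- sample(nrow(db), floor(bag.fraction * nrow(db)))

## colsample.bytree: each tree only sees a random subset of the features.
colsample.bytree <- 2
tree.features <- sample(explVar, colsample.bytree)

## Data actually seen by the next tree in the expansion.
db.tree <- db[in.bag, c("y", tree.features)]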

Value

a BTFit object.

Author(s)

Gireg Willame gireg.willame@gmail.com

This package is inspired by the gbm3 package. For more details, see https://github.com/gbm-developers/gbm3/.

References

M. Denuit, D. Hainaut and J. Trufin (2019). Effective Statistical Learning Methods for Actuaries I: GLMs and Extensions, Springer Actuarial.

M. Denuit, D. Hainaut and J. Trufin (2020). Effective Statistical Learning Methods for Actuaries II: Tree-Based Methods and Extensions, Springer Actuarial.

M. Denuit, D. Hainaut and J. Trufin (2019). Effective Statistical Learning Methods for Actuaries III: Neural Networks and Extensions, Springer Actuarial.

M. Denuit, D. Hainaut and J. Trufin (2022). Response versus gradient boosting trees, GLMs and neural networks under Tweedie loss and log-link. Accepted for publication in Scandinavian Actuarial Journal.

M. Denuit, J. Huyghe and J. Trufin (2022). Boosting cost-complexity pruned trees on Tweedie responses: The ABT machine for insurance ratemaking. Paper submitted for publication.

M. Denuit, J. Trufin and T. Verdebout (2022). Boosting on the responses with Tweedie loss functions. Paper submitted for publication.

See Also

BTFit, BTCVFit, BT_perf, predict.BTFit, summary.BTFit, print.BTFit, .BT_cv_errors.
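
Examples

The following example is a minimal sketch of a direct BT_call run on simulated Poisson data. The simulated data set and every parameter value are illustrative assumptions rather than recommended settings; since BT_call performs no input checks, the usual entry point BT remains the safer choice.

## Hedged sketch of a direct BT_call run (illustrative data and settings).
library(BT)
library(rpart)

set.seed(4)
n <- 1000
db <- data.frame(x1 = runif(n), x2 = runif(n))
db$y <- rpois(n, exp(-1 + db$x1 + 0.5 * db$x2))
db$w <- rep(1, n)

train.db <- db[1:800, ]
valid.db <- db[801:1000, ]

fit <- BT_call(
  training.set = train.db,
  validation.set = valid.db,
  tweedie.power = 1,                       # currently fixed to 1 (Poisson).
  respVar = "y",
  w = train.db$w,                          # assumed to match the training rows.
  explVar = c("x1", "x2"),
  ABT = FALSE,                             # classical boosting trees.
  tree.control = rpart.control(xval = 0),  # additional rpart parameters.
  train.fraction = 0.8,                    # reported in the BTFit object.
  interaction.depth = 4,
  bag.fraction = 0.5,
  shrinkage = 0.01,
  n.iter = 50,
  colsample.bytree = 2,
  keep.data = TRUE,
  is.verbose = FALSE
)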

