blackboost {mboost}

R Documentation

Gradient Boosting with Regression Trees

Description

Gradient boosting for optimizing arbitrary loss functions, with regression trees as base-learners.

Usage

blackboost(formula, data = list(),
           weights = NULL, na.action = na.pass,
           offset = NULL, family = Gaussian(), 
           control = boost_control(),
           oobweights = NULL,
           tree_controls = partykit::ctree_control(
               teststat = "quad",
               testtype = "Teststatistic",
               mincriterion = 0,
               minsplit = 10, 
               minbucket = 4,
               maxdepth = 2, 
               saveinfo = FALSE),
           ...)

Arguments

`formula`	a symbolic description of the model to be fit.
`data`	a data frame containing the variables in the model.
`weights`	an optional vector of weights to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain `NA`s.
`offset`	a numeric vector to be used as offset (optional).
`family`	a `Family` object.
`control`	a list of parameters controlling the algorithm. For more details see `boost_control`.
`oobweights`	an additional vector of out-of-bag weights, which is used for the out-of-bag risk (i.e., if `boost_control(risk = "oobag")`). This argument is also used internally by `cvrisk`.
`tree_controls`	an object of class `"TreeControl"`, which can be obtained using `ctree_control`. Defines the hyper-parameters of the trees used as base-learners. Make sure you understand the consequences before altering any of its arguments. By default, trees of depth two are fitted, which allows two-way interactions but nothing deeper.
`...`	additional arguments passed to `mboost_fit`, including `weights`, `offset`, `family` and `control`. For default values see `mboost_fit`.
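
As an illustration of `tree_controls`, the following sketch fits stumps instead of the default depth-two trees. The object names `stump_ctrl` and `cars.stumps` are made up for this example, and all settings other than `maxdepth` simply repeat the defaults listed under Usage.

```r
library("mboost")

## Same settings as the default in Usage, except maxdepth = 1:
## each tree then makes a single split on one variable, so no
## interactions between predictors can enter the model.
stump_ctrl <- partykit::ctree_control(
    teststat = "quad",
    testtype = "Teststatistic",
    mincriterion = 0,
    minsplit = 10,
    minbucket = 4,
    maxdepth = 1,
    saveinfo = FALSE)

## 'cars.stumps' is an illustrative name, not part of the package
cars.stumps <- blackboost(dist ~ speed, data = cars,
                          tree_controls = stump_ctrl,
                          control = boost_control(mstop = 50))
cars.stumps
```

Repeating the full set of defaults is a deliberate choice here: calling `ctree_control(maxdepth = 1)` on its own would fall back to the defaults of `ctree_control`, which need not match those used by `blackboost`.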

Details

This function implements ‘classical’ gradient boosting with regression trees as base-learners. Essentially the same algorithm is implemented in package gbm. The main difference is that blackboost accepts an arbitrary loss function via its family argument, whereas gbm uses hard-coded loss functions. In addition, the base-learners (conditional inference trees, see ctree) are somewhat more flexible.
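
A minimal sketch of switching the loss via the family argument, assuming the `Laplace()` family from mboost (absolute error loss) as the alternative to the default `Gaussian()`. The object names `cars.l2` and `cars.l1` are chosen for this example only.

```r
library("mboost")

## default: squared error loss (family = Gaussian())
cars.l2 <- blackboost(dist ~ speed, data = cars,
                      control = boost_control(mstop = 50))

## same model, but optimizing absolute error loss instead
cars.l1 <- blackboost(dist ~ speed, data = cars,
                      family = Laplace(),
                      control = boost_control(mstop = 50))

## compare the two fits; cars is already ordered by speed
plot(dist ~ speed, data = cars)
lines(cars$speed, predict(cars.l2), col = "red")
lines(cars$speed, predict(cars.l1), col = "blue")
```

Only the family changes between the two calls; the tree base-learners and the boosting control settings stay the same.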

The regression fit is a black-box prediction machine and is therefore hard to interpret.

Partial dependence plots are not yet available; see the Examples section for plotting additive tree models.

Value

An object of class mboost, for which print and predict methods are available.
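
The returned object can be used roughly as follows. This is a sketch only: the data frame `nd` and its speed values are invented for illustration.

```r
library("mboost")

cars.gb <- blackboost(dist ~ speed, data = cars,
                      control = boost_control(mstop = 50))

## print method: summary of the fitted model
print(cars.gb)

## predict method on new observations ('nd' is a made-up data frame)
nd <- data.frame(speed = c(5, 15, 25))
p <- predict(cars.gb, newdata = nd)
p

## number of boosting iterations stored in the model
mstop(cars.gb)
```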

References

Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.

Torsten Hothorn, Kurt Hornik and Achim Zeileis (2006), Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.

Yoav Freund and Robert E. Schapire (1996), Experiments with a new boosting algorithm. In Machine Learning: Proc. Thirteenth International Conference, 148–156.

Jerome H. Friedman (2001), Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189–1232.

Greg Ridgeway (1999), The state of boosting. Computing Science and Statistics, 31, 172–181.

Examples


### a simple two-dimensional example: cars data
cars.gb <- blackboost(dist ~ speed, data = cars,
                      control = boost_control(mstop = 50))
cars.gb

### plot fit
plot(dist ~ speed, data = cars)
lines(cars$speed, predict(cars.gb), col = "red")

### set up and plot additive tree model
if (require("partykit")) {
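    ## trees of depth three for each base-learner
    ctrl <- ctree_control(maxdepth = 3)
    ## keep two species so that the response is binary
    viris <- subset(iris, Species != "setosa")
    ## drop the now unused factor level "setosa"
    viris$Species <- viris$Species[, drop = TRUE]
    ## one tree base-learner per predictor;
    ## [500] sets the number of boosting iterations to 500
    imod <- mboost(Species ~ btree(Sepal.Length, tree_controls = ctrl) +
                             btree(Sepal.Width, tree_controls = ctrl) +
                             btree(Petal.Length, tree_controls = ctrl) +
                             btree(Petal.Width, tree_controls = ctrl),
                   data = viris, family = Binomial())[500]
    ## one panel per base-learner
    layout(matrix(1:4, ncol = 2))
    plot(imod)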
}

[Package mboost version 2.9-10 Index]