mboost_fit {mboost} | R Documentation |
Model-based Gradient Boosting
Description
Work-horse for gradient boosting for optimizing arbitrary loss functions, where component-wise models are utilized as base-learners. Usually, this function is not called directly by the user.
Usage
mboost_fit(blg, response, weights = rep(1, NROW(response)), offset = NULL,
family = Gaussian(), control = boost_control(), oobweights =
as.numeric(weights == 0))
Arguments
blg |
a list of objects of elements of class |
response |
the response variable. |
weights |
(optional) a numeric vector of weights to be used in the fitting process. |
offset |
a numeric vector to be used as offset (optional). |
family |
a Family object. |
control |
a list of parameters controlling the algorithm. For
more details see boost_control. |
oobweights |
an additional vector of out-of-bag weights, which is
used for the out-of-bag risk (i.e., if |
Details
The function implements component-wise functional gradient boosting in
a generic way. This function is the main work horse and used as back-end by
all boosting algorithms in a unified way. Usually, this function is not
called directly. Note that the more convenient modelling interfaces
gamboost, glmboost and blackboost all call mboost_fit.
Basically, the algorithm is initialized with a function
for computing the negative gradient of the loss function (via its
family argument) and one or more base-learners (given as blg).
Usually blg and response are computed in the functions gamboost,
glmboost, blackboost or mboost. See there for details
on the specification of base-learners.
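The initialization and update loop described above can be sketched in base R. This is a minimal illustration for squared-error loss with one linear base-learner per column of a design matrix, not the code used inside mboost_fit; the names cw_boost, ngradient, nu and mstop are hypothetical and chosen for this sketch only.

```r
# Illustrative component-wise gradient boosting for squared-error loss.
# cw_boost and its arguments are hypothetical helpers, not part of mboost.
cw_boost <- function(X, y, w = rep(1, NROW(y)), mstop = 100, nu = 0.1) {
  # negative gradient of the loss 0.5 * (y - f)^2 with respect to f
  ngradient <- function(y, f) y - f
  offset <- weighted.mean(y, w)          # constant starting value
  f <- rep(offset, NROW(y))
  coefs <- numeric(ncol(X))              # one slope per base-learner
  for (m in seq_len(mstop)) {
    u <- ngradient(y, f)
    # fit every base-learner (weighted least squares on one column) to u
    beta <- vapply(seq_len(ncol(X)), function(j)
      sum(w * X[, j] * u) / sum(w * X[, j]^2), numeric(1))
    # weighted residual sum of squares of each candidate fit
    rss <- vapply(seq_len(ncol(X)), function(j)
      sum(w * (u - X[, j] * beta[j])^2), numeric(1))
    best <- which.min(rss)               # select the best-fitting component
    coefs[best] <- coefs[best] + nu * beta[best]
    f <- f + nu * X[, best] * beta[best] # update with step length nu
  }
  list(offset = offset, coefficients = coefs, fitted = f)
}

set.seed(1)
X <- scale(matrix(rnorm(300), ncol = 3))   # centred, standardized columns
y <- 2 * X[, 1] - X[, 2] + rnorm(100, sd = 0.1)
fit <- cw_boost(X, y)
round(fit$coefficients, 2)
```

Only one component is updated per iteration, which is why the fitted model stays additive in the base-learners and why early stopping acts as a form of variable selection.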
The algorithm minimizes the in-sample empirical risk, defined as
the weighted sum (by weights) of the loss function (corresponding
to the negative gradient) evaluated at the data.
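As a small worked illustration of the weighted empirical risk, the following base R sketch sums a loss over the observations, weighted by weights. The helper empirical_risk is hypothetical and squared error is used only as an example loss; in mboost the loss is supplied by the family argument.

```r
# Hypothetical helper: weighted in-sample empirical risk for a given loss.
empirical_risk <- function(y, f, w = rep(1, NROW(y)),
                           loss = function(y, f) (y - f)^2) {
  sum(w * loss(y, f))
}

y <- c(1, 2, 4)       # observed responses
f <- c(1.5, 2, 3)     # current fitted values
w <- c(1, 1, 2)       # observation weights
empirical_risk(y, f, w)   # 1 * 0.25 + 1 * 0 + 2 * 1 = 2.25
```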
The structure of the model is determined by the structure of the base-learners. If more than one base-learner is given, the model is additive in these components.
Base-learners can be specified via a formula interface
(function mboost) or as a list of objects of class bl,
see, e.g., bols.
oobweights is a vector used internally by cvrisk. When carrying
out cross-validation to determine the optimal stopping iteration of a boosting
model, the default value of oobweights (out-of-bag weights) ensures
that the cross-validated risk is computed using the same observation weights
as those used for fitting the boosting model. It is strongly recommended to
leave this argument unspecified.
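The default from the Usage section, as.numeric(weights == 0), can be illustrated in base R as follows. The bootstrap-style weights, the constant predictions and the squared-error loss are assumptions made for this sketch; it does not reproduce what cvrisk does internally.

```r
# Sketch: integer weights as they might arise from a bootstrap sample.
set.seed(42)
n <- 10
idx <- sample(n, replace = TRUE)
weights <- tabulate(idx, nbins = n)      # how often each observation was drawn
oobweights <- as.numeric(weights == 0)   # default: 1 for out-of-bag observations

# hypothetical predictions and squared-error loss, for illustration only
y <- seq_len(n)
f <- rep(mean(y), n)
inbag_risk <- sum(weights * (y - f)^2)     # risk on observations used for fitting
oob_risk   <- sum(oobweights * (y - f)^2)  # risk on held-out observations
```

Observations with a positive fitting weight contribute nothing to the out-of-bag risk, and observations with weight zero contribute with weight one.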
Value
An object of class mboost with print, AIC, plot and predict
methods being available.
References
Peter Buehlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324–339.
Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.
Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and Benjamin Hofner (2010), Model-based Boosting 2.0. Journal of Machine Learning Research, 11, 2109–2113.
Yoav Freund and Robert E. Schapire (1996), Experiments with a new boosting algorithm. In Machine Learning: Proc. Thirteenth International Conference, 148–156.
Jerome H. Friedman (2001), Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189–1232.
Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid
(2014). Model-based Boosting in R: A Hands-on Tutorial Using the R
Package mboost. Computational Statistics, 29, 3–35.
doi:10.1007/s00180-012-0382-5
Available as vignette via: vignette(package = "mboost", "mboost_tutorial")
See Also
glmboost for boosted linear models and blackboost for boosted trees.
See e.g. bbs for possible base-learners. See cvrisk for
cross-validated stopping iteration. Furthermore see
boost_control, Family and methods.
Examples
data("bodyfat", package = "TH.data")
### formula interface: additive Gaussian model with
### a non-linear step-function in `age', a linear function in `waistcirc'
### and a smooth non-linear function in `hipcirc'
mod <- mboost(DEXfat ~ btree(age) + bols(waistcirc) + bbs(hipcirc),
data = bodyfat)
layout(matrix(1:6, ncol = 3, byrow = TRUE))
plot(mod, main = "formula")
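### the same, calling mboost_fit directly; the result of with() is assigned
### so that `mod' is updated in the calling environment
mod <- with(bodyfat,
            mboost_fit(list(btree(age), bols(waistcirc), bbs(hipcirc)),
                       response = DEXfat))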
plot(mod, main = "base-learner")