mboost {mboost}    R Documentation
Gradient Boosting for Additive Models
Description
Gradient boosting for optimizing arbitrary loss functions, where component-wise arbitrary base-learners, e.g., smoothing procedures, are utilized as additive base-learners.
Usage
mboost(formula, data = list(), na.action = na.omit, weights = NULL,
       offset = NULL, family = Gaussian(), control = boost_control(),
       oobweights = NULL, baselearner = c("bbs", "bols", "btree", "bss", "bns"),
       ...)
gamboost(formula, data = list(), na.action = na.omit, weights = NULL,
         offset = NULL, family = Gaussian(), control = boost_control(),
         oobweights = NULL, baselearner = c("bbs", "bols", "btree", "bss", "bns"),
         dfbase = 4, ...)
Arguments
formula
a symbolic description of the model to be fit.
data
a data frame containing the variables in the model.
na.action
a function which indicates what should happen when the data contain NAs.
weights
(optional) a numeric vector of weights to be used in the fitting process.
offset
(optional) a numeric vector to be used as offset.
family
a family object specifying the loss function to be optimized; defaults to Gaussian().
control
a list of parameters controlling the algorithm, as returned by boost_control() (the default).
oobweights
an additional vector of out-of-bag weights, which is used for the out-of-bag risk.
baselearner
a character string specifying the default component-wise base-learner: one of "bbs", "bols", "btree", "bss" or "bns" (see Details).
dfbase
a single integer giving the degrees of freedom for P-spline base-learners (gamboost only).
...
additional arguments passed to
Details
A (generalized) additive model is fitted using a boosting algorithm based on component-wise base-learners.
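As a rough illustration of what "component-wise" means, the following base-R sketch implements component-wise L2 boosting with one simple linear least-squares base-learner per covariate. It assumes the squared-error loss (the setting of the default Gaussian() family) and uses nu = 0.1 as step length, matching the default of boost_control(); the function cw_l2boost() and its arguments are hypothetical, and this is not the code mboost runs internally. In every iteration all base-learners are fitted to the negative gradient, but only the best-fitting one is added to the model.

```r
## Hypothetical helper (illustration only, not mboost internals):
## component-wise L2 boosting with one linear least-squares
## base-learner per column of X.
cw_l2boost <- function(X, y, mstop = 100, nu = 0.1) {
  X <- scale(X, center = TRUE, scale = FALSE)   # centre each covariate
  offset <- mean(y)                              # starting value f_0 = mean(y)
  f <- rep(offset, length(y))
  beta <- numeric(ncol(X))
  selected <- integer(mstop)
  for (m in seq_len(mstop)) {
    ## negative gradient of the loss (y - f)^2 / 2 is the residual
    u <- y - f
    ## fit every base-learner (one covariate each) to u by least squares
    coefs <- colSums(X * u) / colSums(X^2)
    rss <- colSums((u - sweep(X, 2, coefs, "*"))^2)
    ## update only the best-fitting component, shrunken by nu
    j <- which.min(rss)
    beta[j] <- beta[j] + nu * coefs[j]
    f <- f + nu * coefs[j] * X[, j]
    selected[m] <- j
  }
  list(offset = offset, beta = beta, selected = selected, fitted = f)
}

## Usage on simulated data: only columns 1 and 3 carry signal
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5)
y <- 2 * X[, 1] - X[, 3] + rnorm(n, sd = 0.5)
fit <- cw_l2boost(X, y, mstop = 200)
round(fit$beta, 2)   # estimates for columns 1 and 3 dominate
table(fit$selected)  # how often each component was chosen
```

Because only one component is updated per iteration, a covariate that is never selected keeps a coefficient of exactly zero, so the number of iterations (mstop) controls both shrinkage and variable selection.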
The base-learners can be specified either via the formula object or via the baselearner argument. The latter argument gives the default base-learner, which is used for all variables in the formula without an explicit base-learner specification (i.e., if a base-learner is explicitly specified for a variable in formula, the baselearner argument is ignored for this variable).
Of note, "bss" and "bns" are deprecated and only in the list for backward compatibility.
Note that more base-learners (i.e., in addition to the ones provided via baselearner) can be specified in formula. See baselearners for details.
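The following sketch illustrates these rules on simulated data (the data frame d and its variables are hypothetical). With baselearner = "bols", variables without an explicit base-learner get a linear bols() effect, while a variable wrapped in bbs() in the formula keeps its P-spline; m_mixed and m_explicit are therefore expected to specify the same model.

```r
library("mboost")

## Simulated data (hypothetical, for illustration only)
set.seed(29)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 2 * d$x1 + sin(2 * pi * d$x2) + rnorm(n, sd = 0.3)

## No explicit base-learners: bols() is used for x1, x2 and x3
m_default <- mboost(y ~ x1 + x2 + x3, data = d, baselearner = "bols")

## Mixed: bbs() is given explicitly for x2, so baselearner = "bols"
## is ignored for x2 and only applies to x1 and x3
m_mixed <- mboost(y ~ x1 + bbs(x2, df = 4) + x3, data = d,
                  baselearner = "bols")

## All base-learners written out explicitly in the formula
m_explicit <- mboost(y ~ bols(x1) + bbs(x2, df = 4) + bols(x3), data = d)

## Indices of the base-learners chosen in the boosting iterations
table(selected(m_mixed))
```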
The only difference between calling mboost and gamboost is that the latter function allows one to specify default degrees of freedom (argument dfbase) for smooth effects specified via baselearner = "bbs". In all other cases, degrees of freedom need to be set manually via a specific definition of the corresponding base-learner.
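A minimal sketch of this difference, assuming simulated data (the data frame d is hypothetical): gamboost() sets the degrees of freedom of the default P-spline base-learner through dfbase, whereas mboost() has no dfbase argument in its Usage, so the same df has to be given in the bbs() call itself.

```r
library("mboost")

## Simulated data (hypothetical): a smooth sine-shaped effect
set.seed(1)
d <- data.frame(x = sort(runif(100)) * 10)
d$y <- sin(d$x) + rnorm(100, sd = 0.25)

## gamboost(): the default bbs() base-learner for x receives dfbase = 4
g1 <- gamboost(y ~ x, data = d, dfbase = 4)

## mboost(): df is set in the base-learner definition itself
g2 <- mboost(y ~ bbs(x, df = 4), data = d)

## Both calls specify the same model, so the fits should agree
max(abs(fitted(g1) - fitted(g2)))
```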
Value
An object of class mboost with print, AIC, plot and predict methods being available.
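A short sketch of these methods applied to the cars model from the Examples section; the new speed values in nd are arbitrary choices for illustration.

```r
library("mboost")

## The cars model from the Examples section
cars.gb <- gamboost(dist ~ speed, data = cars, dfbase = 4,
                    control = boost_control(mstop = 50))

print(cars.gb)                             # model summary
aic <- AIC(cars.gb, method = "corrected")
mstop(aic)                                 # AIC-optimal number of iterations

## Predictions for new (arbitrary) speed values
nd <- data.frame(speed = c(10, 15, 20))
predict(cars.gb, newdata = nd)

## Estimated partial effect of speed
plot(cars.gb)
```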
References
Peter Buehlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324–339.
Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.
Thomas Kneib, Torsten Hothorn and Gerhard Tutz (2009), Variable selection and model choice in geoadditive regression models, Biometrics, 65(2), 626–634.
Matthias Schmid and Torsten Hothorn (2008), Boosting additive models using component-wise P-splines as base-learners. Computational Statistics & Data Analysis, 53(2), 298–311.
Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and Benjamin Hofner (2010), Model-based Boosting 2.0. Journal of Machine Learning Research, 11, 2109–2113.
Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid (2014), Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29, 3–35. doi:10.1007/s00180-012-0382-5
Available as vignette via: vignette(package = "mboost", "mboost_tutorial")
See Also
See mboost_fit
for the generic boosting function,
glmboost
for boosted linear models, and
blackboost
for boosted trees.
See baselearners
for possible base-learners.
See cvrisk
for cross-validated stopping iteration.
Furthermore see boost_control
, Family
and
methods
.
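As a sketch of cross-validated stopping with cvrisk, assuming 5-fold cross-validation folds generated by cv() from the model weights; the fold count and seed are arbitrary choices, and papply = lapply simply runs the folds sequentially.

```r
library("mboost")

cars.gb <- gamboost(dist ~ speed, data = cars, dfbase = 4,
                    control = boost_control(mstop = 50))

## 5-fold cross-validation over the iterations 0, ..., 50
set.seed(290875)
folds <- cv(model.weights(cars.gb), type = "kfold", B = 5)
cvm <- cvrisk(cars.gb, folds = folds, papply = lapply)
mstop(cvm)                         # cross-validated stopping iteration

## Set the model to the cross-validated stopping iteration
cars.gb <- cars.gb[mstop(cvm)]
```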
Examples
### a simple two-dimensional example: cars data
cars.gb <- gamboost(dist ~ speed, data = cars, dfbase = 4,
                    control = boost_control(mstop = 50))
cars.gb
AIC(cars.gb, method = "corrected")
### plot fits for mstop = 1, ..., AIC-optimal mstop (red)
### and a smoothing spline fit for comparison (green)
plot(dist ~ speed, data = cars)
tmp <- sapply(1:mstop(AIC(cars.gb)), function(i)
    lines(cars$speed, predict(cars.gb[i]), col = "red"))
lines(cars$speed, predict(smooth.spline(cars$speed, cars$dist),
                          cars$speed)$y, col = "green")
### artificial example: sine transformation
x <- sort(runif(100)) * 10
y <- sin(x) + rnorm(length(x), sd = 0.25)
plot(x, y)
### linear model using the true sine basis
lines(x, fitted(lm(y ~ sin(x) - 1)), col = "red")
### GAM
lines(x, fitted(gamboost(y ~ x,
                         control = boost_control(mstop = 500))),
      col = "green")