rbst {bst}    R Documentation

Robust Boosting for Robust Loss Functions

Description

Gradient boosting, based on an MM (majorization-minimization) algorithm, for optimizing nonconvex robust loss functions with componentwise linear models, smoothing splines, or regression trees as base learners.

Usage

rbst(x, y, cost = 0.5, rfamily = c("tgaussian", "thuber", "thinge", "tbinom", "binomd",
    "texpo", "tpoisson", "clossR", "closs", "gloss", "qloss"), ctrl = bst_control(),
    control.tree = list(maxdepth = 1), learner = c("ls", "sm", "tree"), del = 1e-10)

Arguments

x

a data frame containing the variables in the model.

y

vector of responses. y must be in {1, -1} for classification.

cost

price to pay for a false positive, 0 < cost < 1; the price of a false negative is 1-cost.

rfamily

robust loss function; see Details.

ctrl

an object of class bst_control.

control.tree

control parameters of rpart.

learner

a character string specifying the componentwise base learner to be used: "ls" for linear models, "sm" for smoothing splines, "tree" for regression trees.

del

convergence criterion.
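
As a quick illustration, a minimal sketch (assuming x and y simulated as in the Examples below; s = -1 follows the suggestion in Details) combining a tree base learner with rpart controls passed via control.tree:

fit.tree <- rbst(x, y, rfamily = "thinge",
    ctrl = bst_control(mstop = 20, s = -1),
    control.tree = list(maxdepth = 2), learner = "tree")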

Details

An MM algorithm operates by creating a convex surrogate function that majorizes the nonconvex objective function. When the surrogate function is minimized with a gradient boosting algorithm, the desired objective function is decreased. The MM algorithm comprises a difference of convex (DC) algorithm for rfamily=c("tgaussian", "thuber", "thinge", "tbinom", "binomd", "texpo", "tpoisson") and a quadratic majorization boosting algorithm (QMBA) for rfamily=c("clossR", "closs", "gloss", "qloss").
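
To make the majorization concrete (schematic notation assumed here, not taken verbatim from the package): if a robust loss admits a difference-of-convex decomposition l(u) = l_1(u) - l_2(u) with l_1 and l_2 convex, linearizing l_2 at the current value u^(k) yields the convex surrogate

    Q(u | u^(k)) = l_1(u) - l_2(u^(k)) - l_2'(u^(k)) (u - u^(k)),

which satisfies Q(u | u^(k)) >= l(u) for all u, with equality at u = u^(k). Any boosting step that decreases Q therefore also decreases l, which is the descent property described above.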

rfamily = "tgaussian" for truncated squared error loss, "thuber" for truncated Huber loss, "thinge" for truncated hinge loss, "tbinom" for truncated logistic loss, "binomd" for logistic difference loss, "texpo" for truncated exponential loss, "tpoisson" for truncated Poisson loss, "clossR" for C-loss in regression, "closs" for C-loss in classification, "gloss" for G-loss, "qloss" for Q-loss.

s must be a numeric value specified in bst_control. For rfamily="thinge", "tbinom" and "texpo", s < 0. For rfamily="binomd", "tpoisson", "closs", "qloss" and "clossR", s > 0, and for rfamily="gloss", s > 1. Some suggested s values: "thinge" = -1, "tbinom" = -log(3), "binomd" = log(4), "texpo" = log(0.5), "closs" = 1, "gloss" = 1.5, "qloss" = 2, "clossR" = 1.
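
For illustration, a minimal sketch (assuming x and y simulated as in the Examples below) of passing the suggested s values through bst_control:

fit1 <- rbst(x, y, rfamily = "thinge",
    ctrl = bst_control(mstop = 50, s = -1))    # s < 0 for "thinge"
fit2 <- rbst(x, y, rfamily = "gloss",
    ctrl = bst_control(mstop = 50, s = 1.5))   # s > 1 for "gloss"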

Value

An object of class bst. For linear models, print, coef, plot and predict methods are available; for nonlinear models, print and predict methods are available.

x, y, cost, rfamily, learner, control.tree, maxdepth

the input variables and parameters.

ctrl

the input ctrl with a possibly updated fk if rfamily="tgaussian", "thinge", "tbinom", "binomd" or "tpoisson".

yhat

predicted function estimates

ens

a list of length mstop. Each element is a model fitted to the pseudo-residuals, defined as the negative gradient of the loss function at the current function estimate.

ml.fit

the last element of ens

ensemble

a vector of length mstop. Each element is the variable selected at each boosting step, when applicable.

xselect

variables selected in the mstop boosting iterations.

coef

estimated coefficients after mstop boosting iterations.

Author(s)

Zhu Wang

References

Zhu Wang (2018), Quadratic Majorization for Nonconvex Loss with Applications to the Boosting Algorithm, Journal of Computational and Graphical Statistics, 27(3), 491-502, doi: 10.1080/10618600.2018.1424635

Zhu Wang (2018), Robust boosting with truncated loss functions, Electronic Journal of Statistics, 12(1), 599-650, doi: 10.1214/18-EJS1404

See Also

cv.rbst for cross-validated stopping iteration; see also bst_control.

Examples

x <- matrix(rnorm(100*5), ncol = 5)        # 100 observations, 5 predictors
c <- 2*x[,1]                               # true linear predictor
p <- exp(c)/(exp(c)+exp(-c))               # P(y = 1)
y <- rbinom(100, 1, p)
y[y != 1] <- -1                            # recode responses to {1, -1}
y[1:10] <- -y[1:10]                        # flip the first 10 labels to mimic outliers
x <- as.data.frame(x)
dat.m <- bst(x, y, ctrl = bst_control(mstop = 50), family = "hinge", learner = "ls")
predict(dat.m)
dat.m1 <- bst(x, y, ctrl = bst_control(twinboost = TRUE,
    coefir = coef(dat.m), xselect.init = dat.m$xselect, mstop = 50))
dat.m2 <- rbst(x, y, ctrl = bst_control(mstop = 50, s = 0, trace = TRUE),
    rfamily = "thinge", learner = "ls")
predict(dat.m2)
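
## A regression sketch with the C-loss (illustrative, not from the package
## examples): "clossR" takes a continuous response and s > 0.
yr <- 2*x[,1] + rnorm(100)                 # continuous response
yr[1:10] <- yr[1:10] + 10                  # contaminate the first 10 responses
dat.r <- rbst(x, yr, rfamily = "clossR",
    ctrl = bst_control(mstop = 50, s = 1), learner = "ls")
predict(dat.r)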

[Package bst version 0.3-24]