opt_boost {bamlss}  R Documentation

Optimizer functions for gradient and likelihood boosting with bamlss. In each
boosting iteration the function selects the model term with the largest
contribution to the log-likelihood, AIC or BIC.
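The selection step described above can be sketched in base R. The following is a hypothetical illustration of gradient boosting with a fixed step size, not the bamlss implementation: in each iteration every candidate base-learner is fit to the current working residual, and only the update that raises the Gaussian log-likelihood most is kept, damped by the step size nu.

```r
## Hypothetical sketch of one-term-per-iteration gradient boosting,
## not the bamlss implementation.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 * x1 + rnorm(n)          # only x1 is informative

nu    <- 0.1                     # step size, as in argument 'nu'
fit   <- rep(0, n)               # current additive predictor
coefs <- c(x1 = 0, x2 = 0)

loglik <- function(f) sum(dnorm(y, mean = f, sd = 1, log = TRUE))

for (iter in 1:50) {
  u <- y - fit                   # working residual (negative gradient)
  ## Fit each base-learner (simple least squares) to the residual.
  b1 <- coef(lm(u ~ x1 - 1))
  b2 <- coef(lm(u ~ x2 - 1))
  ## Evaluate the log-likelihood contribution of each candidate update.
  ll1 <- loglik(fit + nu * b1 * x1)
  ll2 <- loglik(fit + nu * b2 * x2)
  ## Select the term with the largest contribution and take a damped step.
  if (ll1 >= ll2) {
    fit <- fit + nu * b1 * x1
    coefs["x1"] <- coefs["x1"] + nu * b1
  } else {
    fit <- fit + nu * b2 * x2
    coefs["x2"] <- coefs["x2"] + nu * b2
  }
}
coefs                            # x1 dominates; x2 stays near zero
```

Because only the best term is updated per iteration, uninformative terms such as x2 accumulate almost no coefficient mass; this is the basis for using selection frequencies and information criteria for model term selection.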
## Gradient boosting optimizer.
opt_boost(x, y, family, weights = NULL, offset = NULL,
  nu = 0.1, nu.adapt = TRUE, df = 4, maxit = 400, mstop = NULL,
  maxq = NULL, qsel.splitfactor = FALSE, verbose = TRUE, digits = 4,
  flush = TRUE, eps = .Machine$double.eps^0.25, nback = NULL,
  plot = TRUE, initialize = TRUE, stop.criterion = NULL,
  select.type = 1, force.stop = TRUE,
  hatmatrix = !is.null(stop.criterion), reverse.edf = FALSE,
  approx.edf = FALSE, always = FALSE, ...)

boost(x, y, family, weights = NULL, offset = NULL,
  nu = 0.1, nu.adapt = TRUE, df = 4, maxit = 400, mstop = NULL,
  maxq = NULL, qsel.splitfactor = FALSE, verbose = TRUE, digits = 4,
  flush = TRUE, eps = .Machine$double.eps^0.25, nback = NULL,
  plot = TRUE, initialize = TRUE, stop.criterion = NULL,
  select.type = 1, force.stop = TRUE,
  hatmatrix = !is.null(stop.criterion), reverse.edf = FALSE,
  approx.edf = FALSE, always = FALSE, ...)

## Modified likelihood based boosting.
opt_boostm(x, y, family, offset = NULL, nu = 0.1, df = 3,
  maxit = 400, mstop = NULL, verbose = TRUE, digits = 4,
  flush = TRUE, eps = .Machine$double.eps^0.25, plot = TRUE,
  initialize = TRUE, stop.criterion = "BIC",
  force.stop = !is.null(stop.criterion), do.optim = TRUE,
  always = FALSE, ...)

boostm(x, y, family, offset = NULL, nu = 0.1, df = 3,
  maxit = 400, mstop = NULL, verbose = TRUE, digits = 4,
  flush = TRUE, eps = .Machine$double.eps^0.25, plot = TRUE,
  initialize = TRUE, stop.criterion = "BIC",
  force.stop = !is.null(stop.criterion), do.optim = TRUE,
  always = FALSE, ...)

## Boosting summary extractor.
boost_summary(object, ...)

## Plot all boosting paths.
boost_plot(x, which = c("loglik", "loglik.contrib", "parameters",
  "aic", "bic", "user"), intercept = TRUE, spar = TRUE,
  mstop = NULL, name = NULL, drop = NULL, labels = NULL,
  color = NULL, ...)

## Boosting summary printing and plotting.
## S3 method for class 'boost_summary'
print(x, summary = TRUE, plot = TRUE,
  which = c("loglik", "loglik.contrib"),
  intercept = TRUE, spar = TRUE, ...)

## S3 method for class 'boost_summary'
plot(x, ...)

## Model frame for out-of-sample selection.
boost_frame(formula, train, test, family = "gaussian", ...)
x
For functions opt_boost() and opt_boostm(), the x list, as returned from function bamlss.frame(); for boost_plot() and the boost_summary methods, the object to be plotted or printed.
y
The model response, as returned from function bamlss.frame().
family
A bamlss family object, see family.bamlss.
weights
Prior weights on the data, as returned from function bamlss.frame().
offset
Can be used to supply model offsets for use in fitting, as returned from function bamlss.frame().
nu
Numeric, between 0 and 1, controls the step size, i.e., the amount that is added to the model term parameters in each boosting iteration.
nu.adapt
Logical. If set to TRUE (default), the step size nu is adapted during fitting.
df
Integer, defines the initial degrees of freedom that should be assigned to each smooth model term. May also be a named vector, where the names must match the model term labels, e.g., as provided in summary.bamlss.
maxit 
Integer, the maximum number of boosting iterations. 
mstop
For convenience, overwrites maxit.
maxq
Integer, defines the maximum number of selected base-learners. The algorithm stops if this number is exceeded.
qsel.splitfactor
Logical, if set to TRUE, the dummy variables of categorical predictors are counted individually when evaluating maxq.
name
Character, the name of the coefficient (group) that should be plotted. Note that the string provided in name will be removed from the labels on the 4th axis.
drop 
Character, the name of the coefficient (group) that should not be plotted. 
labels
A character vector of labels that should be used on the 4th axis.
color 
Colors or color function that creates colors for the (group) paths. 
verbose 
Print information during runtime of the algorithm. 
digits
Set the digits for printing when verbose = TRUE.
flush
Use flush.console() to display the current output in the console.
eps
The tolerance used as stopping mechanism, see argument nback.
nback
Integer. If nback is not NULL, the algorithm stops when the log-likelihood improvement over the last nback iterations falls below eps.
plot 
Should the boosting summary be printed and plotted? 
initialize 
Logical, should intercepts be initialized? 
stop.criterion
Character, selects the information criterion that should be used to determine the optimum number of boosting iterations. Either "AIC" or "BIC" is possible.
select.type
Should model terms be selected by the log-likelihood contribution, select.type = 1, or by the information criterion specified in stop.criterion, select.type = 2?
force.stop 
Logical, should the algorithm stop if the information criterion increases? 
do.optim 
Logical. Should smoothing parameters be optimized in each boosting iteration? 
hatmatrix
Logical, if set to TRUE the hat-matrix is computed, which is needed to determine the degrees of freedom used by the information criterion for selecting the stopping iteration.
reverse.edf
Logical. Instead of computing degrees of freedom with hat-matrices, the smoothing parameters are reverse engineered to compute the corresponding smoother matrix. Note that this option is still experimental.
approx.edf 
Logical. Another experimental and fast approximation of the degrees of freedom. 
always
Logical or character. Should the intercepts be forced to be updated in each boosting iteration? May also be set to "best" (see the examples).
object
A fitted "bamlss" object that was estimated with one of the boosting optimizer functions.
summary 
Should the summary be printed? 
which 
Which of the three provided plots should be created? 
intercept
Should the coefficient paths of the intercepts be included in the plot?
spar
Should graphical parameters be set with par()?
formula
See bamlss.frame.
train, test
Data frames used for training and testing the model.
...
For functions opt_boost() and opt_boostm(), arguments passed to function bamlss.engine.setup(); for boost_frame(), arguments passed to function bamlss.frame().
For function boost_summary(), a list containing information on selection frequencies etc.

For functions opt_boost() and opt_boostm(), a list containing the following objects:
fitted.values 
A named list of the fitted values based on the last boosting iteration of the modeled parameters of the selected distribution. 
parameters 
A matrix, each row corresponds to the parameter values of one boosting iteration. 
boost_summary 
The boosting summary which can be printed and plotted. 
The function does not take care of variable scaling for the linear parts! This must be done by the user, e.g., one option is to use argument scale.d in function bamlss.frame, which uses scale().
Function opt_boost() does not select the optimum stopping iteration! The modified likelihood based algorithm implemented in function opt_boostm() is still experimental!
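The scaling note above can be illustrated with base R's scale(). This hypothetical snippet (toy data, not from bamlss) standardizes the numeric columns of a data frame by hand, which is what scale.d = TRUE arranges via bamlss.frame:

```r
## Standardize numeric covariates before fitting the linear parts
## (toy data; bamlss does the equivalent when scale.d = TRUE).
set.seed(42)
d <- data.frame(x1 = rnorm(100, mean = 5, sd = 10),
                x2 = runif(100, min = 0, max = 1000))
num <- vapply(d, is.numeric, logical(1))
d[num] <- lapply(d[num], function(z) as.numeric(scale(z)))
## Each numeric column now has mean 0 and standard deviation 1,
## so the linear coefficients are on comparable scales for boosting.
round(colMeans(d), 10)
```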
## Not run:
## Simulate data.
set.seed(123)
d <- GAMart()

## Estimate model.
f <- num ~ x1 + x2 + x3 + lon + lat +
  s(x1) + s(x2) + s(x3) + s(lon) + s(lat) + te(lon,lat)

b <- bamlss(f, data = d, optimizer = opt_boost,
  sampler = FALSE, scale.d = TRUE, nu = 0.01,
  maxit = 1000, plot = FALSE)

## Plot estimated effects.
plot(b)

## Print and plot the boosting summary.
boost_summary(b, plot = FALSE)
boost_plot(b, which = 1)
boost_plot(b, which = 2)
boost_plot(b, which = 3, name = "mu.s.te(lon,lat).")

## Extract estimated parameters for certain
## boosting iterations.
parameters(b, mstop = 1)
parameters(b, mstop = 100)

## Also works with predict().
head(do.call("cbind", predict(b, mstop = 1)))
head(do.call("cbind", predict(b, mstop = 100)))

## Another example using the modified likelihood
## boosting algorithm.
f <- list(
  num ~ x1 + x2 + x3 + lon + lat +
    s(x1) + s(x2) + s(x3) + s(lon) + s(lat) + te(lon,lat),
  sigma ~ x1 + x2 + x3 + lon + lat +
    s(x1) + s(x2) + s(x3) + s(lon) + s(lat) + te(lon,lat)
)

b <- bamlss(f, data = d, optimizer = opt_boostm,
  sampler = FALSE, scale.d = TRUE, nu = 0.05,
  maxit = 400, stop.criterion = "AIC", force.stop = FALSE)

## Plot estimated effects.
plot(b)

## Plot AIC and log-likelihood contributions.
boost_plot(b, "AIC")
boost_plot(b, "loglik.contrib")

## Out-of-sample selection of model terms.
set.seed(123)
d <- GAMart(n = 5000)

## Split data into training and testing.
i <- sample(1:2, size = nrow(d), replace = TRUE)
dtest <- subset(d, i == 1)
dtrain <- subset(d, i == 2)

## Model formula.
f <- list(
  num ~ s(x1) + s(x2) + s(x3),
  sigma ~ s(x1) + s(x2) + s(x3)
)

## Create model frame for out-of-sample selection.
sm <- boost_frame(f, train = dtrain, test = dtest, family = "gaussian")

## Out-of-sample selection function returning the
## negative log-likelihood on the test data.
sfun <- function(parameters) {
  sm$parameters <- parameters
  p <- predict(sm, type = "parameter")
  -1 * sum(sm$family$d(dtest$num, p, log = TRUE))
}

## Start boosting with out-of-sample negative
## log-likelihood selection of model terms.
b <- bamlss(f, data = dtrain, sampler = FALSE,
  optimizer = opt_boost, selectfun = sfun, always = "best")

## Plot curve of negative out-of-sample log-likelihood.
boost_plot(b, which = "user")
## End(Not run)