boosting_diversity {Bodi}    R Documentation

Diversity Boosting Algorithm

Description

Train a set of base learners while promoting diversity among them. To this end, a gradient-descent strategy is adopted in which a specialized loss function induces diversity, which in turn yields a reduction of the mean squared error of the aggregated learner.
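
For intuition, here is a minimal sketch of a diversity-penalized squared loss of this general shape, with kappa playing the role of diversity_weight; it is an illustration only, not necessarily the exact loss of the package (see the accompanying paper for the precise definition).

diversity_loss <- function(y, f_new, f_ensemble, kappa = 1) {
  ## squared error of the new learner, minus a reward for disagreeing
  ## with the current ensemble prediction (this is what induces diversity)
  (y - f_new)^2 - kappa * (f_new - f_ensemble)^2
}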

Usage

boosting_diversity(
  target,
  cov,
  data0,
  data1,
  sample_size = 0.5,
  grad_step = 1,
  diversity_weight = 1,
  Nstep = 10,
  model = "gam",
  sampling = "random",
  Nblock = 10,
  aggregation_type = "uniform",
  param = list(),
  theorical_dw = FALSE,
  model_list = NULL,
  w_list = NULL,
  param_list = NULL,
  cov_list = NULL
)

Arguments

target

name of the target variable

cov

the model equation, a character string provided in formula syntax. For example, for a linear model including covariates X1 and X2 it is "X1+X2", and for a GAM with smooth effects it is "s(X1)+s(X2)".
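
For instance, two valid cov strings for the covariates used in the Examples section below:

cov_lin <- "Solar.R+Wind+Temp"           # linear effects
cov_gam <- "s(Solar.R)+s(Wind)+s(Temp)"  # smooth (GAM) effects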

data0

the learning set

data1

the test set

sample_size

the size of the bootstrap sample as a proportion of the learning set size. sample_size=0.5 means that the resamples are of size n/2 where n is the number of rows of data0.

grad_step

step size of the gradient descent

diversity_weight

the weight of the diversity-encouraging penalty (kappa in the paper)

Nstep

the number of iterations of the diversity boosting algorithm (N in the paper)

model

the type of base learner used in the algorithm when a single base learner is used (model_list=NULL). Currently it can be "gam" for an additive model, "rf" for a random forest, "gbm" for gradient boosting machines, or "rpart" for a single CART tree.

sampling

the type of sampling procedure used in the resampling step. Could be either "random" for uniform random sampling with replacement or "blocks" for uniform sampling with replacement of blocks of consecutive data points. Default is "random".

Nblock

number of blocks for the block sampling. Equal to 10 by default. A sketch of the block-resampling step is shown below.
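
A minimal sketch of what the block sampling amounts to, assuming Nblock contiguous blocks of row indices drawn uniformly with replacement (illustrative, not the package's internal code):

block_resample <- function(n, Nblock = 10) {
  ## split row indices 1..n into Nblock consecutive blocks,
  ## then draw Nblock blocks uniformly with replacement
  blocks <- split(seq_len(n), cut(seq_len(n), Nblock, labels = FALSE))
  unlist(blocks[sample(Nblock, Nblock, replace = TRUE)], use.names = FALSE)
}
idx <- block_resample(nrow(na.omit(airquality)))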

aggregation_type

type of aggregation used for the ensemble method. Default is "uniform" for uniform weights, but it can also be "MLpol", an aggregation algorithm from the opera package.
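
For reference, this is how MLpol aggregates expert forecasts in the opera package on toy data; it shows the aggregation rule itself, independently of how Bodi calls it:

library(opera)
set.seed(1)
y <- rnorm(100)
experts <- cbind(e1 = y + rnorm(100, sd = 0.5),
                 e2 = y + rnorm(100, sd = 1))
agg <- mixture(Y = y, experts = experts, model = "MLpol", loss.type = "square")
summary(agg)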

param

a list containing the parameters of the chosen model, e.g. the number of trees for "rf" or the depth of the tree for "rpart".
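
For instance, hypothetical param lists (the exact parameter names depend on the fitting packages used internally and are assumptions here):

param_rf    <- list(ntree = 300)   # hypothetical: number of trees for model = "rf"
param_rpart <- list(maxdepth = 4)  # hypothetical: tree depth for model = "rpart"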

theorical_dw

set to TRUE to use the theoretical upper bound of the diversity weight kappa

model_list

a list of models among the possible ones (see the description of the model argument). In that case, the weak learner is sampled at each step from this list. Still experimental, use with care.

w_list

the prior weights of each model in the model_list

param_list

list of parameters of each model in the model_list

cov_list

list of covariates of each model in the model_list
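
A hypothetical call combining these four lists, reusing dat and smp from the Examples section below (the feature is experimental and the argument contents shown are assumptions for illustration):

fit_mix <- boosting_diversity("Ozone", "Solar.R+Wind+Temp",
  data0 = dat[smp, ], data1 = dat[-smp, ],
  model_list = list("gam", "rf"),
  w_list     = c(0.5, 0.5),
  param_list = list(list(), list(ntree = 300)),  # hypothetical parameter name
  cov_list   = list("s(Solar.R)+s(Wind)+s(Temp)", "Solar.R+Wind+Temp"))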

Value

a list including the boosted models and the ensemble forecasts, with the following components:

fitted_ensemble

Fitted values (in-sample predictions) for the ensemble method (matrix).

forecast_ensemble

Forecasts (out-of-sample predictions) for the ensemble method (matrix).

fitted

Fitted values of the last boosting iteration (vector).

forecast

Forecast of the last boosting iteration (vector).

err_oob

Estimated out-of-bag errors by iteration (vector).

diversity_oob

Estimated out-of-bag diversity by iteration (vector).

Author(s)

Yannig Goude <yannig.goude@edf.fr>

Examples

## split the complete airquality records into a learning set and a test set
dat <- na.omit(airquality)
set.seed(1)  # for a reproducible split
smp <- sample(nrow(dat), floor(0.8 * nrow(dat)))
fit <- boosting_diversity("Ozone", "Solar.R+Wind+Temp+Month+Day",
                          data0 = dat[smp, ], data1 = dat[-smp, ])
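
## Illustrative follow-up (assumes the component names listed in the
## Value section above):
plot(fit$err_oob, type = "b", xlab = "iteration", ylab = "OOB error")
sqrt(mean((dat$Ozone[-smp] - fit$forecast)^2))  # test-set RMSE of the final forecast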
