
Boost {RRBoost}

R Documentation

Robust Boosting for regression

Description

This function implements the RRBoost robust boosting algorithm for regression, as well as other robust and non-robust boosting algorithms for regression.

Usage

Boost(
  x_train,
  y_train,
  x_val,
  y_val,
  x_test,
  y_test,
  type = "RRBoost",
  error = c("rmse", "aad"),
  niter = 200,
  y_init = "LADTree",
  max_depth = 1,
  tree_init_provided = NULL,
  control = Boost.control()
)

Arguments

`x_train`	predictor matrix for training data (matrix/dataframe)
`y_train`	response vector for training data (vector/dataframe)
`x_val`	predictor matrix for validation data (matrix/dataframe)
`y_val`	response vector for validation data (vector/dataframe)
`x_test`	predictor matrix for test data (matrix/dataframe, optional, required when `make_prediction` in control is `TRUE`)
`y_test`	response vector for test data (vector/dataframe, optional, required when `make_prediction` in control is `TRUE`)
`type`	type of the boosting method: "L2Boost", "LADBoost", "MBoost", "Robloss", "SBoost", "RRBoost" (character string)
`error`	a character string (or vector of character strings) indicating the type of error metrics to be evaluated on the test set. Valid options are: "rmse" (root mean squared error), "aad" (average absolute deviation), and "trmse" (trimmed root mean squared error)
`niter`	number of boosting iterations (for RRBoost: T_{1,max} + T_{2,max}, the maximum numbers of iterations of its first and second stages) (numeric)
`y_init`	a string indicating the initial estimator to be used. Valid options are: "median" or "LADTree" (character string)
`max_depth`	the maximum depth of the tree learners (numeric)
`tree_init_provided`	an optional pre-fitted initial tree (an `rpart` object)
`control`	a named list of control parameters, as returned by `Boost.control`
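The three metrics accepted by `error` can be sketched in base R. The helper names `rmse`, `aad` and `trmse`, and the trimming rule used for `trmse` (drop the largest `trim_prop` fraction of absolute residuals), are assumptions made for illustration. They are not the package internals, and the package's own trimming rule may differ.

```r
# Hypothetical helpers illustrating the metrics named in `error`;
# they are not the RRBoost internals.

# Root mean squared error of predictions f for responses y.
rmse <- function(y, f) sqrt(mean((y - f)^2))

# Average absolute deviation of the residuals.
aad <- function(y, f) mean(abs(y - f))

# Trimmed RMSE. Assumption: residuals whose absolute value exceeds the
# (1 - trim_prop) sample quantile are dropped before computing the RMSE.
trmse <- function(y, f, trim_prop = 0.1) {
  r <- abs(y - f)
  keep <- r <= quantile(r, probs = 1 - trim_prop)
  sqrt(mean(r[keep]^2))
}

# One gross outlier dominates rmse, inflates aad, and is trimmed by trmse.
y <- c(1, 2, 3, 4, 100)
f <- c(1, 2, 3, 4, 5)
c(rmse = rmse(y, f), aad = aad(y, f), trmse = trmse(y, f, trim_prop = 0.2))
```

With a single residual of 95, this gives an rmse of about 42.5, an aad of 19 and a trimmed rmse of 0, which shows why the robust metrics are reported alongside rmse.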

Details

This function implements a robust boosting algorithm for regression (RRBoost). It also includes the following robust and non-robust boosting algorithms for regression: L2Boost, LADBoost, MBoost, Robloss, and SBoost. The base learners are binary regression trees, which are fitted with the rpart package.
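To illustrate how tree base learners enter a boosting loop, here is a minimal sketch of L2Boost (the simplest, non-robust method listed) using depth-limited rpart trees. The function name `l2boost_sketch`, the sample mean as initial fit and the shrinkage factor `nu` are assumptions for illustration. RRBoost itself uses robust losses, two stages and the initial estimators selected by `y_init`, none of which are shown here.

```r
library(rpart)

# Hypothetical sketch of L2Boost with rpart regression trees; this is not
# the RRBoost implementation.
l2boost_sketch <- function(x, y, niter = 50, max_depth = 1, nu = 0.1) {
  dat <- data.frame(x)
  f <- rep(mean(y), length(y))   # assumed initial fit: the sample mean
  loss <- numeric(niter)
  for (iter in seq_len(niter)) {
    # For squared loss the negative gradient is the residual vector.
    dat_t <- data.frame(dat, r_t = y - f)
    tree <- rpart(r_t ~ ., data = dat_t, method = "anova",
                  control = rpart.control(maxdepth = max_depth, cp = 0,
                                          minsplit = 2, xval = 0))
    h <- predict(tree, newdata = dat_t)
    # Least-squares line search for the base learner's coefficient.
    denom <- sum(h^2)
    alpha <- if (denom > 0) sum(dat_t$r_t * h) / denom else 0
    f <- f + nu * alpha * h
    loss[iter] <- mean((y - f)^2)   # training loss after this iteration
  }
  list(f = f, loss = loss)
}

set.seed(1)
x <- matrix(runif(200), ncol = 2)
y <- 3 * (x[, 1] > 0.5) + rnorm(100, sd = 0.1)
fit <- l2boost_sketch(x, y, niter = 30)
```

Because each tree is a least-squares fit to the current residuals, the line search returns a coefficient of 1 and the shrunken update can only lower the squared training loss. Robust variants replace the residuals with the negative gradient of a robust loss.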

Value

A list with the following components:

`type`	which boosting algorithm was run. One of: "L2Boost", "LADBoost", "MBoost", "Robloss", "SBoost", "RRBoost" (character string)
`control`	the list of control parameters used
`niter`	number of iterations for the boosting algorithm (for RRBoost: T_{1,max} + T_{2,max}) (numeric)
`error`	if `make_prediction = TRUE` in `control`, a vector of test-set prediction errors at the early stopping iteration, with one entry per metric requested in the `error` argument
`tree_init`	if `y_init = "LADTree"`, the initial tree (an object of class `rpart`)
`tree_list`	if `save_tree = TRUE` in `control`, a list of trees fitted at each boosting iteration
`f_train_init`	a vector of the initial estimator's fitted values on the training data
`alpha`	a vector of base learners' coefficients
`early_stop_idx`	early stopping iteration
`when_init`	if `type = "RRBoost"`, the early stopping time of the first stage of RRBoost
`loss_train`	a vector of training loss values (one per iteration)
`loss_val`	a vector of validation loss values (one per iteration)
`err_val`	a vector of validation aad errors (one per iteration)
`err_train`	a vector of training aad errors (one per iteration)
`err_test`	a matrix of test errors up to and including the early stopping iteration (returned if `make_prediction = TRUE` in `control`); it has one row per iteration, from the first to the early stopping iteration, and one column per error type requested in the `error` argument
`f_train`	a matrix of training function estimates at all iterations (returned if `save_f = TRUE` in `control`); each column corresponds to the fitted values of the predictor at each iteration
`f_val`	a matrix of validation function estimates at all iterations (returned if `save_f = TRUE` in `control`); each column corresponds to the fitted values of the predictor at each iteration
`f_test`	a matrix of test function estimates up to and including the early stopping iteration (returned if `save_f = TRUE` and `make_prediction = TRUE` in `control`); each column corresponds to the fitted values of the predictor at each iteration
`var_select`	a vector of variable selection indicators (one per explanatory variable; 1 if the variable was selected by at least one of the base learners, and 0 otherwise)
`var_importance`	a vector of permutation variable importance scores (one per explanatory variable; returned if `cal_imp = TRUE` in `control`)
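The definition of `var_select` can be mirrored from a list of fitted rpart trees, for instance the `tree_list` returned when `save_tree = TRUE`, assuming its elements are `rpart` objects. The helper `var_select_sketch` below is hypothetical and only restates the definition above: it relies on the `frame$var` column that rpart stores for each node, where leaves are marked `"<leaf>"`.

```r
library(rpart)

# Hypothetical helper: 1 if a variable is used as a split variable by at
# least one tree in `tree_list`, 0 otherwise.
var_select_sketch <- function(tree_list, var_names) {
  # rpart stores each node's split variable in frame$var ("<leaf>" for leaves).
  used <- unique(unlist(lapply(tree_list,
                               function(tr) as.character(tr$frame$var))))
  as.integer(var_names %in% used)
}

set.seed(2)
d <- data.frame(a = runif(100), b = runif(100), c = runif(100))
d$y1 <- 5 * (d$a > 0.5) + rnorm(100, sd = 0.1)
d$y2 <- 5 * (d$b > 0.5) + rnorm(100, sd = 0.1)
stump <- rpart.control(maxdepth = 1)
trees <- list(rpart(y1 ~ a + b + c, data = d, control = stump),
              rpart(y2 ~ a + b + c, data = d, control = stump))
var_select_sketch(trees, c("a", "b", "c"))
```

Here the first stump splits on `a` and the second on `b`, so `c` is the only variable flagged as unselected.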

Author(s)

Xiaomeng Ju, xmengju@stat.ubc.ca

Examples

library(RRBoost)
data(airfoil)
n <- nrow(airfoil)
n0 <- floor(0.2 * n)
set.seed(123)
## random split: 20% test, 60% training, the remaining ~20% validation
idx_test <- sample(n, n0)
idx_train <- sample((1:n)[-idx_test], floor(0.6 * n))
idx_val <- (1:n)[-c(idx_test, idx_train)]
## predictors: all columns except column 6, which holds the response y
xx <- airfoil[, -6]
yy <- airfoil$y
xtrain <- xx[idx_train, ]
ytrain <- yy[idx_train]
xval <- xx[idx_val, ]
yval <- yy[idx_val]
xtest <- xx[idx_test, ]
ytest <- yy[idx_test]
## RRBoost initialized with an LAD tree; test errors are computed
## because make_prediction = TRUE
model_RRBoost_LADTree <- Boost(x_train = xtrain, y_train = ytrain,
    x_val = xval, y_val = yval, x_test = xtest, y_test = ytest,
    type = "RRBoost", error = "rmse", y_init = "LADTree",
    max_depth = 1, niter = 10, ## to keep the running time low
    control = Boost.control(max_depth_init = 2,
                            min_leaf_size_init = 20, make_prediction = TRUE,
                            cal_imp = FALSE))

[Package RRBoost version 0.1 Index]