Boost {RRBoost}    R Documentation

Robust Boosting for regression

Description

This function implements the RRBoost robust boosting algorithm for regression, as well as several other robust and non-robust boosting algorithms for regression.

Usage

Boost(
  x_train,
  y_train,
  x_val,
  y_val,
  x_test,
  y_test,
  type = "RRBoost",
  error = c("rmse", "aad"),
  niter = 200,
  y_init = "LADTree",
  max_depth = 1,
  tree_init_provided = NULL,
  control = Boost.control()
)

Arguments

x_train

predictor matrix for training data (matrix/dataframe)

y_train

response vector for training data (vector/dataframe)

x_val

predictor matrix for validation data (matrix/dataframe)

y_val

response vector for validation data (vector/dataframe)

x_test

predictor matrix for test data (matrix/dataframe, optional, required when make_prediction in control is TRUE)

y_test

response vector for test data (vector/dataframe, optional, required when make_prediction in control is TRUE)

type

type of the boosting method: "L2Boost", "LADBoost", "MBoost", "Robloss", "SBoost", "RRBoost" (character string)

error

a character string (or vector of character strings) specifying the error metrics to evaluate on the test set. Valid options are "rmse" (root mean squared error), "aad" (average absolute deviation), and "trmse" (trimmed root mean squared error)

niter

number of boosting iterations (for RRBoost, T_{1,max} + T_{2,max}, the sum of the maximum numbers of iterations in its first and second stages) (numeric)

y_init

a string indicating the initial estimator to be used. Valid options are: "median" or "LADTree" (character string)

max_depth

the maximum depth of the tree learners (numeric)

tree_init_provided

an optional pre-fitted initial tree (an rpart object)

control

a named list of control parameters, as returned by Boost.control
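As a rough illustration of what the error argument selects, the three metrics can be written in a few lines of base R. The helper names below are hypothetical, and the trimming rule used for trmse (discard the largest trim_prop fraction of absolute residuals) is an assumption; the rule RRBoost applies internally may differ.

```r
# Hypothetical helpers for the three error metrics accepted by 'error'.
# rmse and aad follow their standard definitions. The trimming rule in
# trmse (drop the largest 'trim_prop' fraction of absolute residuals)
# is an ASSUMPTION and may not match RRBoost's internal definition.
rmse <- function(y, f) sqrt(mean((y - f)^2))
aad  <- function(y, f) mean(abs(y - f))
trmse <- function(y, f, trim_prop = 0.1) {
  r <- abs(y - f)
  keep <- r <= quantile(r, 1 - trim_prop)  # discard the largest residuals
  sqrt(mean(r[keep]^2))
}

y <- c(1, 2, 3, 4, 100)   # the last observation is an outlier
f <- c(1, 2, 3, 4, 5)
rmse(y, f)                  # about 42.49, inflated by the outlier
aad(y, f)                   # 19
trmse(y, f, trim_prop = 0.2)  # 0, the outlier is trimmed away
```

The example shows why trimmed or absolute-deviation metrics are preferred when the test set itself may contain outliers: a single gross error dominates rmse.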

Details

This function implements a robust boosting algorithm for regression (RRBoost). It also includes the following robust and non-robust boosting algorithms for regression: L2Boost, LADBoost, MBoost, Robloss, and SBoost. The binary regression trees used as base learners are built with the rpart package.
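The sketch below shows the generic boosting loop in its simplest form, L2Boost with squared-error loss, using depth-limited rpart trees as base learners and a validation set for early stopping. It is an illustration under these assumptions, not the RRBoost algorithm (which uses robust losses and two stages), and the function name l2boost_sketch and its shrinkage argument are hypothetical. The median start mirrors y_init = "median".

```r
library(rpart)

# Minimal L2Boost sketch: squared-error loss, rpart trees of depth max_depth,
# early stopping chosen on a validation set. NOT RRBoost's robust algorithm;
# 'l2boost_sketch' and 'shrinkage' are hypothetical names.
l2boost_sketch <- function(x_train, y_train, x_val, y_val,
                           niter = 100, max_depth = 1, shrinkage = 0.1) {
  x_train <- as.data.frame(x_train)
  x_val <- as.data.frame(x_val)
  f_init <- median(y_train)                   # like y_init = "median"
  f_train <- rep(f_init, length(y_train))
  f_val <- rep(f_init, length(y_val))
  alpha <- numeric(niter)
  loss_val <- numeric(niter)
  tree_list <- vector("list", niter)
  ctrl <- rpart.control(maxdepth = max_depth, cp = 0, xval = 0, minsplit = 2)
  for (t in seq_len(niter)) {
    # residuals are the negative gradient of (y - f)^2 / 2
    dat <- data.frame(r = y_train - f_train, x_train)
    tree_list[[t]] <- rpart(r ~ ., data = dat, method = "anova",
                            control = ctrl)
    h_train <- predict(tree_list[[t]], newdata = x_train)
    h_val <- predict(tree_list[[t]], newdata = x_val)
    # for squared loss and a least-squares tree the line-search step is 1,
    # so the coefficient is simply the shrinkage factor
    alpha[t] <- shrinkage
    f_train <- f_train + alpha[t] * h_train
    f_val <- f_val + alpha[t] * h_val
    loss_val[t] <- mean((y_val - f_val)^2)    # validation loss per iteration
  }
  list(alpha = alpha, tree_list = tree_list, loss_val = loss_val,
       early_stop_idx = which.min(loss_val))  # iteration with lowest val loss
}

# Toy usage on simulated data
set.seed(1)
x <- data.frame(x1 = runif(300), x2 = runif(300))
y <- sin(2 * pi * x$x1) + rnorm(300, sd = 0.2)
fit <- l2boost_sketch(x[1:200, ], y[1:200], x[201:300, ], y[201:300],
                      niter = 100)
fit$early_stop_idx
```

Robust losses such as those used by RRBoost generally need an explicit line search for each coefficient, which is why the step is not fixed there.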

Value

A list with the following components:

type

which boosting algorithm was run. One of: "L2Boost", "LADBoost", "MBoost", "Robloss", "SBoost", "RRBoost" (character string)

control

the list of control parameters used

niter

number of iterations for the boosting algorithm (for RRBoost, T_{1,max} + T_{2,max}) (numeric)

error

if make_prediction = TRUE in control, a vector of prediction errors evaluated on the test set at the early stopping iteration, with one entry per metric in the error argument

tree_init

if y_init = "LADTree", the initial tree (an object of class rpart)

tree_list

if save_tree = TRUE in control, a list of trees fitted at each boosting iteration

f_train_init

a vector of the initial fitted values for the training data

alpha

a vector of base learners' coefficients

early_stop_idx

early stopping iteration

when_init

if type = "RRBoost", the early stopping time of the first stage of RRBoost

loss_train

a vector of training loss values (one per iteration)

loss_val

a vector of validation loss values (one per iteration)

err_val

a vector of validation aad errors (one per iteration)

err_train

a vector of training aad errors (one per iteration)

err_test

a matrix of test errors at every iteration up to and including the early stopping iteration (returned if make_prediction = TRUE in control). It has one row per iteration (early_stop_idx rows) and one column per error type in the error argument; row t holds the test errors at iteration t

f_train

a matrix of training function estimates at all iterations (returned if save_f = TRUE in control); each column corresponds to the fitted values of the predictor at each iteration

f_val

a matrix of validation function estimates at all iterations (returned if save_f = TRUE in control); each column corresponds to the fitted values of the predictor at each iteration

f_test

a matrix of test function estimates at every iteration up to and including the early stopping iteration (returned if save_f = TRUE and make_prediction = TRUE in control); each column corresponds to the fitted values of the predictor at each iteration

var_select

a vector of variable selection indicators (one per explanatory variable; 1 if the variable was selected by at least one of the base learners, and 0 otherwise)

var_importance

a vector of permutation variable importance scores (one per explanatory variable, and returned if cal_imp = TRUE in control)
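The var_select and var_importance components can be illustrated with the two hypothetical helpers below, written in base R plus rpart. The first marks a variable as selected if it appears in a split of any fitted tree (using the var column of each rpart object's frame); the second computes a permutation importance as the increase in test RMSE after shuffling one column. Both are sketches under these assumptions, not the package's own code, and the importance definition RRBoost uses may differ in detail.

```r
library(rpart)

# Hypothetical helper: 1 if a variable is used in at least one split of any
# tree in tree_list, 0 otherwise. In an rpart object, frame$var holds the
# splitting variable of each node, or "<leaf>" for terminal nodes.
var_select_sketch <- function(tree_list, var_names) {
  used <- unique(unlist(lapply(tree_list, function(tr) {
    v <- as.character(tr$frame$var)
    v[v != "<leaf>"]
  })))
  as.integer(var_names %in% used)
}

# Hypothetical helper: permutation importance of each column of x_test,
# measured as the increase in RMSE after randomly shuffling that column.
# predict_fun maps a data frame of predictors to a vector of predictions.
perm_importance_sketch <- function(predict_fun, x_test, y_test) {
  rmse <- function(f) sqrt(mean((y_test - f)^2))
  base <- rmse(predict_fun(x_test))
  sapply(names(x_test), function(j) {
    x_perm <- x_test
    x_perm[[j]] <- sample(x_perm[[j]])  # break the link between column j and y
    rmse(predict_fun(x_perm)) - base
  })
}

# Toy usage: only x1 carries signal, so a single stump splits on x1.
set.seed(42)
x <- data.frame(x1 = runif(200), x2 = runif(200))
y <- 3 * (x$x1 > 0.5) + rnorm(200, sd = 0.1)
stump <- rpart(y ~ ., data = data.frame(y = y, x),
               control = rpart.control(maxdepth = 1))
var_select_sketch(list(stump), names(x))   # 1 0
imp <- perm_importance_sketch(function(nx) predict(stump, newdata = nx), x, y)
imp   # large for x1, exactly 0 for the unused x2
```

A variable never used by any tree cannot change the predictions when permuted, so its permutation importance is zero, consistent with a var_select value of 0.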

Author(s)

Xiaomeng Ju, xmengju@stat.ubc.ca

See Also

Boost.validation, Boost.control.

Examples

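library(RRBoost)
data(airfoil)
n <- nrow(airfoil)
n0 <- floor( 0.2 * n )
set.seed(123)
## random split: 20% test, 60% training, remaining ~20% validation
idx_test <- sample(n, n0)
idx_train <- sample((1:n)[-idx_test], floor( 0.6 * n ) )
idx_val <- (1:n)[ -c(idx_test, idx_train) ]
## drop the response (column 6, y) to obtain the predictors
xx <- airfoil[, -6]
yy <- airfoil$y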
xtrain <- xx[ idx_train, ]
ytrain <- yy[ idx_train ]
xval <- xx[ idx_val, ]
yval <- yy[ idx_val ]
xtest <- xx[ idx_test, ]
ytest <- yy[ idx_test ]
model_RRBoost_LADTree <- Boost(x_train = xtrain, y_train = ytrain,
    x_val = xval, y_val = yval, x_test = xtest, y_test = ytest,
    type = "RRBoost", error = "rmse", y_init = "LADTree",
    max_depth = 1, niter = 10, ## to keep the running time low
    control = Boost.control(max_depth_init = 2,
                            min_leaf_size_init = 20, make_prediction = TRUE,
                            cal_imp = FALSE))
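As a quick check on the split used in the example, the sketch below repeats the same index construction with a hypothetical n of 1000 (no data needed) and confirms that the test (20%), training (60%) and validation (remaining) indices are disjoint and together cover 1:n.

```r
# Same index construction as in the example, with a hypothetical n = 1000.
n <- 1000
n0 <- floor(0.2 * n)
set.seed(123)
idx_test <- sample(n, n0)                                # 20% test
idx_train <- sample((1:n)[-idx_test], floor(0.6 * n))    # 60% training
idx_val <- (1:n)[-c(idx_test, idx_train)]                # the rest: validation
c(test = length(idx_test), train = length(idx_train), val = length(idx_val))
# test = 200, train = 600, val = 200
```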


[Package RRBoost version 0.1 Index]