pibf {RFpredInterval} | R Documentation
Prediction intervals with boosted forests
Description
Constructs prediction intervals with boosted forests.
Usage
pibf(
  formula,
  traindata,
  testdata,
  alpha = 0.05,
  calibration = c("cv", "oob", FALSE),
  coverage_range = c(1 - alpha - 0.005, 1 - alpha + 0.005),
  numfolds = 5,
  params_ranger = list(num.trees = 2000, mtry = ceiling(px/3),
    min.node.size = 5, replace = TRUE),
  oob = FALSE
)
Arguments
formula
    Object of class formula or character describing the model to fit.

traindata
    Training data of class data.frame.

testdata
    Test data of class data.frame.

alpha
    Confidence level. (1 - alpha) is the desired coverage level. The default is alpha = 0.05 for 95% prediction intervals.

calibration
    Calibration method for finding the working level of alpha, alphaw. Options are "cv", "oob", and FALSE. The default is "cv".

coverage_range
    The allowed target calibration range for the coverage level. alphaw is selected such that the coverage level of the PIs for the training observations is within coverage_range.

numfolds
    Number of folds for calibration with cross-validation. The default is 5 folds.

params_ranger
    List of parameters that should be passed to ranger. In the default parameter set, num.trees = 2000, mtry = ceiling(px/3), min.node.size = 5, and replace = TRUE.

oob
    Should out-of-bag (OOB) predictions and prediction intervals for the training observations be returned?
Value
A list with the following components:

pred_interval
    Prediction intervals for test data. A list containing lower and upper bounds.

test_pred
    Bias-corrected random forest predictions for test data.

alphaw
    Working level of alpha, i.e. alphaw. A numeric value. If calibration = FALSE, it returns NULL.

test_response
    If available, test response.

oob_pred_interval
    Out-of-bag (OOB) prediction intervals for train data. Prediction intervals are built with alphaw. If oob = FALSE, it returns NULL.

oob_pred
    Bias-corrected out-of-bag (OOB) predictions for train data. If oob = FALSE, it returns NULL.

train_response
    Train response.
Details
Calibration process
Let (1 - alpha) be the target coverage level. The goal of the calibration is to find the value of alpha_w, the working level of alpha as termed by Roy and Larocque (2020), such that the coverage level of the PIs for the training observations is closest to the target coverage level. Two calibration procedures are provided: calibration with cross-validation and out-of-bag (OOB) calibration.

In calibration with CV, we apply k-fold cross-validation to form prediction intervals for the training observations. In each fold, we split the original training data set into training and testing sets. For the training set, we train a one-step boosted random forest and compute the OOB residuals. Then, for each observation in the testing set, we build a PI. After completing CV, we compute the coverage level of the constructed PIs. If the coverage is not within the acceptable coverage range (coverage_range), we apply a grid search to find the alpha_w that is closest to the target alpha among the set of alpha_w's that ensure the target coverage level for the constructed PIs. Once alpha_w is found, we use this level to build the PIs for the new observations.

The OOB calibration procedure is proposed by Roy and Larocque (2020), and it is the default calibration procedure of rfpi(). See the details section of rfpi() for a detailed explanation of this calibration procedure.
In terms of computational time, OOB calibration is faster than calibration with CV. However, empirical results show that OOB calibration may result in conservative prediction intervals. Therefore, the recommended calibration procedure for the PIBF method is calibration with CV.
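The grid search over alpha_w described above can be sketched as follows. This is a conceptual illustration only, not the package's internal code: the helper names (calibrate_alphaw, cv_resid_q) and the quantile-based PI construction are assumptions made for the sketch.

```r
## Conceptual sketch of the alpha_w grid search (NOT the internal pibf() code).
## Assumed inputs, produced by the cross-validation step:
##   y          - observed responses for the training observations
##   cv_pred    - bias-corrected predictions for the training observations
##   cv_resid_q - function(a) returning c(lower, upper) OOB-residual
##                quantiles that form the PI at level (1 - a)
calibrate_alphaw <- function(y, cv_pred, cv_resid_q, alpha = 0.05,
                             coverage_range = c(1 - alpha - 0.005,
                                                1 - alpha + 0.005)) {
  coverage_at <- function(a) {
    q <- cv_resid_q(a)                  # residual quantiles at level a
    mean(y >= cv_pred + q[1] & y <= cv_pred + q[2])  # empirical coverage
  }
  ## keep the target alpha if its coverage is already within coverage_range
  cov0 <- coverage_at(alpha)
  if (cov0 >= coverage_range[1] && cov0 <= coverage_range[2]) {
    return(alpha)
  }
  ## otherwise, among the candidate alpha_w's whose coverage reaches the
  ## target level, pick the one closest to the target alpha
  grid <- seq(0.001, 2 * alpha, by = 0.001)
  ok <- grid[vapply(grid, coverage_at, numeric(1)) >= 1 - alpha]
  if (length(ok) == 0L) return(alpha)
  ok[which.min(abs(ok - alpha))]
}
```

The returned alpha_w is then used in place of alpha when constructing the PIs for new observations.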
References
Alakus, C., Larocque, D., & Labbe, A. (2022). RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests. The R Journal, 14(1), 300-319.
Roy, M. H., & Larocque, D. (2020). Prediction intervals with random forests. Statistical methods in medical research, 29(1), 205-229. doi:10.1177/0962280219829885.
See Also
piall
rfpi
print.rfpredinterval
Examples
## load example data
data(BostonHousing, package = "RFpredInterval")
set.seed(2345)
## define train/test split
testindex <- 1:10
trainindex <- sample(11:nrow(BostonHousing), size = 100, replace = FALSE)
traindata <- BostonHousing[trainindex, ]
testdata <- BostonHousing[testindex, ]
px <- ncol(BostonHousing) - 1
## construct 95% PI with "cv" calibration using 5-folds
out <- pibf(formula = medv ~ ., traindata = traindata,
testdata = testdata, calibration = "cv", numfolds = 5,
params_ranger = list(num.trees = 40))
## get the PI for the first observation in the testdata
c(out$pred_interval$lower[1], out$pred_interval$upper[1])
## get the bias-corrected random forest predictions for testdata
out$test_pred
## construct 90% PI with "oob" calibration
out2 <- pibf(formula = medv ~ ., traindata = traindata,
testdata = testdata, alpha = 0.1, calibration = "oob",
coverage_range = c(0.89, 0.91), params_ranger = list(num.trees = 40))
## get the PI for the testdata
out2$pred_interval
## get the working level of alpha (alphaw)
out2$alphaw