pibf {RFpredInterval}    R Documentation

Prediction intervals with boosted forests


Description

Constructs prediction intervals with boosted forests.


Usage

pibf(
  formula,
  traindata,
  testdata,
  alpha = 0.05,
  calibration = c("cv", "oob", FALSE),
  coverage_range = c(1 - alpha - 0.005, 1 - alpha + 0.005),
  numfolds = 5,
  params_ranger = list(num.trees = 2000, mtry = ceiling(px/3), min.node.size = 5,
    replace = TRUE),
  oob = FALSE
)



Arguments

formula
Object of class formula or character describing the model to fit.

traindata
Training data of class data.frame.

testdata
Test data of class data.frame.

alpha
Significance (miscoverage) level. (1 - alpha) is the desired coverage level. The default is alpha = 0.05 for the 95% prediction interval.

calibration
Calibration method for finding the working level of alpha, i.e. $\alpha_w$. Options are "cv", "oob", and FALSE, standing for calibration with cross-validation, OOB calibration, and no calibration, respectively. See below for details. The default is "cv".

coverage_range
The allowed target calibration range for the coverage level. $\alpha_w$ is selected such that the "cv" or "oob" coverage is within coverage_range.

numfolds
Number of folds for calibration with cross-validation. The default is 5 folds.

params_ranger
List of parameters that should be passed to ranger. In the default parameter set, num.trees = 2000, mtry = $px/3$ (rounded up, where px is the number of predictors), min.node.size = 5, replace = TRUE. See ranger for possible parameters.

oob
Should out-of-bag (OOB) predictions and prediction intervals for the training observations be returned?
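Two of the defaults above are expressions rather than constants: coverage_range is written in terms of alpha, and mtry in terms of px. The base R sketch below illustrates how such defaults resolve. The functions default_range() and default_mtry() are hypothetical stand-ins written for this illustration and are not part of the package.

```r
# Hypothetical stand-in that copies only the two default expressions from the
# usage above; it is not pibf() itself.
default_range <- function(alpha = 0.05,
                          coverage_range = c(1 - alpha - 0.005, 1 - alpha + 0.005)) {
  # R evaluates default arguments lazily, so coverage_range sees the alpha
  # supplied in the call.
  coverage_range
}

default_range()             # 0.945 0.955
default_range(alpha = 0.1)  # 0.895 0.905

# Default mtry: one third of the number of predictors, rounded up.
default_mtry <- function(px) ceiling(px / 3)

default_mtry(13)  # 5
default_mtry(12)  # 4
```

Under this reading, changing alpha without supplying coverage_range moves the acceptable coverage band along with the target coverage.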


Value

A list with the following components:


Prediction intervals for test data. A list containing lower and upper bounds.


Bias-corrected random forest predictions for test data.


Working level of alpha, i.e. $\alpha_w$. If calibration = FALSE, it returns NULL.


If available, test response.


Out-of-bag (OOB) prediction intervals for the training data. Prediction intervals are built with alpha. If oob = FALSE, it returns NULL.


Bias-corrected out-of-bag (OOB) predictions for the training data. If oob = FALSE, it returns NULL.


Training response.


Details

Calibration process

Let $(1 - \alpha)$ be the target coverage level. The goal of calibration is to find $\alpha_w$, the working level of $\alpha$ in the terminology of Roy and Larocque (2020), such that the coverage level of the PIs for the training observations is closest to the target coverage level. Two calibration procedures are provided: calibration with cross-validation (CV) and out-of-bag (OOB) calibration.

  1. In calibration with CV, we apply k-fold cross-validation to form prediction intervals for the training observations. In each fold, we split the original training data set into training and testing sets. On the training set, we train a one-step boosted random forest and compute the OOB residuals. Then, for each observation in the testing set, we build a PI. After completing CV, we compute the coverage level of the constructed PIs. If the coverage is not within the acceptable coverage range (coverage_range), we apply a grid search to find $\alpha_w$: among the candidate values that attain the target coverage level for the constructed PIs, we select the one closest to the target $\alpha$. We then use this $\alpha_w$ to build the PIs for new observations.

  2. The OOB calibration procedure was proposed by Roy and Larocque (2020) and is the default calibration procedure of rfpi(). See the Details section of rfpi() for a full explanation of this procedure.

In terms of computational time, OOB calibration is faster than calibration with CV. However, empirical results show that OOB calibration may result in conservative prediction intervals. Therefore, the recommended calibration procedure for the PIBF method is calibration with CV.
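The grid search described in step 1 can be sketched in base R. This is a toy illustration under stated assumptions and not the package's internal code: the data are simulated, the intervals are formed from global residual quantiles as a simplification, and the names coverage_at(), calibrate_alpha() and the candidate grid are hypothetical.

```r
# Toy sketch only: not the internal code of pibf().
set.seed(1)
n <- 2000
cv_pred   <- rnorm(n)              # hypothetical cross-validated predictions
y         <- cv_pred + rnorm(n)    # hypothetical training responses
resid_fit <- rnorm(n, sd = 0.8)    # hypothetical residuals that understate the error

# Coverage of the intervals built at a candidate working level alpha_w.
# Assumption: interval = prediction + residual quantiles (a simplification).
coverage_at <- function(alpha_w) {
  q <- quantile(resid_fit, probs = c(alpha_w / 2, 1 - alpha_w / 2), names = FALSE)
  mean(y >= cv_pred + q[1] & y <= cv_pred + q[2])
}

calibrate_alpha <- function(alpha, coverage_range,
                            grid = seq(0.005, 0.25, by = 0.005)) {
  # Keep the nominal alpha if its coverage is already inside the allowed range.
  cov_alpha <- coverage_at(alpha)
  if (cov_alpha >= coverage_range[1] && cov_alpha <= coverage_range[2]) {
    return(alpha)
  }
  # Otherwise keep the candidates that attain the target coverage and
  # return the one closest to the nominal alpha.
  cov_grid <- vapply(grid, coverage_at, numeric(1))
  ok <- grid[cov_grid >= 1 - alpha]
  if (length(ok) == 0) return(min(grid))  # fallback: widest intervals on the grid
  ok[which.min(abs(ok - alpha))]
}

alpha <- 0.05
alpha_w <- calibrate_alpha(alpha, c(1 - alpha - 0.005, 1 - alpha + 0.005))
c(alpha_w = alpha_w, coverage = coverage_at(alpha_w))
```

Because the simulated residuals are narrower than the true errors, the nominal level undercovers here, so the search is expected to move toward a smaller working level and hence wider intervals.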


References

Alakus, C., Larocque, D., & Labbe, A. (2022). RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests. The R Journal, 14(1), 300-319.

Roy, M. H., & Larocque, D. (2020). Prediction intervals with random forests. Statistical Methods in Medical Research, 29(1), 205-229. doi:10.1177/0962280219829885.

See Also

piall, rfpi, print.rfpredinterval


Examples

## load example data
data(BostonHousing, package = "RFpredInterval")

## define train/test split
testindex <- 1:10
trainindex <- sample(11:nrow(BostonHousing), size = 100, replace = FALSE)
traindata <- BostonHousing[trainindex, ]
testdata <- BostonHousing[testindex, ]
px <- ncol(BostonHousing) - 1

## construct 95% PI with "cv" calibration using 5-folds
out <- pibf(formula = medv ~ ., traindata = traindata,
  testdata = testdata, calibration = "cv", numfolds = 5,
  params_ranger = list(num.trees = 40))

## get the PI for the first observation in the testdata
c(out$pred_interval$lower[1], out$pred_interval$upper[1])

## get the bias-corrected random forest predictions for testdata

## construct 90% PI with "oob" calibration
out2 <- pibf(formula = medv ~ ., traindata = traindata,
  testdata = testdata, alpha = 0.1, calibration = "oob",
  coverage_range = c(0.89, 0.91), params_ranger = list(num.trees = 40))

## get the PI for the testdata
out2$pred_interval

## get the working level of alpha (alphaw)
out2$alphaw
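When the test response is known, the lower and upper bounds can be checked against it. The base R sketch below uses small made-up vectors in place of real output. The helper pi_coverage() is hypothetical and not part of the package. With real results, the bounds would presumably come from out$pred_interval$lower and out$pred_interval$upper, and the response from testdata$medv.

```r
# Hypothetical helper: fraction of responses that fall inside their intervals.
pi_coverage <- function(lower, upper, y) {
  mean(y >= lower & y <= upper)
}

# Made-up bounds and responses, used only to show the calculation.
lower <- c(18, 20, 15, 30)
upper <- c(26, 28, 22, 40)
y     <- c(24, 21.6, 34.7, 33.4)

pi_coverage(lower, upper, y)  # 0.75: three of the four responses are covered
mean(upper - lower)           # 8.25: mean interval length
```

With only a handful of test observations, as in the example above, the empirical coverage is a rough indication at best.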

[Package RFpredInterval version 1.0.8 Index]