pibf {RFpredInterval}    R Documentation

Prediction intervals with boosted forests

Description

Constructs prediction intervals with boosted forests.

Usage

pibf(
  formula,
  traindata,
  testdata,
  alpha = 0.05,
  calibration = c("cv", "oob", FALSE),
  coverage_range = c(1 - alpha - 0.005, 1 - alpha + 0.005),
  numfolds = 5,
  params_ranger = list(num.trees = 2000, mtry = ceiling(px/3), min.node.size = 5,
    replace = TRUE),
  oob = FALSE
)

Arguments

formula

Object of class formula or character describing the model to fit.

traindata

Training data of class data.frame.

testdata

Test data of class data.frame.

alpha

Significance level. (1 - alpha) is the desired coverage level. The default is alpha = 0.05 for the 95% prediction interval.

calibration

Calibration method for finding the working level of alpha, i.e. \alpha_w. Options are "cv", "oob", and FALSE, standing for calibration with cross-validation, OOB calibration, and no calibration, respectively. See below for details. The default is "cv".

coverage_range

The acceptable range for the coverage level during calibration. \alpha_w is selected such that the "cv" or "oob" coverage falls within coverage_range. The default is (1 - alpha) plus or minus 0.005.

numfolds

Number of folds for calibration with cross-validation. The default is 5 folds.

params_ranger

List of parameters that should be passed to ranger. In the default parameter set, num.trees = 2000, mtry = px/3 (rounded up), min.node.size = 5, replace = TRUE. See ranger for possible parameters.

oob

Should out-of-bag (OOB) predictions and prediction intervals for the training observations be returned?
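
The defaults of coverage_range and mtry are derived from alpha and px, as shown in the Usage section. The following base R sketch evaluates those default expressions for a 90% interval. The value px = 13 is an assumption made for illustration (px is taken to be the number of predictors), not something pibf() sets for you here.

```r
## Default calibration window for a 90% prediction interval
alpha <- 0.1
coverage_range <- c(1 - alpha - 0.005, 1 - alpha + 0.005)
coverage_range   # 0.895 0.905

## Default mtry expression, assuming px = 13 predictors (illustrative value)
px <- 13
mtry_default <- ceiling(px / 3)
mtry_default     # 5
```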

Value

A list with the following components:

pred_interval

Prediction intervals for test data. A list containing lower and upper bounds.

test_pred

Bias-corrected random forest predictions for test data.

alphaw

Working level of alpha, i.e. \alpha_w. If calibration = FALSE, it returns NULL.

test_response

If available, test response.

oob_pred_interval

Out-of-bag (OOB) prediction intervals for train data. Prediction intervals are built with alpha. If oob = FALSE, it returns NULL.

oob_pred

Bias-corrected out-of-bag (OOB) predictions for train data. If oob = FALSE, it returns NULL.

train_response

Train response.
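
As a rough illustration of how these components fit together, the sketch below computes the mean interval width and the empirical test coverage. To stay self-contained it uses a mock list with the component names documented above rather than a real pibf() result; the numbers are made up for illustration only.

```r
## Mock object mimicking the documented components of a pibf() result.
## The values are invented; a real call to pibf() would supply them.
out <- list(
  pred_interval = list(lower = c(18.2, 15.1, 27.4),
                       upper = c(30.5, 26.0, 41.9)),
  test_pred     = c(24.0, 20.3, 34.1),
  test_response = c(24.0, 27.1, 34.7)
)

## Width of each prediction interval
width <- out$pred_interval$upper - out$pred_interval$lower

## TRUE where the observed test response falls inside its interval
covered <- out$test_response >= out$pred_interval$lower &
  out$test_response <= out$pred_interval$upper

mean(width)     # mean interval length
mean(covered)   # empirical coverage on the test set
```

The same two lines apply to oob_pred_interval and train_response when oob = TRUE.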

Details

Calibration process

Let (1-\alpha) be the target coverage level. The goal of the calibration is to find \alpha_w, the working level of \alpha in the terminology of Roy and Larocque (2020), such that the coverage level of the PIs for the training observations is closest to the target coverage level. Two calibration procedures are provided: calibration with cross-validation (CV) and out-of-bag (OOB) calibration.

  1. In calibration with CV, we apply k-fold cross-validation to form prediction intervals for the training observations. In each fold, we split the original training data set into a training set and a testing set. On the training set, we train a one-step boosted random forest and compute the OOB residuals. Then we build a PI for each observation in the testing set. After completing CV, we compute the coverage level of the constructed PIs. If the coverage is not within the acceptable coverage range (coverage_range), we apply a grid search to find \alpha_w: among the values of \alpha_w that ensure the target coverage level for the constructed PIs, we select the one closest to the target \alpha. We then use this \alpha_w to build the PIs for the new observations.

  2. The OOB calibration procedure was proposed by Roy and Larocque (2020) and is the default calibration procedure of rfpi(). See the Details section of rfpi() for a detailed explanation of this calibration procedure.

In terms of computational time, OOB calibration is faster than calibration with CV. However, empirical results show that OOB calibration may result in conservative prediction intervals. Therefore, the recommended calibration procedure for the PIBF method is calibration with CV.
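
The grid search over \alpha_w can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the helpers coverage_at() and find_alpha_w() are hypothetical, the intervals are built from plain quantiles of a residual pool rather than from a boosted forest, and the grid and the fallback when no candidate qualifies are assumptions.

```r
## Hypothetical helper: coverage of intervals built at working level a_w.
## `pred` are held-out point predictions, `y` the observed responses, and
## `res` a pool of residuals standing in for the OOB residuals.
coverage_at <- function(a_w, pred, y, res) {
  q <- quantile(res, probs = c(a_w / 2, 1 - a_w / 2), names = FALSE)
  mean(y >= pred + q[1] & y <= pred + q[2])
}

## Hypothetical grid search: keep alpha if its coverage is already inside
## coverage_range; otherwise, among grid values whose coverage reaches
## 1 - alpha, return the one closest to alpha.
find_alpha_w <- function(alpha, coverage_range, pred, y, res,
                         grid = seq(0.001, 0.5, by = 0.001)) {
  cov0 <- coverage_at(alpha, pred, y, res)
  if (cov0 >= coverage_range[1] && cov0 <= coverage_range[2]) return(alpha)
  cov <- vapply(grid, coverage_at, numeric(1), pred = pred, y = y, res = res)
  ok <- grid[cov >= 1 - alpha]
  if (length(ok) == 0) return(min(grid))  # assumed fallback: widest interval
  ok[which.min(abs(ok - alpha))]
}

## Simulated data in which the residual pool is too narrow, so intervals
## built at alpha = 0.05 under-cover and a smaller alpha_w is needed.
set.seed(1)
pred <- rnorm(1000)
y    <- pred + rnorm(1000, sd = 1.2)
res  <- rnorm(1000, sd = 1)

alpha_w <- find_alpha_w(0.05, c(0.945, 0.955), pred, y, res)
alpha_w
coverage_at(alpha_w, pred, y, res)
```

In this simulation the selected \alpha_w is smaller than \alpha, which widens the intervals until the target coverage is met; with conservative intervals the same search would return a value larger than \alpha.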

References

Alakus, C., Larocque, D., & Labbe, A. (2022). RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests. The R Journal, 14(1), 300-319.

Roy, M. H., & Larocque, D. (2020). Prediction intervals with random forests. Statistical Methods in Medical Research, 29(1), 205-229. doi:10.1177/0962280219829885.

See Also

piall, rfpi, print.rfpredinterval

Examples


## load example data
data(BostonHousing, package = "RFpredInterval")
set.seed(2345)

## define train/test split
testindex <- 1:10
trainindex <- sample(11:nrow(BostonHousing), size = 100, replace = FALSE)
traindata <- BostonHousing[trainindex, ]
testdata <- BostonHousing[testindex, ]
px <- ncol(BostonHousing) - 1

## construct 95% PI with "cv" calibration using 5 folds
out <- pibf(formula = medv ~ ., traindata = traindata,
  testdata = testdata, calibration = "cv", numfolds = 5,
  params_ranger = list(num.trees = 40))

## get the PI for the first observation in the testdata
c(out$pred_interval$lower[1], out$pred_interval$upper[1])

## get the bias-corrected random forest predictions for testdata
out$test_pred

## construct 90% PI with "oob" calibration
out2 <- pibf(formula = medv ~ ., traindata = traindata,
  testdata = testdata, alpha = 0.1, calibration = "oob",
  coverage_range = c(0.89, 0.91), params_ranger = list(num.trees = 40))

## get the PI for the testdata
out2$pred_interval

## get the working level of alpha (alphaw)
out2$alphaw



[Package RFpredInterval version 1.0.8 Index]