pibf {RFpredInterval} | R Documentation
Prediction intervals with boosted forests
Description
Constructs prediction intervals with boosted forests.
Usage
pibf(
  formula,
  traindata,
  testdata,
  alpha = 0.05,
  calibration = c("cv", "oob", FALSE),
  coverage_range = c(1 - alpha - 0.005, 1 - alpha + 0.005),
  numfolds = 5,
  params_ranger = list(num.trees = 2000, mtry = ceiling(px/3),
    min.node.size = 5, replace = TRUE),
  oob = FALSE
)
Arguments
formula
    Object of class formula or character describing the model to fit.

traindata
    Training data of class data.frame.

testdata
    Test data of class data.frame.

alpha
    Confidence level. (1 - alpha) is the desired coverage level. The default is alpha = 0.05 for 95% prediction intervals.

calibration
    Calibration method for finding the working level of alpha, alphaw. Options are "cv", "oob", and FALSE. The default is "cv".

coverage_range
    The allowed target calibration range for the coverage level. alphaw is selected such that the coverage level of the PIs for the training observations is within coverage_range.

numfolds
    Number of folds for calibration with cross-validation. The default is 5 folds.

params_ranger
    List of parameters that should be passed to ranger. In the default parameter set, num.trees = 2000, mtry = ceiling(px/3), min.node.size = 5, and replace = TRUE.

oob
    Should out-of-bag (OOB) predictions and prediction intervals for the training observations be returned?
Value
A list with the following components:

pred_interval
    Prediction intervals for test data. A list containing lower and upper bounds.

test_pred
    Bias-corrected random forest predictions for test data.

alphaw
    Working level of alpha, i.e. alphaw. A numeric value. If calibration = FALSE, it returns NULL.

test_response
    If available, test response.

oob_pred_interval
    Out-of-bag (OOB) prediction intervals for train data. Prediction intervals are built with alphaw. If oob = FALSE, it returns NULL.

oob_pred
    Bias-corrected out-of-bag (OOB) predictions for train data. If oob = FALSE, it returns NULL.

train_response
    Train response.
Details
Calibration process
Let (1 - alpha) be the target coverage level. The goal of the calibration is to find the value of alpha_w, the working level of alpha as termed by Roy and Larocque (2020), such that the coverage level of the PIs for the training observations is closest to the target coverage level. Two calibration procedures are provided: calibration with cross-validation and out-of-bag (OOB) calibration.

In calibration with CV, we apply k-fold cross-validation to form prediction intervals for the training observations. In each fold, we split the original training data set into training and testing sets. For the training set, we train a one-step boosted random forest and compute the OOB residuals. Then, for each observation in the testing set, we build a PI. After completing CV, we compute the coverage level of the constructed PIs. If the coverage is not within the acceptable coverage range (coverage_range), we apply a grid search to find the alpha_w that is closest to the target alpha among the set of alpha_w's that ensure the target coverage level for the constructed PIs. Once alpha_w is found, we use this level to build the PIs for the new observations.

The OOB calibration procedure is proposed by Roy and Larocque (2020), and it is the default calibration procedure of rfpi(). See the details section of rfpi() for a detailed explanation of this calibration procedure.
In terms of computational time, OOB calibration is faster than calibration with CV. However, empirical results show that OOB calibration may result in conservative prediction intervals. Therefore, the recommended calibration procedure for the PIBF method is calibration with CV.
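The grid search over alpha_w described above can be sketched as follows. This is a conceptual illustration only, not the package's internal code: the helper names (calibrate_alphaw, cv_resid_q) and the quantile-based PI construction are assumptions made for the sketch.

```r
## Conceptual sketch of the alpha_w grid search (NOT the internal pibf() code).
## Assumed inputs, produced by the cross-validation step:
##   y          - observed responses for the training observations
##   cv_pred    - bias-corrected predictions for the training observations
##   cv_resid_q - function(a) returning c(lower, upper) OOB-residual
##                quantiles that form the PI at level (1 - a)
calibrate_alphaw <- function(y, cv_pred, cv_resid_q, alpha = 0.05,
                             coverage_range = c(1 - alpha - 0.005,
                                                1 - alpha + 0.005)) {
  coverage_at <- function(a) {
    q <- cv_resid_q(a)                  # residual quantiles at level a
    mean(y >= cv_pred + q[1] & y <= cv_pred + q[2])  # empirical coverage
  }
  ## keep the target alpha if its coverage is already within coverage_range
  cov0 <- coverage_at(alpha)
  if (cov0 >= coverage_range[1] && cov0 <= coverage_range[2]) {
    return(alpha)
  }
  ## otherwise, among the candidate alpha_w's whose coverage reaches the
  ## target level, pick the one closest to the target alpha
  grid <- seq(0.001, 2 * alpha, by = 0.001)
  ok <- grid[vapply(grid, coverage_at, numeric(1)) >= 1 - alpha]
  if (length(ok) == 0L) return(alpha)
  ok[which.min(abs(ok - alpha))]
}
```

The returned alpha_w is then used in place of alpha when constructing the PIs for new observations.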
References
Alakus, C., Larocque, D., & Labbe, A. (2022). RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests. The R Journal, 14(1), 300-319.
Roy, M. H., & Larocque, D. (2020). Prediction intervals with random forests. Statistical methods in medical research, 29(1), 205-229. doi:10.1177/0962280219829885.
See Also
piall
rfpi
print.rfpredinterval
Examples
## load example data
data(BostonHousing, package = "RFpredInterval")
set.seed(2345)
## define train/test split
testindex <- 1:10
trainindex <- sample(11:nrow(BostonHousing), size = 100, replace = FALSE)
traindata <- BostonHousing[trainindex, ]
testdata <- BostonHousing[testindex, ]
px <- ncol(BostonHousing) - 1
## construct 95% PI with "cv" calibration using 5-folds
out <- pibf(formula = medv ~ ., traindata = traindata,
testdata = testdata, calibration = "cv", numfolds = 5,
params_ranger = list(num.trees = 40))
## get the PI for the first observation in the testdata
c(out$pred_interval$lower[1], out$pred_interval$upper[1])
## get the bias-corrected random forest predictions for testdata
out$test_pred
## construct 90% PI with "oob" calibration
out2 <- pibf(formula = medv ~ ., traindata = traindata,
testdata = testdata, alpha = 0.1, calibration = "oob",
coverage_range = c(0.89, 0.91), params_ranger = list(num.trees = 40))
## get the PI for the testdata
out2$pred_interval
## get the working level of alpha (alphaw)
out2$alphaw