rfpi {RFpredInterval} | R Documentation |
Prediction intervals with random forests
Description
Constructs prediction intervals using the 15 variations proposed by Roy and Larocque (2020). Each variation combines two choices: the method used to build the forest and the method used to build the prediction interval. There are three methods for building the forest from the CART paradigm: (i) least-squares (LS), (ii) L1, and (iii) shortest prediction interval (SPI). There are five methods for constructing prediction intervals: the classical method, shortest prediction interval (SPI), the quantile method, highest density region (HDR), and contiguous HDR. Crossing the three forest methods with the five interval methods gives the 15 variations.
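As a quick illustration of how the count of 15 arises, the sketch below enumerates every pairing of split rule and interval method in base R. It uses only the option names listed under Usage; the object name `variations` is an assumption made for this example and is not part of the package.

```r
# Enumerate the 3 x 5 grid of variations using the option names from Usage.
# `variations` is a hypothetical name used only for this illustration.
variations <- expand.grid(
  split_rule = c("ls", "l1", "spi"),
  pi_method  = c("lm", "spi", "quant", "hdr", "chdr"),
  stringsAsFactors = FALSE
)

# One row per (split rule, interval method) combination.
nrow(variations)
```

Each row corresponds to one value of `split_rule` paired with one value of `pi_method`.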
Usage
rfpi(
  formula,
  traindata,
  testdata,
  alpha = 0.05,
  split_rule = c("ls", "l1", "spi"),
  pi_method = c("lm", "spi", "quant", "hdr", "chdr"),
  calibration = TRUE,
  rf_package = c("rfsrc", "ranger"),
  params_rfsrc = list(ntree = 2000, mtry = ceiling(px/3), nodesize = 5,
    samptype = "swr"),
  params_ranger = list(num.trees = 2000, mtry = ceiling(px/3), min.node.size = 5,
    replace = TRUE),
  params_calib = list(range = c(1 - alpha - 0.005, 1 - alpha + 0.005),
    start = (1 - alpha), step = 0.01, refine = TRUE),
  oob = FALSE
)
Arguments
formula |
Object of class |
traindata |
Training data of class |
testdata |
Test data of class |
alpha |
Confidence level. (1 - |
split_rule |
Split rule for building a forest. Options are "ls", "l1", and "spi". |
pi_method |
Methods for building a prediction interval. Options are "lm", "spi", "quant", "hdr", and "chdr". |
calibration |
Apply OOB calibration for finding working level of
|
rf_package |
Random forest package that can be used for RF training.
Options are "rfsrc" and "ranger". |
params_rfsrc |
List of parameters that should be passed to
|
params_ranger |
List of parameters that should be passed to
|
params_calib |
List of parameters for calibration procedure.
|
oob |
Should out-of-bag (OOB) predictions and prediction intervals for the training observations be returned? |
Value
A list with the following components:
lm_interval |
Prediction intervals for test data with the classical method. A list containing lower and upper bounds. |
spi_interval |
Prediction intervals for test data with SPI method. A list containing lower and upper bounds. |
hdr_interval |
Prediction intervals for test data with HDR method. A list containing lower and upper bounds of prediction interval for each test observation. There may be multiple PIs for a single observation. |
chdr_interval |
Prediction intervals for test data with contiguous HDR method. A list containing lower and upper bounds. |
quant_interval |
Prediction intervals for test data with quantiles method. A list containing lower and upper bounds. |
test_pred |
Random forest predictions for test data. |
test_response |
If available, test response. |
alphaw |
Working level of |
split_rule |
Split rule used for building the random forest. |
rf_package |
Random forest package that was used for RF training. |
oob_pred_interval |
Out-of-bag (OOB) prediction intervals for train
data. Prediction intervals are built with |
oob_pred |
Out-of-bag (OOB) predictions for train data.
If |
train_response |
Train response. |
Details
Calibration process
The calibration procedure uses the "Bag of Observations for Prediction" (BOP) idea. The BOP for a new observation is built from the set of in-bag observations that are in the same terminal nodes as the new observation. The calibration procedure uses the BOPs constructed for the training observations. The BOP for a training observation is built using only the trees where this training observation is out-of-bag (OOB).
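The BOP construction described above can be sketched in base R as follows. This is a toy illustration under stated assumptions, not the package's implementation: the function `build_bop` and the inputs `membership` (terminal node of each training observation in each tree) and `inbag` (how many times each training observation appears in each tree's bootstrap sample) are hypothetical, and repeating a response once per in-bag count is an assumption of this sketch.

```r
# Hypothetical helper: pool the responses of in-bag training observations
# that share a terminal node with the target observation.
# Assumption: an observation drawn k times into a bootstrap sample
# contributes its response k times.
build_bop <- function(new_nodes, membership, inbag, y,
                      trees = seq_along(new_nodes)) {
  unlist(lapply(trees, function(t) {
    same_node <- membership[, t] == new_nodes[t]
    rep(y[same_node], times = inbag[same_node, t])
  }))
}

# Toy data: 4 training observations, 2 trees (all values are made up).
y          <- c(10, 20, 30, 40)
membership <- cbind(c(1, 1, 2, 2), c(1, 2, 2, 2))  # terminal node ids
inbag      <- cbind(c(2, 0, 1, 1), c(1, 1, 0, 2))  # bootstrap counts

# BOP for a new observation landing in node 1 of tree 1 and node 2 of tree 2.
bop_new <- build_bop(c(1, 2), membership, inbag, y)

# OOB BOP for training observation 2: use only the trees where it is OOB.
oob_trees <- which(inbag[2, ] == 0)
bop_oob_2 <- build_bop(membership[2, ], membership, inbag, y,
                       trees = oob_trees)
```

In this toy setup, training observation 2 is out-of-bag only in the first tree, so its OOB BOP draws on that tree alone.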
Let (1-\alpha) be the target coverage level. The goal of the calibration is to find the value of \alpha_w, which Roy and Larocque (2020) call the working level of \alpha, such that the coverage level of the prediction intervals for the training observations is closest to the target coverage level. The idea is to find the value of \alpha_w using the OOB-BOPs. Once found, (1-\alpha_w) becomes the level used to build the prediction intervals for the new observations.
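To make the calibration idea concrete, here is a minimal base R sketch. It does not reproduce the package's search (the `range`, `start`, `step`, and `refine` settings of `params_calib` are not modeled); it simply scans a grid of candidate levels and keeps the one whose coverage on the training responses is closest to the target. The OOB-BOPs are simulated rather than taken from a forest, intervals are built with plain quantiles, and the names `oob_bops`, `coverage_at`, `candidates`, and `cover` are assumptions made for this example.

```r
set.seed(1)

# Simulated stand-ins (assumption): training responses and one OOB-BOP per
# training observation. The BOPs are deliberately narrower than the response
# distribution so that the nominal level under-covers.
n        <- 200
y        <- rnorm(n)
oob_bops <- lapply(seq_len(n), function(i) rnorm(100, mean = 0, sd = 0.8))

# Coverage of quantile-based intervals built at a given level (1 - a).
coverage_at <- function(level, bops, y) {
  a <- 1 - level
  covered <- mapply(function(bop, yi) {
    q <- quantile(bop, probs = c(a / 2, 1 - a / 2), names = FALSE)
    yi >= q[1] && yi <= q[2]
  }, bops, y)
  mean(covered)
}

target     <- 0.90                        # target coverage, (1 - alpha)
candidates <- seq(0.80, 0.99, by = 0.01)  # candidate values of (1 - alpha_w)

# Coverage on the training responses at each candidate level.
cover <- vapply(candidates, coverage_at, numeric(1),
                bops = oob_bops, y = y)

# Keep the candidate whose coverage is closest to the target.
best    <- candidates[which.min(abs(cover - target))]
alpha_w <- 1 - best
```

Because the simulated BOPs are too narrow, the selected level will typically sit above the target, which is the kind of correction the working level is meant to provide.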
References
Roy, M. H., & Larocque, D. (2020). Prediction intervals with random forests. Statistical Methods in Medical Research, 29(1), 205-229. doi:10.1177/0962280219829885.
See Also
piall
pibf
print.rfpredinterval
Examples
## load example data
data(BostonHousing, package = "RFpredInterval")
set.seed(2345)
## define train/test split
testindex <- 1:10
trainindex <- sample(11:nrow(BostonHousing), size = 100, replace = FALSE)
traindata <- BostonHousing[trainindex, ]
testdata <- BostonHousing[testindex, ]
px <- ncol(BostonHousing) - 1
## construct 90% PI with "l1" split rule and "spi" PI method with calibration
out <- rfpi(formula = medv ~ ., traindata = traindata,
  testdata = testdata, alpha = 0.1, calibration = TRUE,
  split_rule = "l1", pi_method = "spi", params_rfsrc = list(ntree = 50),
  params_calib = list(range = c(0.89, 0.91), start = 0.9, step = 0.01,
    refine = TRUE))
## get the PI with "spi" method for first observation in the testdata
c(out$spi_interval$lower[1], out$spi_interval$upper[1])
## get the random forest predictions for testdata
out$test_pred
## get the working level of alpha (alphaw)
out$alphaw
## construct 95% PI with "ls" split rule, "lm" and "quant" PI methods
## with calibration and use "ranger" package for RF training
out2 <- rfpi(formula = medv ~ ., traindata = traindata,
  testdata = testdata, split_rule = "ls", pi_method = c("lm", "quant"),
  rf_package = "ranger", params_ranger = list(num.trees = 50))
## get the PI with "quant" method for the testdata
cbind(out2$quant_interval$lower, out2$quant_interval$upper)
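Once intervals are available, a natural follow-up is to check how often they cover the observed responses and how wide they are. The helper below is a hypothetical sketch, not a function of the package. It takes lower bounds, upper bounds, and responses as plain vectors, and the toy numbers stand in for components such as `out$spi_interval$lower`, `out$spi_interval$upper`, and `out$test_response` described under Value.

```r
# Hypothetical helper: empirical coverage and mean length of a set of
# prediction intervals.
pi_summary <- function(lower, upper, response) {
  c(coverage    = mean(response >= lower & response <= upper),
    mean_length = mean(upper - lower))
}

# Toy numbers (made up) standing in for interval bounds and test responses.
lower    <- c(18, 20, 25)
upper    <- c(26, 30, 33)
response <- c(24, 31, 30)

res <- pi_summary(lower, upper, response)
res
```

With real output, the same call could be made on the interval components of the returned list, provided the test response is available.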