predict.literanger {literanger}    R Documentation
Literanger prediction
Description
'literanger' provides different types of prediction that may be used in multiple imputation algorithms with random forests. The usual prediction is the 'bagged' prediction: the most frequent value (or the mean) of the in-bag samples in a terminal node. Doove et al (2014) propose a prediction that better matches the predictive distribution needed for multiple imputation. For each predicted value required, a tree is drawn at random from the forest, and one observation is then drawn at random from that tree's terminal node. Alternatively, the usual most-frequent-value or mean of the in-bag responses can be used, as in missForest (Stekhoven and Buehlmann, 2012), miceRanger https://cran.r-project.org/package=miceRanger and missRanger https://cran.r-project.org/package=missRanger.
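The difference between the two prediction rules can be seen in a small base-R sketch for a single row of new data in a regression forest. The sketch makes a hypothetical assumption: the in-bag responses in the terminal node reached in each tree are already available as a list (`node_responses`, one numeric vector per tree). literanger stores this information internally but does not expose it in this form. The bagged prediction is taken here as the mean of the per-tree node means, as in ranger. This is an illustration of the idea, not literanger's implementation.

```r
# Hypothetical input: for one row of new data, the in-bag responses stored
# in the terminal node reached in each of three trees (not literanger output).
node_responses <- list(
  c(4.9, 5.1, 5.0),   # tree 1
  c(6.2, 6.0),        # tree 2
  c(5.5, 5.7, 5.6)    # tree 3
)

# 'bagged' (regression): average the terminal-node means across trees
bagged_prediction <- function(node_responses) {
  mean(vapply(node_responses, mean, numeric(1)))
}

# 'inbag' (after Doove et al, 2014): pick one tree at random, then draw
# one of the in-bag responses stored in that tree's terminal node
inbag_draw <- function(node_responses) {
  tree <- sample.int(length(node_responses), size = 1)
  responses <- node_responses[[tree]]
  # index explicitly so a node holding one value is not treated as 1:n
  responses[sample.int(length(responses), size = 1)]
}

set.seed(1)
bagged_prediction(node_responses)
inbag_draw(node_responses)
```

Repeated calls to `inbag_draw()` return different observed responses, whereas `bagged_prediction()` always returns the same value.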
Usage
## S3 method for class 'literanger'
predict(
object,
newdata = NULL,
prediction_type = c("bagged", "inbag", "nodes"),
seed = 1L + sample.int(n = .Machine$integer.max - 1L, size = 1),
n_thread = 0,
verbose = FALSE,
...
)
Arguments
object |
A trained random forest |
newdata |
Data of class |
prediction_type |
Name of the prediction algorithm; "bagged" is the most-frequent value among in-bag samples for classification, or the mean of in-bag responses for regression; "inbag" predicts by drawing one in-bag response from a random tree for each row; "nodes" (currently unsupported) returns the node keys (ids) of the terminal node from every tree for each row. |
seed |
Random seed, an integer between 1 and |
n_thread |
Number of threads. The default, 0, leaves the choice to the system, typically the number of cores. |
verbose |
Show computation status and estimated runtime. |
... |
Ignored. |
Details
Forests trained by literanger retain information about the in-bag responses in each terminal node. This makes predictions efficient for the variation on multiple imputation proposed by Doove et al (2014), which is selected by setting prediction_type="inbag". The usual prediction for classification and regression forests is selected by setting prediction_type="bagged". For classification forests this is the most-frequent value of the in-bag samples, and for regression forests it is their mean.
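The in-bag draw suits multiple imputation because repeated imputations of the same missing value vary. The following base-R sketch imputes one missing class label m = 20 times under each rule, using hypothetical per-tree terminal-node contents rather than literanger output. It assumes, as in ranger, that the bagged classification is a majority vote over the per-tree majority classes. Ties are broken here by sort order for simplicity, which may differ from literanger's rule.

```r
# Hypothetical in-bag class labels in the terminal node reached in each of
# four trees, for one row whose response is missing (not literanger output)
node_responses <- list(
  c("setosa", "setosa", "versicolor"),
  c("versicolor", "setosa"),
  c("setosa", "setosa"),
  c("virginica", "setosa", "setosa")
)

# Most frequent value; ties go to the first level in sort order
most_frequent <- function(x) names(which.max(table(x)))

# 'bagged' (classification): majority vote over per-tree majority classes
impute_bagged <- function(node_responses) {
  most_frequent(vapply(node_responses, most_frequent, character(1)))
}

# 'inbag': one in-bag response from the terminal node of a random tree
impute_inbag <- function(node_responses) {
  responses <- node_responses[[sample.int(length(node_responses), size = 1)]]
  responses[sample.int(length(responses), size = 1)]
}

m <- 20
set.seed(42)
bagged_imputations <- replicate(m, impute_bagged(node_responses))
inbag_imputations <- replicate(m, impute_inbag(node_responses))

table(bagged_imputations)  # a single class, repeated m times
table(inbag_imputations)   # typically several classes, drawn from the nodes
```

Because the bagged imputations never vary, they carry no between-imputation variability. The in-bag draws do vary.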
A list is returned. Its values item contains the predicted classes or values, for classification and regression forests respectively. When the response is a factor, the predictions are returned as a factor with the same levels as in the original training data.
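Keeping the training levels matters when predictions are tabulated: a class that is never predicted still appears in the table, with a zero count. The minimal base-R sketch below shows this. It uses a hypothetical vector of predicted labels and does not call literanger.

```r
# Levels of the response as seen in the training data
training_levels <- levels(iris$Species)

# Hypothetical predictions in which "virginica" is never predicted
predicted <- c("setosa", "versicolor", "setosa")

# Without the training levels, the unpredicted class is dropped
table(predicted)

# With the training levels, every class is kept (with a zero count)
predicted_factor <- factor(predicted, levels = training_levels)
table(predicted_factor)
```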
Compared with the original ranger package, literanger excludes the following features:
Probability, survival, and quantile regression forests.
Support for class gwaa.data.
Standard error estimation.
Value
Object of class literanger_prediction with elements:
values
Predicted (or drawn) classes or values, for classification and regression forests respectively.
tree_type
The type of forest (classification or regression).
seed
The seed supplied to the C++ library.
Author(s)
Stephen Wade stephematician@gmail.com, Marvin N Wright (original ranger package)
References
Doove, L. L., Van Buuren, S., & Dusseldorp, E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92-104. doi:10.1016/j.csda.2013.10.025.
Stekhoven, D. J., & Buehlmann, P. (2012). MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi:10.1093/bioinformatics/btr597.
Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77, 1-17. doi:10.18637/jss.v077.i01.
Examples
## Classification forest
library(literanger)
set.seed(42)  # for a reproducible train/test split
train_idx <- sample(nrow(iris), 2/3 * nrow(iris))
iris_train <- iris[ train_idx, ]
iris_test <- iris[-train_idx, ]
rf_iris <- train(data=iris_train, response_name="Species")
pred_iris_bagged <- predict(rf_iris, newdata=iris_test,
                            prediction_type="bagged")
pred_iris_inbag <- predict(rf_iris, newdata=iris_test,
                           prediction_type="inbag")
# compare bagged predictions with the actual test values
table(iris_test$Species, pred_iris_bagged$values)
# compare bagged predictions with the in-bag draws
table(pred_iris_bagged$values, pred_iris_inbag$values)