R: Literanger prediction

predict.literanger {literanger}

R Documentation

Literanger prediction

Description

'literanger' provides different types of prediction that may be used in multiple imputation algorithms with random forests. The usual prediction is the 'bagged' prediction: the most frequent value (or the mean) of the in-bag samples in a terminal node. Doove et al (2014) propose a prediction that better matches the predictive distribution needed for multiple imputation: for each predicted value, take a random draw from the observations in the terminal node of a randomly drawn tree in the forest. Alternatively, the usual most-frequent-value or mean of the in-bag responses can be used, as in missForest (Stekhoven et al, 2012), miceRanger https://cran.r-project.org/package=miceRanger and missRanger https://cran.r-project.org/package=missRanger.

Usage

## S3 method for class 'literanger'
predict(
  object,
  newdata = NULL,
  prediction_type = c("bagged", "inbag", "nodes"),
  seed = 1L + sample.int(n = .Machine$integer.max - 1L, size = 1),
  n_thread = 0,
  verbose = FALSE,
  ...
)

Arguments

`object`	A trained random forest `literanger` object.
`newdata`	Data of class `data.frame`, `matrix`, or `dgCMatrix` (Matrix). A `matrix` or `dgCMatrix` must have column names. All predictors named in `object$predictor_names` must be present.
`prediction_type`	Name of the prediction algorithm; "bagged" is the most-frequent value among in-bag samples for classification, or the mean of in-bag responses for regression; "inbag" predicts by drawing one in-bag response from a random tree for each row; "nodes" (currently unsupported) returns the node keys (ids) of the terminal node from every tree for each row.
`seed`	Random seed, an integer between 1 and `.Machine$integer.max`. The default generates the seed from `R`; set to `0` to ignore the `R` seed and use a C++ `std::random_device`.
`n_thread`	Number of threads. The default (`0`) lets the system decide, typically the number of cores.
`verbose`	Show computation status and estimated runtime.
`...`	Ignored.
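
The `seed` argument can be illustrated with a short sketch. It uses only the arguments documented above, and assumes that repeating a call with the same explicit seed (and the same `n_thread`) reproduces the in-bag draws; that behaviour is an expectation rather than something stated on this page, so the comparison is printed rather than asserted.

```r
library(literanger)

# Train a small classification forest on the built-in iris data.
rf <- train(data = iris, response_name = "Species")

# Two calls with the same explicit seed and a single thread.
draw_a <- predict(rf, newdata = iris, prediction_type = "inbag",
                  seed = 123L, n_thread = 1)
draw_b <- predict(rf, newdata = iris, prediction_type = "inbag",
                  seed = 123L, n_thread = 1)

# Expected (assumption) to be TRUE when the seed fully determines the draws.
print(identical(draw_a$values, draw_b$values))

# seed = 0 ignores the R seed and uses a C++ std::random_device instead,
# so these draws are not tied to set.seed().
draw_c <- predict(rf, newdata = iris, prediction_type = "inbag", seed = 0)
```

Fixing the seed is mostly useful for debugging; for multiple imputation the default, which derives a fresh seed from the `R` random number generator on each call, is usually what is wanted.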

Details

Forests trained by literanger retain the in-bag responses in each terminal node, which allows efficient prediction within the variation on multiple imputation proposed by Doove et al (2014). Select this type of prediction by setting prediction_type="inbag". The usual prediction for classification and regression forests, the most frequent value and the mean of the in-bag samples respectively, is given by setting prediction_type="bagged".
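
The difference between the two prediction types can be sketched in base R with made-up numbers. This is a toy illustration, not literanger's internal code: the list `node_responses` is hypothetical, and averaging the per-tree node means is an assumed way of combining trees for the 'bagged' value.

```r
# Hypothetical in-bag responses in the terminal node that one new row
# reaches, for each of three regression trees (made-up numbers).
node_responses <- list(
  tree1 = c(4.9, 5.1, 5.0),
  tree2 = c(5.4, 5.6),
  tree3 = c(4.8, 5.2, 5.0, 5.0)
)

# "bagged"-style value: average the per-tree node means
# (assumed aggregation across trees).
bagged_value <- mean(vapply(node_responses, mean, numeric(1)))

# "inbag"-style value: pick one tree at random, then draw one in-bag
# response from that tree's terminal node.
set.seed(1)
tree_id <- sample(length(node_responses), size = 1)
candidates <- node_responses[[tree_id]]
inbag_value <- candidates[sample(length(candidates), size = 1)]

print(bagged_value)
print(inbag_value)
```

The bagged value is a smoothed summary, while the in-bag draw is always one of the observed responses, which is why it retains the spread of the predictive distribution.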

A list is returned. The values item contains the predicted classes (classification forests) or values (regression forests). Predictions of a factor response are returned as a factor with the same levels as the original training data.

Compared to the original package ranger, literanger excludes certain features:

Probability, survival, and quantile regression forests.
Support for class gwaa.data.
Standard error estimation.

Value

Object of class literanger_prediction with elements:

values: Predicted (or drawn) classes for classification, or values for regression.
tree_type: The type of tree in the forest, either 'classification' or 'regression'.
seed: The seed supplied to the C++ library.
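
A minimal sketch of inspecting the returned object, relying only on the class and element names listed above. The printed values are not shown here because they depend on the installed version and on the seed.

```r
library(literanger)

rf <- train(data = iris, response_name = "Species")
pred <- predict(rf, newdata = iris, prediction_type = "bagged")

class(pred)          # documented as literanger_prediction
names(pred)          # documented elements: values, tree_type, seed
levels(pred$values)  # factor levels follow the training data
pred$seed            # the seed that was passed to the C++ library
```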

Author(s)

Stephen Wade stephematician@gmail.com, Marvin N Wright (original ranger package)

References

Doove, L. L., Van Buuren, S., & Dusseldorp, E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92-104. doi:10.1016/j.csda.2013.10.025.
Stekhoven, D. J., & Buehlmann, P. (2012). MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi:10.1093/bioinformatics/btr597.
Wright, M. N., & Ziegler, A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77, 1-17. doi:10.18637/jss.v077.i01.

Examples

## Classification forest
library(literanger)
# hold out one third of the rows as a test set
train_idx <- sample(nrow(iris), 2/3 * nrow(iris))
iris_train <- iris[ train_idx, ]
iris_test  <- iris[-train_idx, ]
rf_iris <- train(data=iris_train, response_name="Species")
pred_iris_bagged <- predict(rf_iris, newdata=iris_test,
                            prediction_type="bagged")
pred_iris_inbag  <- predict(rf_iris, newdata=iris_test,
                            prediction_type="inbag")
# compare bagged vs actual test values
table(iris_test$Species, pred_iris_bagged$values)
# compare bagged prediction vs in-bag draw
table(pred_iris_bagged$values, pred_iris_inbag$values)
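
The in-bag draw is intended for multiple imputation, so a simplified sketch of that use may help. It imputes a single variable from one forest trained on the complete rows; the names `iris_mis`, `missing_rows` and `n_imputations` are hypothetical, and a full procedure in the style of Doove et al (2014) would involve more steps than are shown here.

```r
library(literanger)

# Remove some Species values at random to mimic missing data.
set.seed(2024)
iris_mis <- iris
missing_rows <- sample(nrow(iris_mis), 20)
iris_mis$Species[missing_rows] <- NA

# Train on the rows where Species is observed.
observed <- iris_mis[-missing_rows, ]
rf <- train(data = observed, response_name = "Species")

# Predictors only for the rows that need imputing.
to_impute <- iris_mis[missing_rows, names(iris_mis) != "Species"]

# Each call uses a fresh default seed, so every completed data set
# receives its own set of in-bag draws.
n_imputations <- 5
imputed <- lapply(seq_len(n_imputations), function(i) {
  completed <- iris_mis
  draw <- predict(rf, newdata = to_impute, prediction_type = "inbag")
  completed$Species[missing_rows] <- draw$values
  completed
})

# How often do the first two imputations agree on the imputed rows?
table(imputed[[1]]$Species[missing_rows], imputed[[2]]$Species[missing_rows])
```

Disagreement between the completed data sets reflects uncertainty about the missing values, which the bagged prediction would hide by returning the same class every time.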

[Package literanger version 0.0.2 Index]