predict.randomUniformForest {randomUniformForest} | R Documentation |
Predict method for random Uniform Forests objects
Description
Predicts test data with a random Uniform Forests model. Several output types are available; the default returns values of the same type as the training labels.
Usage
## S3 method for class 'randomUniformForest'
predict(object, X,
type = c("response", "prob", "votes", "confInt",
"ranking", "quantile", "truemajority", "all"),
classcutoff = c(0,0),
conf = 0.95,
whichQuantile = NULL,
rankingIDs = NULL,
threads = "auto",
parallelpackage = "doParallel",
...)
Arguments
object |
an object of class randomUniformForest, such as one created by the randomUniformForest() function. |
X |
a data frame or matrix containing new data (without response values). |
type |
one of "response", "prob", "votes", "confInt", "ranking", "quantile", "truemajority" or "all", giving respectively: predicted values; a matrix of class probabilities; a matrix of vote counts; a prediction interval for each response; ranked responses for a recommendation scheme (currently experimental, and it first requires a classification model); quantile regression; predicted values based on node votes instead of tree votes; or the raw outputs of the random uniform forest. Note that "confInt" and "quantile" build their estimates from an estimate of the conditional expectation for each tree parameter (the randomness), which contains many duplicates because of the bootstrap and the nature of ensemble learning. These estimates are usually loose, since the forest replicates the distribution of Y (given X) to make decisions, while one does not need all the X values to produce a single prediction. A more accurate option is to use the dedicated |
classcutoff |
see |
conf |
if type == "confInt", 'conf' (strictly between 0 and 1) is the desired confidence level for each prediction interval. |
whichQuantile |
if type == "quantile", 'whichQuantile' (strictly between 0 and 1) is the desired quantile. |
rankingIDs |
experimental. If 'type = "ranking"', a vector of IDs to use when one wants to rank responses instead of classifying them. Ranking usually involves users (or objects, or items) that each appear several times and whose choices have to be ranked, e.g., by a recommendation engine. For example, if 'user1' appears four times in the test sample, its ID lets the model identify it and order its response values, beginning with the most relevant. The process is repeated for every user, finally giving predictions ordered from most to least relevant across all users. The random Uniform Forests approach currently uses class probabilities to order predicted values. Note that, when choices are binary, optimizing AUC is one way to obtain good rankings, and NDCG is one way to assess the model. |
threads |
see |
parallelpackage |
see |
... |
not used currently. |
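As a rough illustration of the ranking idea described for 'rankingIDs', the base-R sketch below orders items within each ID by decreasing class probability. The data frame, its column names (id, item, prob) and the probability values are hypothetical and made up for this example. It is not the package's internal code, only one possible reading of the description above.

```r
## Hypothetical test sample: each user appears several times, with a
## made-up probability of the positive class for each candidate item
scores <- data.frame(
  id   = c("user1", "user1", "user1", "user1", "user2", "user2"),
  item = c("a", "b", "c", "d", "a", "c"),
  prob = c(0.20, 0.90, 0.55, 0.70, 0.40, 0.10),
  stringsAsFactors = FALSE
)

## Rank within each ID: 1 = most relevant (highest probability)
scores$rank <- ave(-scores$prob, scores$id, FUN = rank)

## Order by ID, then from most to least relevant
ranked <- scores[order(scores$id, scores$rank), ]
print(ranked)
```

Grouping with ave() keeps the ranks aligned with the original rows, so the same vector of IDs can be reused to reorder any other per-row output.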
Value
response |
predicted values. Default option; returns values in the same form as the original training responses. |
prob |
for classification only. Matrix of class probabilities. |
votes |
matrix of vote counts. Each row is an observation and each column is the output of one tree. |
confInt |
for regression only. Matrix where each row is an observation and each column one of the prediction interval bounds. |
quantile |
for regression only. Vector of predicted quantiles at the level given by 'whichQuantile'. |
ranking |
a matrix or data frame. Description will be updated soon. |
truemajority |
predicted values, using the raw outputs of trees (not majority vote). This option makes sense if the 'nodesize' option is set greater than 1. Aggregation is then participative (at the leaf level) rather than representative (at the tree level). |
all |
raw outputs of the model. Useful only when further computation is needed, for example for post-processing. |
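To make the shape of the "confInt" and "quantile" outputs concrete, here is a minimal base-R sketch that derives interval bounds and a single quantile from a matrix of per-tree outputs. The matrix is simulated (it does not come from a fitted forest), all object names are hypothetical, and taking empirical quantiles across trees is an assumption made for illustration; the package may compute its estimates differently.

```r
set.seed(42)

## Hypothetical raw outputs: 5 observations (rows) x 200 trees (columns),
## simulated here instead of coming from a fitted forest
n_obs     <- 5
n_trees   <- 200
true_mean <- c(1, 2, 3, 4, 5)
tree_outputs <- matrix(rnorm(n_obs * n_trees, mean = true_mean, sd = 0.5),
                       nrow = n_obs, ncol = n_trees)

## Point prediction: average of the tree outputs for each observation
response <- rowMeans(tree_outputs)

## Interval at level 'conf': empirical lower and upper quantiles across trees
conf   <- 0.95
alpha  <- (1 - conf) / 2
bounds <- t(apply(tree_outputs, 1, quantile, probs = c(alpha, 1 - alpha)))
colnames(bounds) <- c("lower", "upper")

## A single quantile per observation (here the 90th percentile)
q90 <- apply(tree_outputs, 1, quantile, probs = 0.9)

print(cbind(response, bounds, q90))
```

The result mirrors the layout described above: one row per observation, with one column per interval bound, and a plain vector for the single quantile.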
Author(s)
Saip Ciss saip.ciss@wanadoo.fr
See Also
postProcessingVotes, bCI, model.stats, generic.cv
Examples
## same as randomForest example
# data(iris)
# set.seed(111)
# ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
# iris.ruf <- randomUniformForest(Species ~ ., data = iris[ind == 1,], OOB = FALSE,
# importance = FALSE, threads = 1)
# iris.pred <- predict(iris.ruf, iris[ind == 2,])
# table(observed = iris[ind == 2, "Species"], predicted = iris.pred)
## get all votes : note that aliases of classes are used internally and, for intermediate
## results, are not converted to their true values
# iris.all.votes <- predict(iris.ruf, iris[ind == 2,], type = "votes")
## get class probabilities
# iris.class.prob <- predict(iris.ruf, iris[ind == 2,], type = "prob")
# iris.class.prob
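Because intermediate results such as votes use class aliases rather than true labels, a small sketch of converting them back may help. The votes matrix below is hand-written, not produced by predict(), and it assumes that aliases are the integers 1..K in the order of the factor levels of the training response. That assumption is not confirmed by this page, so check it against real output before relying on it.

```r
## Class labels, in the order of the factor levels of the response
class_labels <- levels(iris$Species)

## Hypothetical votes matrix: 4 observations (rows) x 5 trees (columns);
## each cell is the assumed integer alias of the class chosen by one tree
votes <- matrix(c(1, 1, 1, 2, 1,
                  2, 2, 3, 2, 2,
                  3, 3, 3, 3, 2,
                  1, 2, 2, 3, 2),
                nrow = 4, byrow = TRUE)

## Count the votes received by each class, for each observation
vote_counts <- t(apply(votes, 1, tabulate, nbins = length(class_labels)))
colnames(vote_counts) <- class_labels

## Vote proportions and majority class, converted back to true labels
vote_props <- vote_counts / ncol(votes)
majority   <- class_labels[max.col(vote_counts, ties.method = "first")]

print(vote_props)
print(majority)
```

Ties are broken here in favour of the first class, which is a choice of this sketch only.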