pred.forestRK {forestRK} | R Documentation |
Make predictions on the test data based on the forestRK model constructed from the training data
Description
Makes predictions on the test dataset based on the forestRK model constructed
from the training dataset.
Please be aware that the test data points in test.prediction.df.list,
pred.for.obs.forest.rk, and num.pred.for.obs.forest.rk are re-ordered by
increasing original index number (the original row names) of those test
observations. So if you shuffled the data before separating them into a
training and a test set, the order in which the data points are presented in
test.prediction.df.list, pred.for.obs.forest.rk, and num.pred.for.obs.forest.rk
may not be the same as the shuffled order of your original test set.
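The re-ordering described above matters when predictions are compared against
the true labels of a shuffled test set. The following is a minimal base-R
sketch of sorting a shuffled test set by its original row names so that it
lines up with that ordering. It does not call forestRK, the object names
(shuffled, test.set, test.sorted, y.test.sorted) are hypothetical, and it
assumes the row names are the default integer-like indices.

```r
## Illustration with base R only; does not call forestRK.
set.seed(1)
shuffled <- iris[sample(nrow(iris)), ]   # shuffle rows; row names keep the original indices
test.set <- shuffled[101:150, ]          # last 50 shuffled rows form the test set

## Ordering that the text above describes for the prediction outputs:
## increasing original index number (the original row names).
ord <- order(as.integer(rownames(test.set)))
test.sorted <- test.set[ord, ]

## True labels in that same order, ready to be compared with predictions
y.test.sorted <- test.sorted$Species
head(rownames(test.sorted))
```

Sorting the true labels once, rather than trying to un-sort the predictions,
keeps the comparison independent of how the data were shuffled.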
Calling this function internally loads the package rapportools; this makes the
is.boolean function available to check one of the stopping criteria at the
beginning of the call.
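As an illustration of this kind of up-front check, here is a base-R stand-in
that stops when an argument is not a single TRUE or FALSE value. It is a
sketch, not the package's code: the helper name check.flag is hypothetical, the
NA test is an addition of this sketch, and using entropy as the checked
argument is an assumption, since the text does not say which argument is
validated.

```r
## Hypothetical helper; not part of forestRK or rapportools.
check.flag <- function(x, name) {
  ## Accept only a length-one logical that is not NA
  if (!(is.logical(x) && length(x) == 1 && !is.na(x))) {
    stop(name, " must be either TRUE or FALSE")
  }
  invisible(TRUE)
}

check.flag(FALSE, "entropy")   # passes silently

## A non-logical value triggers the error; capture its message
res <- tryCatch(check.flag("yes", "entropy"),
                error = function(e) conditionMessage(e))
res
```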
The basic mechanism behind the pred.forestRK function is the following:
When the function is called, it passes the user-specified training data to the
forestRK function in order to first generate the forestRK object. After that,
it uses the pred.treeRK function to make predictions on the test observations
based on each individual tree in the forestRK object. Once the individual
predictions from each tree are obtained for all of the test observations, the
function stores them in one large data frame. Once that data frame is complete,
the function collapses the results by majority vote.
For example, if the class type most frequently predicted for the m-th test
observation by the rkTrees in the forest is class type 'A', then by majority
vote the pred.forestRK function assigns class 'A' as the predicted class type
for that m-th test observation based on the forestRK model.
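The majority-vote step described above can be sketched in base R as follows.
This is an illustration under stated assumptions, not the package's
implementation: the per-tree predictions are made up, the helper majority.vote
is hypothetical, and ties are broken in favour of the smallest class code,
which is an arbitrary choice for this sketch (the text does not say how
forestRK handles ties).

```r
## Made-up per-tree predictions: rows = test observations, columns = trees.
votes <- data.frame(
  tree1 = c(1, 2, 3, 1),
  tree2 = c(1, 2, 2, 3),
  tree3 = c(2, 2, 3, 3)
)
y.factor.levels <- c("setosa", "versicolor", "virginica")

## Hypothetical helper: the most frequent class code in a vector.
## which.max() returns the first maximum, so ties go to the smallest code.
majority.vote <- function(x) {
  counts <- table(x)
  as.integer(names(counts)[which.max(counts)])
}

num.pred <- apply(votes, 1, majority.vote)   # numericized predictions, one per row
pred <- y.factor.levels[num.pred]            # map integer codes back to class labels
pred
```

Here num.pred plays the role of a numericized prediction vector and pred the
role of the labelled one, mirroring the two prediction vectors returned by the
function.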
Usage
pred.forestRK(x.test = data.frame(), x.training = data.frame(),
y.training = c(), y.factor.levels,
min.num.obs.end.node.tree = 5,
nbags, samp.size, entropy = TRUE)
Arguments
x.test |
a numericized data frame of covariates of the data points on which we want
to make our predictions (typically the test observations); |
x.training |
a numericized data frame of covariates of data points from which we build our
|
y.training |
a vector that stores numericized class types of the training
data points; |
min.num.obs.end.node.tree |
the minimum number of observations that we want each end node of
our |
nbags |
the number of bootstrap samples that we want to generate to form a
|
samp.size |
the number of data points that we want each of our bootstrap samples to contain. |
y.factor.levels |
a vector of original names of all class types that the user considers in his
or her study (can be obtained via |
entropy |
|
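To make the roles of nbags and samp.size concrete, the sketch below draws
nbags sets of row indices, each of size samp.size, from a training set. It is
base R only and does not reproduce forestRK's internals. Sampling with
replacement is assumed here because that is the usual meaning of a bootstrap
sample, and the names n.train and bag.indices are hypothetical.

```r
set.seed(42)
n.train   <- 75   # number of training rows (matches the iris split in the Examples)
nbags     <- 30   # number of bootstrap samples
samp.size <- 50   # number of data points per bootstrap sample

## One vector of row indices per bag, drawn with replacement
bag.indices <- lapply(seq_len(nbags), function(b) {
  sample(n.train, size = samp.size, replace = TRUE)
})

length(bag.indices)        # number of bags
length(bag.indices[[1]])   # number of rows in the first bag
```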
Value
A list containing the following items:
x.test |
the original test dataset that we used to make predictions. |
df.of.predictions.for.all.observations |
a data frame storing predicted class types for all test observations from
each tree in the forest; each row of this data frame pertains to an individual
test observation, and each column pertains to a specific tree from the
|
forest.rk |
a |
test.prediction.df.list |
a list of data frames storing the |
pred.for.obs.forest.rk |
a vector that stores the actual predicted class labels of the
test observations instead of their numericized (integer) class types.
Note that the test data points in |
num.pred.for.obs.forest.rk |
the numericized version of |
Author(s)
Hyunjin Cho, h56cho@uwaterloo.ca Rebecca Su, y57su@uwaterloo.ca
See Also
Examples
## example: iris dataset
## load the forestRK package
library(forestRK)
## numericize the data
x.train <- x.organizer(iris[,1:4], encoding = "num")[c(1:25,51:75,101:125),]
x.test <- x.organizer(iris[,1:4], encoding = "num")[c(26:50,76:100,126:150),]
y.train <- y.organizer(iris[c(1:25,51:75,101:125),5])$y.new
y.factor.levels <- y.organizer(iris[c(1:25,51:75,101:125),5])$y.factor.levels
## make predictions from a random forest RK model
## in practice, nbags and samp.size should typically be much larger than 30 and 50
pred.forest.rk <- pred.forestRK(x.test = x.test, x.training = x.train,
                                y.training = y.train,
                                y.factor.levels = y.factor.levels,
                                min.num.obs.end.node.tree = 6,
                                nbags = 30, samp.size = 50, entropy = FALSE)
pred.forest.rk$test.prediction.df.list[[10]]
pred.forest.rk$pred.for.obs.forest.rk # etc....