pred.treeRK {forestRK} | R Documentation
Make predictions on the test observations based on a rktree model
Description
Makes predictions on the observations in the test dataset based on the rktree
model constructed from the training dataset.
Please be aware that, at the end of the pred.treeRK function, the test data
points in prediction.df are re-ordered by increasing original index number
(the original rownames) of those test observations. So if you shuffled the data
before separating them into a training and a test set, the order in which the
data points appear in the data frame prediction.df may not be the same as the
shuffled order in your original test set.
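If the shuffled order matters, the rows of prediction.df can be put back into
it by matching row names. The sketch below uses only base R and a small
stand-in data frame in place of real pred.treeRK output; it assumes, as
described above, that prediction.df keeps the original rownames of the test
observations. The column name pred and all values are made up for illustration.

```r
# Stand-in for a shuffled test set: row names are the original indices.
x.test <- data.frame(a = c(0.3, 0.1, 0.2), row.names = c("30", "10", "20"))

# Stand-in for prediction.df: same rows, sorted by increasing original index.
# (Hypothetical; a real prediction.df comes from pred.treeRK().)
prediction.df <- x.test[order(as.numeric(rownames(x.test))), , drop = FALSE]
prediction.df$pred <- c(1, 2, 3)

# Put the predictions back into the shuffled order of x.test.
restored <- prediction.df[match(rownames(x.test), rownames(prediction.df)), ,
                          drop = FALSE]
```

Matching on row names rather than on position keeps the lookup correct
regardless of how the test set was shuffled.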
Users of this function may be interested in identifying the original name of
the numericized predicted class type shown in the last column of the data frame
prediction.df. This can easily be done by extracting the attribute
y.factor.levels from the y.organizer object. For example, if the data frame
prediction.df indicates that the predicted class type of the 1st test
observation is "2", the actual name of the predicted class type for that
observation is the 2nd element of the vector
y.organizer.object$y.factor.levels, which we can obtain during the data
cleaning phase.
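The lookup described above can be sketched in base R as follows. Both vectors
here are stand-ins with made-up values: y.factor.levels plays the role of
y.organizer.object$y.factor.levels, and pred.codes plays the role of the last
column of prediction.df.

```r
# Stand-in for y.organizer.object$y.factor.levels (hypothetical values;
# a real vector comes from y.organizer()).
y.factor.levels <- c("setosa", "versicolor", "virginica")

# Stand-in for the last column of prediction.df: numericized predicted classes.
pred.codes <- c("2", "1", "3", "2")

# Convert through character first so the lookup also works if the column
# is stored as a factor or as a character vector.
pred.labels <- y.factor.levels[as.integer(as.character(pred.codes))]
print(pred.labels)
```

With a real run, pred.codes would be taken as
prediction.df[, ncol(prediction.df)].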
The pred.treeRK function makes use of the list of hierarchical flags generated
by the construct.treeRK function; it uses that list as a guide to how it should
split the test set to make predictions. The function pred.treeRK also generates
a list of hierarchical flags of its own as it splits the test set and, at the
end, tries to match the list it generated with the list from the
construct.treeRK function. If the two lists match exactly, this is a good sign,
since it implies that the test set was split in a manner consistent with how
the training set was split when the rktree in question was built. Any
difference between the two lists signals that the test set was split
differently from the training set; if such a mismatch occurs, the pred.treeRK
function stops and throws an error. For more information about the hierarchical
flags of a rktree, please see the construct.treeRK section of this
documentation.
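The consistency check described above amounts to an exact comparison followed
by an error on mismatch. The sketch below illustrates that idea in base R only;
it is not the package's internal code. The helper check.flags and the flag
values are hypothetical, and the real flag format is described in the
construct.treeRK section.

```r
# Hypothetical helper, not part of forestRK: stop if two flag lists differ.
check.flags <- function(flag.train, flag.test) {
  if (!identical(flag.train, flag.test)) {
    stop("hierarchical flags of the test split do not match the rktree")
  }
  invisible(TRUE)
}

# Made-up flag values, used only to exercise the helper.
flag.train <- c("r", "rx", "ry")
flag.ok    <- c("r", "rx", "ry")
flag.bad   <- c("r", "rx")

check.flags(flag.train, flag.ok)   # identical flags: returns without error

# Differing flags raise an error; capture its message instead of halting.
mismatch <- tryCatch(check.flags(flag.train, flag.bad),
                     error = function(e) conditionMessage(e))
```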
Usage
pred.treeRK(X = data.frame(), rktree = construct.treeRK())
Arguments
X |
a numericized data frame of covariates of the test observations
or the observations that we want to make predictions for (obtained via
|
rktree |
a |
Value
A list containing the following items:
prediction.df |
a data frame of test observations. If |
flag.pred |
the hierarchical flag of splits performed on the test set by applying the
|
Author(s)
Hyunjin Cho, h56cho@uwaterloo.ca; Rebecca Su, y57su@uwaterloo.ca
See Also
Examples
## example: iris dataset
## load the forestRK package
library(forestRK)
## numericize the data
x.train <- x.organizer(iris[,1:4], encoding = "num")[c(1:25,51:75,101:125),]
x.test <- x.organizer(iris[,1:4], encoding = "num")[c(26:50,76:100,126:150),]
y.train <- y.organizer(iris[c(1:25,51:75,101:125),5])$y.new
## Construct a tree
# min.num.obs.end.node.tree is set to 5 by default;
# entropy is set to TRUE by default
tree.entropy <- construct.treeRK(x.train, y.train)
tree.gini <- construct.treeRK(x.train, y.train,
                              min.num.obs.end.node.tree = 6, entropy = FALSE)
## Make predictions on the test set based on the constructed rktree model
# last column of prediction.df stores predicted class on the test observations
# based on a given rktree
# call pred.treeRK once and extract both items from the returned list
pred <- pred.treeRK(X = x.test, rktree = tree.entropy)
prediction.df <- pred$prediction.df
flag.pred <- pred$flag.pred