R: Make predictions on the test observations based on a rktree...

pred.treeRK {forestRK}

R Documentation

Make predictions on the test observations based on a rktree model

Description

Makes predictions on the observations in the test dataset based on the rktree model constructed from the training dataset.

Please be aware that, at the end of the pred.treeRK function, the test data points in prediction.df are re-ordered by increasing original index number (the original rownames) of those test observations. So if you shuffled the data before separating them into a training and a test set, the order in which the data points appear in the data frame prediction.df may not be the same as the shuffled order in your original test set.
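
The re-ordering described above can be undone by matching row names. The following is a minimal base R sketch, assuming that prediction.df keeps the original row names of the test observations. The data frames `x.test.shuffled` and `prediction.df.mock` are hypothetical stand-ins built by hand, not output of pred.treeRK.

```r
# Hypothetical shuffled test set; row names are the original observation indices
x.test.shuffled <- data.frame(x1 = c(5.1, 6.3, 4.9, 7.0),
                              row.names = c("103", "26", "150", "77"))

# Mock of prediction.df: same rows, sorted by increasing original index,
# with a made-up predicted class in the last column
ord <- order(as.numeric(rownames(x.test.shuffled)))
prediction.df.mock <- cbind(x.test.shuffled[ord, , drop = FALSE],
                            y.pred = c(1, 2, 3, 3))

# Restore the shuffled order by matching row names
idx <- match(rownames(x.test.shuffled), rownames(prediction.df.mock))
prediction.shuffled <- prediction.df.mock[idx, , drop = FALSE]
```

After this step the rows of `prediction.shuffled` line up one-to-one with the rows of the shuffled test set, so predicted classes can be compared against true labels kept in the shuffled order.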

Users of this function may want to identify the original name of the numericized predicted class type shown in the last column of the data frame prediction.df. This can be done by extracting the attribute y.factor.levels from the y.organizer object. For example, if prediction.df indicates that the predicted class type of the 1st test observation is "2", then the actual name of the predicted class type for that observation is the 2nd element of the vector y.organizer.object$y.factor.levels, which is obtained during the data cleaning phase.
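
The lookup described above amounts to indexing the vector of level names by the numeric class code. The sketch below mocks y.factor.levels with base R so that it runs without forestRK; the vector `y.pred.num` is a made-up set of predictions, and the `as.integer()` call is a precaution in case the codes are stored as character strings.

```r
# Stand-in for y.organizer.object$y.factor.levels (assumed to hold the
# original class names in the order of their numeric codes)
y.factor.levels <- levels(iris$Species)

# Hypothetical numericized predictions, as found in the last column
# of prediction.df
y.pred.num <- c(2, 1, 3, 2)

# Index the level vector by the numeric code to recover the class names
y.pred.name <- y.factor.levels[as.integer(y.pred.num)]
```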

The pred.treeRK function makes use of the list of hierarchical flags generated by the construct.treeRK function, using it as a guide to how the test set should be split to make predictions. As it splits the test set, pred.treeRK generates a list of hierarchical flags of its own, and at the end of the function it tries to match this list against the list of hierarchical flags from construct.treeRK. If the two lists match exactly, the test set was split in a manner consistent with how the training set was split when the rkTree in question was built. If they differ in any way, the test set was split differently from the training set; in that case pred.treeRK stops and throws an error. For more information about the hierarchical flags of a rkTree, please see the construct.treeRK section of this documentation.
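
The consistency check described above can be pictured as an exact comparison followed by an error on mismatch. The following sketch is an illustration only and not the package's implementation: the helper `check.flags` is hypothetical, and representing a flag as a character vector of node labels is an assumption made for the example (see construct.treeRK for the actual format).

```r
# Hypothetical helper, not part of forestRK: stop if two flags differ
check.flags <- function(flag.train, flag.pred) {
  if (!identical(flag.train, flag.pred)) {
    stop("the test set was split differently from the training set")
  }
  invisible(TRUE)
}

# Made-up flags for illustration only
flag.train <- c("r", "rx", "ry", "rxx", "rxy")
flag.ok    <- c("r", "rx", "ry", "rxx", "rxy")
flag.bad   <- c("r", "rx", "ry")

# Matching flags pass silently
check.flags(flag.train, flag.ok)

# Mismatching flags raise an error, captured here as a message string
result <- tryCatch(check.flags(flag.train, flag.bad),
                   error = function(e) conditionMessage(e))
```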

Usage

 pred.treeRK(X = data.frame(), rktree = construct.treeRK())

Arguments

`X`	a numericized data frame of covariates of the test observations, or of any observations for which predictions are wanted (obtained via `x.organizer()`). `X` should contain no `NA` or `NaN` values.
`rktree`	a `construct.treeRK` object.

Value

A list containing the following items:

prediction.df

a data frame of test observations. If prediction.df has n columns, the first n-1 columns contain the numericized covariates of the test observations, and the last (n-th) column contains the predicted numericized class type for each of those test observations. Note that, at the end of the pred.treeRK function, the test data points in prediction.df are re-ordered by increasing original observation index number.

flag.pred

the hierarchical flag of splits performed on the test set by applying the rktree model in question.

Author(s)

Hyunjin Cho, h56cho@uwaterloo.ca; Rebecca Su, y57su@uwaterloo.ca

Examples

  ## example: iris dataset
  ## load the forestRK package
  library(forestRK)

  ## numericize the data
  x.train <- x.organizer(iris[,1:4], encoding = "num")[c(1:25,51:75,101:125),]
  x.test <- x.organizer(iris[,1:4], encoding = "num")[c(26:50,76:100,126:150),]
  y.train <- y.organizer(iris[c(1:25,51:75,101:125),5])$y.new

  ## Construct a tree
  # min.num.obs.end.node.tree is set to 5 by default;
  # entropy is set to TRUE by default
  tree.entropy <- construct.treeRK(x.train, y.train)
  tree.gini <- construct.treeRK(x.train, y.train,
                                min.num.obs.end.node.tree = 6, entropy = FALSE)

  ## Make predictions on the test set based on the constructed rktree model
  # last column of prediction.df stores predicted class on the test observations
  # based on a given rktree
  # call pred.treeRK once and extract both items of the returned list
  pred <- pred.treeRK(X = x.test, rktree = tree.entropy)
  prediction.df <- pred$prediction.df
  flag.pred <- pred$flag.pred

[Package forestRK version 0.0-5 Index]