forestRK {forestRK}    R Documentation

Builds up a random forest RK model based on the given (training) dataset

Description

Builds a random forest RK model from the given (training) dataset.

The functions bstrap and construct.treeRK are used inside this function. The call to bstrap first generates bootstrap samples of the training dataset; construct.treeRK is then called to build a tree on each of those bootstrap samples, and the resulting trees together form the forest.
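The bootstrap step described above can be sketched in base R. This is only an illustration under stated assumptions, not the code of bstrap itself: the helper name draw_bootstrap_samples is hypothetical, and drawing rows with replacement is assumed because that is the usual meaning of a bootstrap sample.

```r
# Hypothetical helper (not part of forestRK): draw `nbags` bootstrap samples,
# each holding `samp.size` rows of X together with the matching labels in Y.
# Assumption: rows are drawn with replacement; bstrap may differ in details.
draw_bootstrap_samples <- function(X, Y, nbags, samp.size) {
  stopifnot(nrow(X) == length(Y), nbags >= 1, samp.size >= 1)
  lapply(seq_len(nbags), function(b) {
    idx <- sample(nrow(X), size = samp.size, replace = TRUE)
    list(X = X[idx, , drop = FALSE], Y = Y[idx])
  })
}

set.seed(1)
X <- iris[, 1:4]
Y <- as.integer(iris$Species)
samples <- draw_bootstrap_samples(X, Y, nbags = 3, samp.size = 50)

# In a forest, one tree would then be built on each element of `samples`;
# here we only inspect the samples themselves.
length(samples)
dim(samples[[1]]$X)
```

Keeping each sample's covariates and labels in one list element makes it straightforward to hand them to a tree-building function one sample at a time.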

Calling this function internally loads the package rapportools, so that its is.boolean method can be used to check one of the stopping criteria.

Usage

 forestRK(X = data.frame(), Y.new = c(),
          min.num.obs.end.node.tree = 5, nbags, samp.size, entropy = TRUE)

Arguments

X

a numericized data frame (obtained via x.organizer()) storing the covariates of each observation in the given (training) dataset; X should contain no NA or NaN values.

Y.new

a vector storing the numericized class type of each observation in the given (training) dataset X; Y.new should contain no NA or NaN values.

min.num.obs.end.node.tree

the minimum number of observations that each end node of an rktree should contain. Default is 5.

nbags

the number of bootstrap samples to generate for building the forest.

samp.size

the number of observations that each bootstrap sample should contain.

entropy

TRUE to use Entropy as the splitting criterion; FALSE to use the Gini Index as the splitting criterion. Default is TRUE.
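To make the two options for entropy concrete, the textbook definitions of both impurity measures can be computed in base R. This is a minimal sketch, not the package's internal code: the helper names class_props, node_entropy and node_gini are hypothetical, and the base-2 logarithm is an assumption (forestRK may use a different base or a different normalisation).

```r
# Observed class proportions of a label vector (hypothetical helper).
# Zero proportions are dropped so that 0 * log2(0) never appears.
class_props <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  p[p > 0]
}

# Shannon entropy in bits: -sum(p * log2(p)).
node_entropy <- function(y) {
  p <- class_props(y)
  -sum(p * log2(p))
}

# Gini index: 1 - sum(p^2).
node_gini <- function(y) {
  p <- class_props(y)
  1 - sum(p^2)
}

y.pure  <- c(1, 1, 1, 1)   # a node containing a single class
y.mixed <- c(1, 1, 2, 2)   # a node split evenly between two classes

node_entropy(y.pure)    # 0
node_entropy(y.mixed)   # 1
node_gini(y.pure)       # 0
node_gini(y.mixed)      # 0.5
```

Both measures are zero for a pure node and largest for an evenly mixed one, so in typical use the two settings rank candidate splits similarly.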

Value

A list containing the following items:

X

The original (training) dataset that was used to construct the random forest RK model.

forest.rk.tree.list

A list of trees (construct.treeRK objects) contained in the forestRK model.

bootsamp.list

A list containing data frames of bootstrap samples that were generated from the given (training) dataset X.

ent.status

The value of the parameter entropy.

Author(s)

Hyunjin Cho, h56cho@uwaterloo.ca; Rebecca Su, y57su@uwaterloo.ca

See Also

bstrap, construct.treeRK

Examples

  ## example: iris dataset
  ## load the forestRK package
  library(forestRK)

  # covariates of the training data set (first 25 rows of each iris species)
  x.train <- x.organizer(iris[,1:4], encoding = "num")[c(1:25,51:75,101:125),]
  # numericized class labels for the same rows
  y.train <- y.organizer(iris[c(1:25,51:75,101:125),5])$y.new

  # Implement forestRK function
  # min.num.obs.end.node.tree is set to 5 by default;
  # entropy is set to TRUE by default
  # normally nbags and samp.size should be much larger than 30 and 50
  forestRK.1 <- forestRK(x.train, y.train, nbags = 30, samp.size = 50)

  # extract the first tree in the forestRK.1 model
  forestRK.1$forest.rk.tree.list[[1]]

[Package forestRK version 0.0-5 Index]