criteria.after.split.calculator {forestRK}

R Documentation

Calculates Entropy or Gini Index of a node after a given split

Description

Calculates the Entropy or Gini Index of a particular node after a particular split; this function is called within the construct.treeRK function.

The argument split.record is the output of the kidids_split function from the partykit package. kidids_split splits the data according to a criterion that the user specifies ahead of time, and returns a vector storing the index of the split group (group "1" or "2") to which each observation in the data belongs after the split.

For more information about kidids_split, please see the partykit documentation.

Usage

 criteria.after.split.calculator(x.node = data.frame(), y.new.node = c(),
                                 split.record = kidids_split(),
                                 entropy = TRUE)

Arguments

`x.node`	numericized data frame of covariates (obtained via `x.organizer()`) from the node that is to be split; `x.node` should contain no `NA` or `NaN` values.
`y.new.node`	numericized class type of each observation from the node that is to be split; `y.new.node` should contain no `NA` or `NaN` values.
`split.record`	output of the `kidids_split` function from the `partykit` package that describes a particular split.
`entropy`	`TRUE` if Entropy is used as the splitting criterion; `FALSE` if the Gini Index is used instead. Default is `TRUE`.

Value

The value of Entropy or Gini Index of a particular node after a particular split.
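
As an illustration of the kind of quantity returned, the following base R sketch computes a post-split criterion as the size-weighted average of the child-node impurities. This is the common textbook definition and is only assumed here; the exact formula and logarithm base used inside forestRK may differ. The helpers `node.impurity` and `impurity.after.split` are hypothetical names introduced for this sketch and are not part of forestRK, and `groups.toy` merely imitates the shape of a `kidids_split` result.

```r
# Impurity of a single node, computed from its class labels.
# NOTE: hypothetical helper for illustration, not a forestRK function.
node.impurity <- function(y, entropy = TRUE) {
  p <- as.numeric(table(y)) / length(y)  # class proportions within the node
  p <- p[p > 0]                          # drop empty classes so log2(0) never occurs
  if (entropy) -sum(p * log2(p)) else 1 - sum(p^2)  # base-2 log is an assumption
}

# Size-weighted impurity of the child nodes defined by a group-index vector.
# NOTE: hypothetical helper; assumes the weighted-average definition.
impurity.after.split <- function(y, groups, entropy = TRUE) {
  stopifnot(length(y) == length(groups))
  child.sizes <- tapply(y, groups, length)
  child.impurity <- tapply(y, groups, node.impurity, entropy = entropy)
  sum(child.sizes / length(y) * child.impurity)
}

# Toy data: six observations from two classes, split into groups 1 and 2
y.toy <- c(1, 1, 1, 2, 2, 2)
groups.toy <- c(1L, 1L, 1L, 1L, 2L, 2L)  # same shape as a kidids_split() result

entropy.after <- impurity.after.split(y.toy, groups.toy, entropy = TRUE)
gini.after <- impurity.after.split(y.toy, groups.toy, entropy = FALSE)
```

In this toy split, group 2 is pure and contributes nothing, so the result is 4/6 times the impurity of group 1; for the Gini Index that works out to 0.25. A lower value after a split indicates purer child nodes.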

Author(s)

Hyunjin Cho, h56cho@uwaterloo.ca
Rebecca Su, y57su@uwaterloo.ca

Examples

  ## example: iris dataset
  library(forestRK) # load the package forestRK
  library(partykit)

  # covariates of training data set
  x.train <- x.organizer(iris[,1:4], encoding = "num")[c(1:25,51:75,101:125),]
  # numericized class types of observations of training dataset
  y.train <- y.organizer(iris[c(1:25,51:75,101:125),5])$y.new
  ## criteria.after.split.calculator() example in the implementation
  ## of the forestRK algorithm

  ent.status <- TRUE

  # number.of.columns.of.x.node
  # = total number of covariates that we consider
  number.of.columns.of.x.node <- dim(x.train)[2]
  # m.try = the randomly chosen number of covariates that we consider
  # at the time of split
  m.try <- sample(1:(number.of.columns.of.x.node),1)
  ## sample m.try number of covariates from the list of all covariates
  K <- sample(1:(number.of.columns.of.x.node), m.try)

  # split the data
  # (the type of split used here is chosen arbitrarily)
  # for more information about kidids_split,
  # please refer to the documentation for the package 'partykit'
  sp <- partysplit(varid=K[1], breaks = x.train[1,K[1]], index = NULL,
                   right = TRUE, prob = NULL, info = NULL)
  split.record <- kidids_split(sp, data=x.train)

  # apply criteria.after.split.calculator() to the kidids_split output
  criteria.after.split <- criteria.after.split.calculator(x.train,
                                    y.train, split.record, ent.status)
  criteria.after.split

[Package forestRK version 0.0-5 Index]