cutoff.node.and.covariate.index.finder {forestRK} | R Documentation |
Identifies the optimal cutoff point of an impure node for splitting after
applying the rk (Random K) algorithm.
Description
Identifies the optimal cutoff point of an impure dataset for splitting after
applying the rk (Random K) algorithm, in terms of Entropy or Gini Index.
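As a rough illustration of the two splitting criteria named above, the sketch below computes Entropy and the Gini Index of a vector of class labels in base R, and scores a candidate cutoff by the size-weighted impurity of the two resulting groups. The helper names (entropy.impurity, gini.impurity, split.impurity), the use of log base 2, and the weighting scheme are assumptions based on the textbook definitions; they are not taken from the forestRK source and may differ from what the package does internally.

```r
# Textbook impurity measures (assumed definitions, not forestRK internals).
entropy.impurity <- function(y) {
  p <- table(y) / length(y)   # class proportions
  p <- p[p > 0]               # drop empty classes so log2(0) never occurs
  -sum(p * log2(p))
}

gini.impurity <- function(y) {
  p <- table(y) / length(y)   # class proportions
  1 - sum(p^2)
}

# Size-weighted impurity of the split "x <= cutoff" versus "x > cutoff".
# A lower value indicates a purer pair of child nodes.
split.impurity <- function(x, y, cutoff, impurity = entropy.impurity) {
  left  <- y[x <= cutoff]
  right <- y[x > cutoff]
  (length(left) * impurity(left) + length(right) * impurity(right)) / length(y)
}

# Toy data (hypothetical): one covariate and two balanced classes.
x <- c(1.0, 2.0, 3.0, 4.0)
y <- c(1, 1, 2, 2)

entropy.impurity(y)                  # impurity of the unsplit node
gini.impurity(y)
split.impurity(x, y, cutoff = 2.0)   # this cutoff separates the classes
```

Under these definitions, an optimal cutoff is one that minimizes the weighted impurity over the candidate covariates and values.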
For example, if the function returns a cutoff.value of 2.5, a covariate.ind of
4, and a cutoff.node of 23, then a split on the node under consideration should
occur on the 4th covariate (its actual name is the name of the 4th column of
the original dataset) at the value 2.5. The left child node would then contain
the observations whose 4th covariate value is less than or equal to 2.5, and
the right child node would contain the observations whose 4th covariate value
is greater than 2.5. This splitting point corresponds to the 23rd observation
of the node.
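The interpretation above can be sketched in base R as follows. The data frame, the class vector, and the list res are hypothetical stand-ins: res only mimics the cutoff.value and covariate.ind items of the returned list, so the sketch runs without forestRK installed and makes no claim about the values the function would actually return for this data.

```r
# Hypothetical node: 6 observations, 4 numeric covariates.
x.node <- data.frame(
  v1 = c(5.1, 4.9, 6.3, 5.8, 7.1, 6.5),
  v2 = c(3.5, 3.0, 3.3, 2.7, 3.0, 3.0),
  v3 = c(1.4, 1.4, 6.0, 5.1, 5.9, 5.8),
  v4 = c(0.2, 0.2, 2.5, 1.9, 2.1, 2.2)
)
y.new.node <- c(1, 1, 2, 2, 2, 2)

# Stand-in for the function's output (assumed values, for illustration only).
res <- list(cutoff.value = 1.4, covariate.ind = 3)

# Observations with covariate value <= cutoff go left; the rest go right.
go.left <- x.node[, res$covariate.ind] <= res$cutoff.value

x.left  <- x.node[go.left, , drop = FALSE]
x.right <- x.node[!go.left, , drop = FALSE]
y.left  <- y.new.node[go.left]
y.right <- y.new.node[!go.left]

# Name of the covariate chosen for the split.
names(x.node)[res$covariate.ind]
```

Using drop = FALSE keeps both child nodes as data frames even when a child holds a single observation.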
This function internally loads the packages partykit and rapportools. The
package partykit is loaded to generate the object split.record.optimal, and the
package rapportools is loaded to validate one of the stopping criteria, which
uses the is.boolean method.
This function is run internally in the construct.treeRK function.
Usage
cutoff.node.and.covariate.index.finder(x.node = data.frame(),
y.new.node = c(), entropy = TRUE)
Arguments
x.node |
a numericized data frame of covariates of the observations from a particular
node prior to the split (can be obtained after applying |
y.new.node |
a vector storing numericized class type of the observations from a particular
node before the split (can be obtained after applying |
entropy |
|
Value
A list containing the following items:
cutoff.value |
the value at which the optimal split should take place. |
cutoff.node |
the index of the observation (observation number) at which the optimal split should occur. |
covariate.ind |
numeric index of the covariate at which the optimal split should occur. |
split.record.optimal |
the |
Author(s)
Hyunjin Cho, h56cho@uwaterloo.ca Rebecca Su, y57su@uwaterloo.ca
See Also
Examples
## example: iris dataset
## load the forestRK package
library(forestRK)
## numericize the data
x.train <- x.organizer(iris[,1:4], encoding = "num")[c(1:25,51:75,101:125),]
y.train <- y.organizer(iris[c(1:25,51:75,101:125),5])$y.new
# implementation of cutoff.node.and.covariate.index.finder()
res <- cutoff.node.and.covariate.index.finder(x.train, y.train,
entropy=FALSE)
res$cutoff.value
res$cutoff.node
res$covariate.ind
res$split.record.optimal