best.cut.node {ODRF}

R Documentation

Find the best splitting variable and node

Description

Selects the best splitting variable and split point for a node using one of three criteria: Gini impurity, information gain, or mean squared error.

Usage

best.cut.node(
  X,
  y,
  split,
  lambda = "log",
  weights = 1,
  MinLeaf = 10,
  numLabels = ifelse(split == "mse", 0, length(unique(y)))
)

Arguments

`X`	An n by d numeric matrix (preferred) or data frame.
`y`	A response vector of length n.
`split`	One of three criteria: 'gini', the Gini impurity index (classification); 'entropy', information gain (classification); or 'mse', mean squared error (regression).
`lambda`	The penalty level applied to the partition criterion selected by `split`. Three options are provided: `lambda=0`: no penalty; `lambda=2`: AIC penalty; `lambda='log'` (default): BIC penalty. In addition, `lambda` can be any value from 0 to n (the training set size).
`weights`	A vector of sample weights used when evaluating a split (default 1).
`MinLeaf`	Minimum node size (default 10).
`numLabels`	The number of categories in `y`. Defaults to 0 when `split = "mse"`, and to `length(unique(y))` otherwise.

Value

A list containing:

BestCutVar: The best split variable.
BestCutVal: The best split points for the best split variable.
BestIndex: For each variable, the maximum decrease in the Gini impurity index, information gain, or mean squared error, depending on `split`.

Examples

### Find the best split variable ###
data(iris)
X <- as.matrix(iris[, 1:4])
y <- iris[[5]]
bestcut <- best.cut.node(X, y, split = "gini")
print(bestcut)
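
To illustrate what a "decrease in Gini impurity" means for a single candidate variable, the following is a minimal base-R sketch. It is not ODRF's implementation: the helpers `gini` and `gini_decrease` are hypothetical names introduced here, the candidate cuts are assumed to be midpoints between consecutive distinct values, and the `lambda` penalty, `weights`, and `MinLeaf` are all ignored.

```r
# Gini impurity of a label vector: 1 - sum over classes of p_k^2.
# (Hypothetical helper for illustration; not part of ODRF.)
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Unpenalized decrease in Gini impurity when splitting x at `cut`.
# (Hypothetical helper for illustration; not part of ODRF.)
gini_decrease <- function(x, y, cut) {
  left <- x <= cut
  n <- length(y)
  gini(y) -
    (sum(left) / n) * gini(y[left]) -
    (sum(!left) / n) * gini(y[!left])
}

data(iris)
x <- iris$Petal.Length
y <- iris$Species

# Assumed candidate cuts: midpoints between consecutive distinct values,
# so both child nodes are always non-empty.
vals <- sort(unique(x))
cuts <- (head(vals, -1) + tail(vals, -1)) / 2

# Evaluate every candidate and keep the one with the largest decrease.
dec <- sapply(cuts, function(cc) gini_decrease(x, y, cc))
best_cut <- cuts[which.max(dec)]
best_dec <- max(dec)
print(c(best_cut = best_cut, best_dec = best_dec))
```

Repeating this search over every column of `X` and keeping the variable with the largest decrease gives the general idea behind choosing a split variable and split point; the values returned by `best.cut.node` may differ because of the penalty and the other arguments.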

[Package ODRF version 0.0.4 Index]