best.cut.node {ODRF}    R Documentation

Find the best splitting variable and node

Description

A function that selects the splitting variable and node using one of three criteria.

Usage

best.cut.node(
  X,
  y,
  split,
  lambda = "log",
  weights = 1,
  MinLeaf = 10,
  numLabels = ifelse(split == "mse", 0, length(unique(y)))
)

Arguments

X

An n by d numeric matrix (preferred) or data frame.

y

A response vector of length n.

split

One of three criteria: 'gini': Gini impurity index (classification), 'entropy': information gain (classification) or 'mse': mean squared error (regression).
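The three criteria are built on standard node impurity measures. The following base R sketch shows those measures for a single node; the function names are hypothetical and are not part of ODRF, and the use of the natural logarithm for entropy is an assumption.

```r
# Hypothetical helpers (not ODRF functions): impurity of a single node.

# Gini impurity of a label vector: 1 - sum(p_k^2)
gini_impurity <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Entropy of a label vector: -sum(p_k * log(p_k)); natural log is assumed here
entropy_impurity <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]  # drop empty classes so 0 * log(0) never appears
  -sum(p * log(p))
}

# Mean squared error around the node mean (regression)
mse_impurity <- function(y) {
  mean((y - mean(y))^2)
}

data(iris)
gini_impurity(iris$Species)     # three balanced classes: 1 - 3 * (1/3)^2
entropy_impurity(iris$Species)  # three balanced classes: log(3)
mse_impurity(iris$Sepal.Length)
```

A split is then scored by how much the size-weighted impurity of the two child nodes falls below the impurity of the parent node.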

lambda

The penalty level applied to the partition criterion. Three options are provided: lambda=0: no penalty; lambda=2: AIC penalty; lambda='log' (default): BIC penalty. In addition, lambda can be any value from 0 to n (training set size).
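As an illustration of the options above, the sketch below maps a lambda setting to a numeric penalty level. The helper resolve_lambda is hypothetical and not part of ODRF; mapping 'log' to log(n) follows the usual BIC weight and is an assumption about the package, as is the way the penalty enters the criterion, which this page does not describe.

```r
# Hypothetical helper (not part of ODRF): turn the documented `lambda`
# options into a numeric penalty level for a training set of size n.
resolve_lambda <- function(lambda, n) {
  if (identical(lambda, "log")) {
    return(log(n))  # BIC-style penalty level (assumed mapping)
  }
  # Any other value must be a single number between 0 and n.
  stopifnot(is.numeric(lambda), length(lambda) == 1, lambda >= 0, lambda <= n)
  lambda            # 0 = no penalty, 2 = AIC-style penalty level
}

n <- 150                  # e.g. the number of rows in iris
resolve_lambda("log", n)  # log(150)
resolve_lambda(2, n)
resolve_lambda(0, n)
```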

weights

A vector of values that weight the samples when considering a split (default 1).

MinLeaf

Minimum node size (default 10).

numLabels

The number of categories in y. By default, 0 when split='mse' and length(unique(y)) otherwise.
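The default shown in Usage can be evaluated by hand to see what numLabels becomes for a given response. This is only a restatement of that default expression; the variable names num_labels_class and num_labels_reg are illustrative.

```r
data(iris)
y <- iris[[5]]

# Same expression as the default in the Usage section, for classification.
split <- "gini"
num_labels_class <- ifelse(split == "mse", 0, length(unique(y)))
num_labels_class  # one label per species in iris

# With split = "mse" the default is 0, since regression has no categories.
split <- "mse"
num_labels_reg <- ifelse(split == "mse", 0, length(unique(y)))
num_labels_reg
```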

Value

A list which contains:

Examples

### Find the best split variable ###
library(ODRF)
data(iris)
X <- as.matrix(iris[, 1:4])
y <- iris[[5]]
bestcut <- best.cut.node(X, y, split = "gini")
print(bestcut)
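For intuition about what such a search involves, the following base R sketch scans every variable and every candidate cut point and keeps the cut with the largest decrease in Gini impurity. It is not ODRF code: naive_best_cut and gini are hypothetical names, the treatment of MinLeaf (both child nodes must contain at least MinLeaf observations) is an assumption, and no lambda penalty or sample weights are applied.

```r
# Hypothetical helpers (not ODRF code): exhaustive axis-aligned split search.
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

naive_best_cut <- function(X, y, MinLeaf = 10) {
  n <- nrow(X)
  best <- list(var = NA_integer_, cut = NA_real_, decrease = -Inf)
  if (n < 2 * MinLeaf) return(best)  # no cut can satisfy MinLeaf on both sides
  parent <- gini(y)
  for (j in seq_len(ncol(X))) {
    ord <- order(X[, j])
    xs <- unname(X[ord, j])
    ys <- y[ord]
    # Position i means: the first i sorted rows go left, the rest go right.
    for (i in MinLeaf:(n - MinLeaf)) {
      if (xs[i] == xs[i + 1]) next   # cannot cut between tied values
      left <- ys[seq_len(i)]
      right <- ys[(i + 1):n]
      child <- (i * gini(left) + (n - i) * gini(right)) / n
      decrease <- parent - child
      if (decrease > best$decrease) {
        # Cut at the midpoint between the two neighbouring values.
        best <- list(var = j, cut = (xs[i] + xs[i + 1]) / 2, decrease = decrease)
      }
    }
  }
  best
}

data(iris)
X <- as.matrix(iris[, 1:4])
y <- iris[[5]]
res <- naive_best_cut(X, y, MinLeaf = 10)
colnames(X)[res$var]  # name of the selected variable
res$cut               # selected cut point
res$decrease          # decrease in Gini impurity
```

The sketch recomputes the impurity from scratch at each position, which is simple to read but slow for large n; its result need not match the output of best.cut.node.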


[Package ODRF version 0.0.4 Index]