best.cut.node {ODRF} | R Documentation |
Find the best splitting variable and node
Description
A function to select the splitting variables and nodes using one of three criteria.
Usage
best.cut.node(
X,
y,
split,
lambda = "log",
weights = 1,
MinLeaf = 10,
numLabels = ifelse(split == "mse", 0, length(unique(y)))
)
Arguments
X |
An n by d numeric matrix (preferred) or data frame. |
y |
A response vector of length n. |
split |
One of three criteria, 'gini': gini impurity index (classification), 'entropy': information gain (classification) or 'mse': mean square error (regression). |
lambda |
The argument of |
weights |
A vector of values which weigh the samples when considering a split. |
MinLeaf |
Minimal node size (Default 10). |
numLabels |
The number of categories. |
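The three criteria named for split can be illustrated with a minimal base R sketch. The helper names gini_impurity, entropy_impurity and mse_impurity are hypothetical and are not part of ODRF; the natural logarithm in the entropy is an assumption, and the package may compute these quantities differently (for example, with sample weights or a lambda penalty).

```r
# Hypothetical helpers for illustration only; not ODRF functions.

# Gini impurity of a label vector: 1 - sum of squared class proportions.
gini_impurity <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  1 - sum(p^2)
}

# Shannon entropy of a label vector (natural log assumed).
entropy_impurity <- function(y) {
  p <- as.numeric(table(y)) / length(y)
  p <- p[p > 0]  # drop empty classes so that 0 * log(0) is never evaluated
  -sum(p * log(p))
}

# Mean squared error of a numeric response around its mean.
mse_impurity <- function(y) {
  mean((y - mean(y))^2)
}
```

A pure node has zero impurity under all three measures, so a split is better the more it lowers the size-weighted impurity of the child nodes relative to the parent.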
Value
A list which contains:
BestCutVar: The best split variable.
BestCutVal: The best split points for the best split variable.
BestIndex: For each variable, the maximum decrease in the chosen criterion (gini impurity index, information gain, or mean square error).
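To make the returned quantities concrete, the following base R sketch searches a single variable for the cut point with the largest decrease in Gini impurity. The function best_split_gini is hypothetical and is not ODRF's implementation; placing cuts at midpoints between sorted values and ignoring weights and lambda are simplifying assumptions.

```r
# Hypothetical illustration; not the ODRF implementation.
# Returns the cut point on x with the largest Gini decrease, subject to MinLeaf.
best_split_gini <- function(x, y, MinLeaf = 1) {
  gini <- function(y) {
    p <- as.numeric(table(y)) / length(y)
    1 - sum(p^2)
  }
  n <- length(y)
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  parent <- gini(y)
  best <- list(cut = NA_real_, decrease = 0)
  for (i in seq_len(n - 1)) {
    if (x[i] == x[i + 1]) next                  # no cut between tied values
    if (i < MinLeaf || n - i < MinLeaf) next    # both children need MinLeaf samples
    # Size-weighted impurity of the two child nodes.
    child <- (i * gini(y[1:i]) + (n - i) * gini(y[(i + 1):n])) / n
    dec <- parent - child
    if (dec > best$decrease) {
      best <- list(cut = (x[i] + x[i + 1]) / 2, decrease = dec)
    }
  }
  best
}

# Example: search one column of iris.
res <- best_split_gini(iris$Petal.Length, iris$Species, MinLeaf = 10)
```

Repeating this search over every column and keeping the column with the largest decrease corresponds, under these assumptions, to the roles of BestCutVar, BestCutVal and BestIndex.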
Examples
### Find the best split variable ###
library(ODRF)
data(iris)
X <- as.matrix(iris[, 1:4])
y <- iris[[5]]
bestcut <- best.cut.node(X, y, split = "gini")
print(bestcut)