decision_tree {LearnSL} R Documentation
Decision Tree
Description
This function creates a decision tree based on an example dataset, calculating the best classifier possible at each step. It only creates perfect divisions; that is, if a rule does not produce a classified group, it is not considered. It is specifically designed for categorical values. Continuous values are not recommended, as they will be treated as categorical ones.
Usage
decision_tree(
data,
classy,
m,
method = "entropy",
details = FALSE,
waiting = TRUE
)
Arguments
data
A data frame with already classified observations. Each column represents an attribute of the observations; each row is a different observation. The column names in "data" must not contain the sequence of characters " or ". As this is intended as a binary decision rules generator and not a binary decision tree generator, no tree structures are used, except for the information gain formulas.
classy
Name of the column the data is to be classified by. The set of rules obtained will be calculated according to it.
m
Maximum number of child nodes each node can have.
method
The definition of Gain. It must be one of "entropy", "gini" or "error".
details
Boolean value. If set to TRUE, multiple clarifications and explanations are printed throughout the execution.
waiting
If TRUE, and details is also TRUE, the function pauses after each explanation and waits for user input before continuing.
Details
If data is not perfectly classifiable, the code will not finish.
Available information gain methods are:
- Entropy
The formula to calculate the entropy works as follows:
I_{E} = -\sum_{i}{p_{i} \cdot \log_{2} p_{i}}
- Gini
The formula to calculate the Gini index works as follows:
I_{G} = 1 - \sum_{i}{p_{i}^{2}}
- Error
The formula to calculate the classification error works as follows:
I_{C} = 1 - \max_{i}{(p_{i})}
where p_{i} is the proportion of observations of class i in the node.
Once the impurity is calculated, the information gain is calculated as follows:
IG = I_{father} - \sum_{son}{\frac{count(son\ values)}{count(father\ values)} \cdot I_{son}}
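As a standalone illustration of these formulas (a hedged sketch, not the package's internal code; the helper impurity() and the toy vectors below are made up for the example), the impurity of a node and the information gain of a candidate split can be computed by hand in R:
# Impurity of a vector of class labels, for the three supported methods.
impurity <- function(classes, method = "entropy") {
  p <- table(classes) / length(classes)      # class proportions p_i
  switch(method,
         entropy = -sum(p * log2(p)),        # -sum(p_i * log2(p_i))
         gini    = 1 - sum(p^2),             # 1 - sum(p_i^2)
         error   = 1 - max(p))               # 1 - max(p_i)
}
# Toy data: classify "play" by the categorical attribute "weather".
play    <- c("yes", "yes", "no", "no", "yes", "no")
weather <- c("sun", "sun", "rain", "rain", "cloud", "rain")
i_father <- impurity(play, "entropy")
# Impurity of each child node produced by splitting on "weather",
# weighted by count(son values) / count(father values).
i_sons  <- sapply(split(play, weather), impurity, method = "entropy")
weights <- table(weather) / length(weather)
i_father - sum(weights[names(i_sons)] * i_sons)   # information gain of the split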
Value
Structure of the tree. A list with one list per tree level. Each of these contains a list per node at that level, and each node's list contains the node's filtered data, the node's id, the father node's id, the height the node is at, the variable it filters by, the value that variable is filtered by, and the information gain of the division.
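A hedged sketch of how this structure might be inspected, assuming the nesting described above (tree[[level]][[node]] holds one node's list) and reusing the db3 dataset from the Examples section:
tree <- decision_tree(db3, "VehicleType", 5, "entropy", details = FALSE, waiting = FALSE)
root <- tree[[1]][[1]]    # the single node at the first level
str(root, max.level = 1)  # filtered data, ids, height, split variable and value, gain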
Author(s)
Víctor Amador Padilla, victor.amador@edu.uah.es
Examples
# example code
decision_tree(db3, "VehicleType", 5, "entropy", details = TRUE, waiting = FALSE)
decision_tree(db2, "VehicleType", 4, "gini")
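A further hedged example (db_toy is made up here, not a dataset shipped with LearnSL); the data frame is built to be perfectly classifiable so that, as required in the Details section, the call terminates:
db_toy <- data.frame(
  Motor       = c("yes", "yes", "no", "no"),
  Wheels      = c("two", "four", "two", "none"),
  VehicleType = c("motorbike", "car", "bicycle", "boat")
)
decision_tree(db_toy, "VehicleType", 3, "gini", details = FALSE, waiting = FALSE)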