decision_tree {LearnSL}		R Documentation

Decision Tree

Description

This function creates a decision tree based on an example dataset, calculating the best classifier possible at each step. It only creates perfect divisions: if a rule does not produce a fully classified group, it is not considered. It is specifically designed for categorical values. Continuous values are not recommended, as they will be treated as categorical ones.

Usage

decision_tree(
  data,
  classy,
  m,
  method = "entropy",
  details = FALSE,
  waiting = TRUE
)

Arguments

data

A data frame with already classified observations. Each column represents an attribute of the observation; each row is a different observation. The column names in the parameter "data" must not contain the sequence of characters " or ". As this is supposed to be a binary decision rules generator and not a binary decision tree generator, no tree structures are used, except for the information gain formulas.

classy

Name of the column we want the data to be classified by. The set of rules obtained will be calculated according to this column.

m

Maximum number of child nodes each node can have.

method

The definition of gain. It must be one of "entropy", "gini" or "error".

details

Boolean value. If set to TRUE, multiple clarifications and explanations are printed during execution.

waiting

If TRUE (and details = TRUE), the execution stops after each "block" of code and waits for the user to press "enter" to continue.

Details

If the data is not perfectly classifiable, the algorithm will not finish.

Available information gain methods are:

Entropy

The formula to calculate the entropy works as follows, where p_{i} is the proportion of observations of class i in the node:

I_{E} = -\sum_{i}{p_{i} \cdot \log_{2}(p_{i})}

Gini

The formula to calculate the Gini index works as follows:

I_{G} = 1 - \sum_{i}{p_{i}^{2}}

Error

The formula to calculate the classification error works as follows:

I_{M} = 1 - \max_{i}(p_{i})
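
As an illustration, the three impurity measures above can be written directly in R (a minimal sketch of the formulas, not the package's internal functions; p is the vector of class proportions in a node):

# p[p > 0] guards against log2(0) for absent classes
entropy_impurity <- function(p) -sum(p[p > 0] * log2(p[p > 0]))
gini_impurity    <- function(p) 1 - sum(p^2)
error_impurity   <- function(p) 1 - max(p)

p <- c(0.5, 0.25, 0.25)  # class proportions in an example node
entropy_impurity(p)      # 1.5
gini_impurity(p)         # 0.625
error_impurity(p)        # 0.5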

Once the impurity is calculated, the information gain is calculated as follows:

IG = I_{father} - \sum_{son}{\frac{count(son)}{count(father)} \cdot I_{son}}
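
Continuing the sketch above, the information gain of a division can be computed from the father's class labels and each son's class labels (helper names are illustrative and not part of LearnSL):

# Father's impurity minus the size-weighted impurities of its sons
information_gain <- function(father, sons, impurity = entropy_impurity) {
  prop <- function(x) as.numeric(table(x)) / length(x)
  impurity(prop(father)) -
    sum(sapply(sons, function(s) (length(s) / length(father)) * impurity(prop(s))))
}

father <- c(rep("car", 4), rep("bike", 4))
sons   <- list(rep("car", 4), rep("bike", 4))  # a perfect division
information_gain(father, sons)  # 1: all impurity is removed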

Value

Structure of the tree: a list with one list per tree level. Each level list contains one list per node at that level, and each node list contains the node's filtered data, the node's id, the father node's id, the height the node is at, the variable it filters by, the value that variable is filtered by, and the information gain of the division.
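
Assuming the structure described above, individual nodes can be reached by indexing first the level and then the node (a sketch; the call and the indices are illustrative):

tree <- decision_tree(db3, "VehicleType", 5)  # default method = "entropy"
root <- tree[[1]][[1]]  # first level, first (root) node
root  # filtered data, node id, father id, height, split variable,
      # split value and information gain of the division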

Author(s)

VĂ­ctor Amador Padilla, victor.amador@edu.uah.es

Examples

# Build the tree printing details of every step, without pausing
decision_tree(db3, "VehicleType", 5, "entropy", details = TRUE, waiting = FALSE)
# Same classification with the gini index and at most 4 child nodes per node
decision_tree(db2, "VehicleType", 4, "gini")
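
# The datasets db2 and db3 ship with LearnSL. A hand-made categorical data
# frame (toy data, for illustration only) works the same way; here "Windy"
# perfectly classifies "Play", so the division is accepted:
weather <- data.frame(
  Outlook = c("sunny", "sunny", "rain", "rain"),
  Windy   = c("no", "yes", "no", "yes"),
  Play    = c("yes", "no", "yes", "no")
)
decision_tree(weather, "Play", 3, "gini")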


[Package LearnSL version 1.0.0 Index]