featureContribTree {tree.interpreter} | R Documentation
Feature Contribution
Description
Contribution of each feature to the prediction.
Usage
featureContribTree(tidy.RF, tree, X)
featureContrib(tidy.RF, X)
Arguments
tidy.RF
A tidy random forest. The random forest to make predictions with.
tree
An integer. The index of the tree to look at.
X
A data frame. Features of samples to be predicted.
Details
Recall that each node in a decision tree has a prediction associated with it. For regression trees, it's the average response in that node, whereas in classification trees, it's the frequency of each response class, or the most frequent response class in that node.
For a tree in the forest, the contribution of each feature to the prediction of a sample is the sum of differences between the predictions of nodes which split on the feature and those of their children, i.e. the sum of changes in node prediction caused by splitting on the feature. This is calculated by featureContribTree.
For a forest, the contribution of each feature to the prediction of a sample is the average contribution across all trees in the forest, because the prediction of a forest is the average of the predictions of its trees. This is calculated by featureContrib.
Together with trainsetBias(Tree), they can decompose the prediction by feature importance:
prediction(MODEL, X) =
trainsetBias(MODEL) +
featureContrib_1(MODEL, X) + ... + featureContrib_p(MODEL, X),
where MODEL can be either a tree or a forest.
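The decomposition can be seen directly on a hand-built example. The sketch below (not the package's implementation) computes the contribution for a depth-1 regression tree that splits on a single feature x at a threshold of 5; the data, threshold, and variable names are made up for illustration:

```r
y <- c(2, 4, 6, 8)            # training responses
x <- c(1, 3, 7, 9)            # training values of feature x

root.pred  <- mean(y)         # prediction at the root: the trainset bias
left.pred  <- mean(y[x <  5]) # prediction of the left child
right.pred <- mean(y[x >= 5]) # prediction of the right child

# A sample with x = 8 falls into the right child, so the contribution of
# x is the change in node prediction caused by the split on x:
contrib.x <- right.pred - root.pred

# The decomposition holds: prediction = trainset bias + contribution.
prediction <- root.pred + contrib.x
stopifnot(all.equal(prediction, right.pred))
```

In a deeper tree, the contribution of a feature accumulates one such difference for every node on the sample's root-to-leaf path that splits on that feature.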
Value
A cube (3D array). The content depends on the type of the response.

Regression: A P-by-1-by-N array, where P is the number of features in X and N is the number of samples in X. The pth row of the nth slice stands for the contribution of feature p to the prediction for sample n.

Classification: A P-by-D-by-N array, where P is the number of features in X, D is the number of response classes, and N is the number of samples in X. The pth row of the nth slice stands for the contribution of feature p to the prediction of each response class for sample n.
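A sketch of how such a cube is indexed in base R, assuming a classification forest with P = 4 features, D = 3 classes, and N = 2 samples (the values are random placeholders, not real contributions; the dimension names are made up for illustration):

```r
# Build a P-by-D-by-N array with the same layout as the return value.
contribs <- array(rnorm(4 * 3 * 2), dim = c(4, 3, 2),
                  dimnames = list(paste0("feat",   1:4),
                                  paste0("class",  1:3),
                                  paste0("sample", 1:2)))

contribs[, , 1]        # P-by-D matrix: all contributions for the first sample
contribs["feat2", , 1] # contribution of feature 2 to each class, first sample
```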
Functions
- featureContribTree: Feature contribution to prediction within a single tree
- featureContrib: Feature contribution to prediction within the whole forest
References
Interpreting random forests http://blog.datadive.net/interpreting-random-forests/
Random forest interpretation with scikit-learn http://blog.datadive.net/random-forest-interpretation-with-scikit-learn/
See Also
trainsetBias
Examples
library(ranger)
test.id <- 50 * seq(3)
rfobj <- ranger(Species ~ ., iris[-test.id, ], keep.inbag=TRUE)
tidy.RF <- tidyRF(rfobj, iris[-test.id, -5], iris[-test.id, 5])
featureContribTree(tidy.RF, 1, iris[test.id, -5])
featureContrib(tidy.RF, iris[test.id, -5])