
MDITree {tree.interpreter}

R Documentation

Mean Decrease in Impurity

Description

Calculate the MDI feature importance measure for a single tree (MDITree) or for a whole random forest (MDI).

Usage

MDITree(tidy.RF, tree, trainX, trainY)

MDI(tidy.RF, trainX, trainY)

Arguments

`tidy.RF`	A tidy random forest, as returned by tidyRF(). The random forest to calculate MDI from.
`tree`	An integer. The index of the tree to calculate MDI for (MDITree only).
`trainX`	A data frame. Training-set features, such that the `T`th tree is trained with `trainX[tidy.RF$inbag.counts[[T]], ]`.
`trainY`	A data frame. Training-set responses, such that the `T`th tree is trained with `trainY[tidy.RF$inbag.counts[[T]], ]`.
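In ranger, `inbag.counts` holds, for each tree, one count per training row giving how many times that row was drawn, rather than a list of row indices. Assuming `tidy.RF$inbag.counts` follows the same convention, the rows behind a tree's bootstrap sample can be rebuilt by repeating each row index by its count. The counts and data below are made up for illustration.

```r
# Hypothetical in-bag counts for one tree over a 6-row training set,
# following ranger's convention: counts[i] = times row i was drawn.
counts <- c(1, 0, 2, 1, 0, 2)
trainX <- data.frame(x1 = 1:6, x2 = c(2.5, 3.1, 0.4, 1.8, 2.2, 0.9))

# Expand counts into row indices; rows drawn twice appear twice.
inbag.rows <- rep(seq_len(nrow(trainX)), times = counts)
inbag.rows
#> [1] 1 3 3 4 6 6

# The bootstrap sample this tree would be trained on
bootstrap.X <- trainX[inbag.rows, , drop = FALSE]

# Rows with a count of 0 are out-of-bag for this tree
oob.rows <- which(counts == 0)
```

The bootstrap sample has as many rows as the training set, while rows 2 and 5 are never seen by this tree.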

Details

MDI stands for Mean Decrease in Impurity. It is a widely adopted measure of feature importance in random forests: each split credits its feature with the decrease in node impurity it produces. In this package, we calculate MDI with a new analytical expression derived by Li et al. (see References).

See vignette('MDI', package='tree.interpreter') for more context.
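To make the classical definition concrete, here is a minimal base-R sketch of MDI for one hand-built regression tree, using variance as the impurity. It is not the package's implementation or Li et al.'s analytical expression; the helpers `node.impurity()` and `split.gain()` and the toy data are hypothetical. Each split credits its feature with the impurity drop it produces, weighted by the fraction of training rows that reach the node.

```r
# Variance impurity of a node: mean squared deviation of its responses
node.impurity <- function(y) mean((y - mean(y))^2)

# Impurity decrease of splitting node responses `y` by the logical
# vector `go.left`, weighted by the share of all n.total rows in the node
split.gain <- function(y, go.left, n.total) {
  n <- length(y)
  n.left <- sum(go.left)
  (n / n.total) * (node.impurity(y) -
    (n.left / n) * node.impurity(y[go.left]) -
    ((n - n.left) / n) * node.impurity(y[!go.left]))
}

# Toy training set
x1 <- 1:8
x2 <- c(1, 2, 1, 2, 1, 2, 1, 2)
y  <- c(0, 0, 0, 0, 4, 8, 4, 8)
n.total <- length(y)

# Hand-built tree: the root splits on x1 <= 4,
# and its right child splits on x2 <= 1
root.gain <- split.gain(y, x1 <= 4, n.total)
right <- x1 > 4
child.gain <- split.gain(y[right], x2[right] <= 1, n.total)

# Each feature is used by exactly one split, so its MDI is that split's gain
mdi <- c(x1 = root.gain, x2 = child.gain)
```

Here `x1` is credited with 9 and `x2` with 2. Because both leaves are pure, the two values add up to 11, the impurity of the root node.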

Value

A matrix whose shape depends on the type of the response.

Regression: A P-by-1 matrix, where P is the number of features in trainX. The pth row contains the MDI of feature p.
Classification: A P-by-D matrix, where P is the number of features in trainX and D is the number of response classes. The dth column of the pth row contains the MDI of feature p for class d. To get a single MDI per feature, call rowSums on the result.
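For classification, the per-class columns can be collapsed into one score per feature with rowSums(). The sketch below uses a made-up 4-by-3 matrix in place of real MDI output; the feature names `f1` to `f4` and class names `c1` to `c3` are placeholders.

```r
# Made-up numbers standing in for MDI output on a classification forest:
# rows are features (P = 4), columns are response classes (D = 3).
# matrix() fills column by column, so each line below is one class.
mdi.by.class <- matrix(
  c(0.01, 0.02, 0.00, 0.03,   # class c1
    0.05, 0.04, 0.10, 0.12,   # class c2
    0.04, 0.03, 0.11, 0.10),  # class c3
  nrow = 4,
  dimnames = list(paste0("f", 1:4), paste0("c", 1:3)))

# Sum over classes: one MDI value per feature (a named vector)
mdi.per.feature <- rowSums(mdi.by.class)

# Rank features from most to least important
ranking <- names(sort(mdi.per.feature, decreasing = TRUE))
```

With these numbers the ranking is `f4`, `f3`, `f1`, `f2`.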

Functions

MDITree: Mean decrease in impurity within a single tree
MDI: Mean decrease in impurity within the whole forest
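This page does not say how MDI combines trees. The classical definition averages per-tree importances over the forest, and the sketch below shows that aggregation on made-up P-by-1 matrices. Whether MDI(tidy.RF, trainX, trainY) equals the mean of MDITree(tidy.RF, t, trainX, trainY) over all tree indices t is an assumption here, not documented behavior.

```r
# Made-up per-tree MDI for a 3-feature regression forest of 3 trees;
# each element is P-by-1, matching the documented regression output.
per.tree <- list(
  matrix(c(0.6, 0.3, 0.1), ncol = 1),
  matrix(c(0.4, 0.5, 0.1), ncol = 1),
  matrix(c(0.5, 0.4, 0.1), ncol = 1))

# Element-wise mean over trees; the result keeps the P-by-1 shape
forest.mdi <- Reduce(`+`, per.tree) / length(per.tree)
```

Summing matrices with Reduce() keeps the same code working for the P-by-D classification case.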

References

Li et al. (2019). A Debiased MDI Feature Importance Measure for Random Forests. https://arxiv.org/abs/1906.10845

Examples

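library(ranger)
library(tree.interpreter)
set.seed(1)  # make the forest reproducible
# keep.inbag = TRUE records how often each row is in-bag for each tree
rfobj <- ranger(Species ~ ., iris, keep.inbag = TRUE)
tidy.RF <- tidyRF(rfobj, iris[, -5], iris[, 5])
# MDI of each feature for each class, first tree only
MDITree(tidy.RF, 1, iris[, -5], iris[, 5])
# Forest-level MDI; rowSums() collapses the classes into one value per feature
rowSums(MDI(tidy.RF, iris[, -5], iris[, 5]))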

[Package tree.interpreter version 0.1.1 Index]