xgrove {xgrove}

R Documentation

Explanation groves

Description

Compute surrogate groves to explain a predictive machine learning model and analyze the trade-off between complexity and explanatory power.

Usage

xgrove(
  model,
  data,
  ntrees = c(4, 8, 16, 32, 64, 128),
  pfun = NULL,
  shrink = 1,
  b.frac = 1,
  seed = 42,
  ...
)

Arguments

`model`	A model with corresponding predict function that returns numeric values.
`data`	Data that must not (!) contain the target variable.
`ntrees`	Sequence of integers: number of boosting trees for rule extraction.
`pfun`	Optional predict function `function(model, data)` returning numeric (real-valued) predictions. Default is the `predict()` method of the `model`.
`shrink`	Sets the `shrinkage` argument for the internal call of `gbm`. Because the `model` usually has a deterministic response, the default is 1, unlike the default of `gbm`, which is intended for training a model on data.
`b.frac`	Sets the `bag.fraction` argument for the internal call of `gbm`. Because the `model` usually has a deterministic response, the default is 1, unlike the default of `gbm`, which is intended for training a model on data.
`seed`	Seed for the random number generator to ensure reproducible results (e.g. when `bag.fraction` < 1 is used in boosting).
`...`	Further arguments to be passed to `gbm` or the `predict()` method of the `model`.
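
As an illustration of the `pfun` argument, the following is a minimal sketch assuming a logistic regression whose default `predict()` output is on the log-odds scale. The `glm` model, the `mtcars` data and the helper name `prob_pfun` are hypothetical choices for this example and not part of the package; the call to `xgrove` only runs if the package is installed.

```r
# Hypothetical model: a logistic regression whose default predict() output
# is on the link (log-odds) scale.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# A custom predict function must accept (model, data) and return one
# numeric value per row; here it returns probabilities instead of log-odds.
prob_pfun <- function(model, data) {
  as.numeric(predict(model, newdata = data, type = "response"))
}

x <- mtcars[, c("wt", "hp")]  # predictors only, the target `am` is removed
p <- prob_pfun(fit, x)

# Only run the surrogate step if xgrove is available.
if (requireNamespace("xgrove", quietly = TRUE)) {
  xg <- xgrove::xgrove(fit, x, ntrees = c(4, 8), pfun = prob_pfun)
}
```

Without `pfun`, the surrogate would be fitted to whatever the `predict()` method of the model returns by default, which in this sketch would be log-odds rather than probabilities.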

Details

A surrogate grove is trained via gradient boosting using `gbm` on `data`, with the predictions of the `model` as the target variable. Note that `data` must not contain the original target variable! The boosting model is trained using stumps, i.e. trees of depth 1. The resulting interpretation is extracted using `pretty.gbm.tree`.
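
The procedure described above can be sketched roughly as follows. This is not the package's internal code: the linear model used as a stand-in for the black box, the `mtcars` data and the column name `surrogatetarget` are assumptions made for illustration. Only `gbm()` and `pretty.gbm.tree()` from the gbm package are used, and that part is skipped if gbm is not installed.

```r
# Stand-in "black box": a linear model (hypothetical choice for illustration).
model <- lm(mpg ~ ., data = mtcars)
x <- mtcars[, names(mtcars) != "mpg"]  # data without the original target

# Surrogate target: the predictions of the model, not the observed mpg.
surr <- x
surr$surrogatetarget <- as.numeric(predict(model, newdata = x))

if (requireNamespace("gbm", quietly = TRUE)) {
  set.seed(42)
  grove <- gbm::gbm(
    surrogatetarget ~ ., data = surr,
    distribution = "gaussian",
    n.trees = 8,
    interaction.depth = 1,  # stumps: a single split per tree
    shrinkage = 1,          # mirrors the xgrove default shrink = 1
    bag.fraction = 1        # mirrors the xgrove default b.frac = 1
  )
  # Split variable, split point and leaf predictions of the first stump.
  first_tree <- gbm::pretty.gbm.tree(grove, i.tree = 1)
  print(first_tree)
}
```

Because every tree is a stump, each one contributes a rule on a single variable, which is what keeps the resulting grove readable.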

Value

List of the results:

`explanation`	Matrix containing tree sizes, rules, explainability Υ and the correlation between the predictions of the explanation and the true model.
`rules`	Summary of the explanation grove: Rules with identical splits are aggregated. For numeric variables, splits are merged if they lead to identical partitions of the training data.
`groves`	Rules of the explanation grove.
`model`	`gbm` model.
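
To make the two quality measures in `explanation` more concrete, here is a small sketch in base R. It assumes that Υ is defined as 1 - ASE / ASE0, following the cited Szepannek and Luebke (2023) paper, where ASE is the average squared difference between the model's and the explanation's predictions and ASE0 uses the mean model prediction as a baseline. This definition is not stated on this page, and the helper `upsilon()` and both linear models are hypothetical stand-ins.

```r
# Assumed definition (not stated on this page): upsilon = 1 - ASE / ASE0.
upsilon <- function(p_model, p_expl) {
  ase  <- mean((p_model - p_expl)^2)         # error of the explanation
  ase0 <- mean((p_model - mean(p_model))^2)  # error of a constant baseline
  1 - ase / ase0
}

# Stand-in "true model" and its predictions (hypothetical choice).
model <- lm(mpg ~ ., data = mtcars)
p_model <- as.numeric(predict(model, newdata = mtcars))

# Stand-in explanation: a simpler model fitted to the model's predictions.
expl <- lm(p_model ~ wt, data = mtcars)
p_expl <- as.numeric(fitted(expl))

ups <- upsilon(p_model, p_expl)  # share of the model's variation explained
rho <- cor(p_model, p_expl)      # correlation of explanation and model
```

Under this assumed definition, a value of 1 means the explanation reproduces the model's predictions exactly, while a value of 0 means it does no better than predicting the mean.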

Author(s)

gero.szepannek@web.de

References

Szepannek, G. and von Holt, B.H. (2023): Can’t see the forest for the trees – analyzing groves to explain random forests, Behaviormetrika, DOI: 10.1007/s41237-023-00205-2.
Szepannek, G. and Luebke, K. (2023): How much do we see? On the explainability of partial dependence plots for credit risk scoring, Argumenta Oeconomica 50, DOI: 10.15611/aoe.2023.1.07.

Examples

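library(xgrove)
library(randomForest)
library(pdp)                       # provides the boston data set
data(boston)
set.seed(42)
rf <- randomForest(cmedv ~ ., data = boston)
data <- boston[, -3]               # remove target variable (cmedv)
ntrees <- c(4, 8, 16, 32, 64, 128) # grove sizes to compare
xg <- xgrove(rf, data, ntrees)
xg                                 # print the results
plot(xg)                           # plot the results for the different grove sizes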

[Package xgrove version 0.1-7 Index]