permshap {kernelshap}    R Documentation

Permutation SHAP

Description

Exact permutation SHAP algorithm with respect to a background dataset; see Strumbelj and Kononenko (2014) in the References. The function supports up to 14 features.
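To make the idea of "SHAP values with respect to a background dataset" concrete, the following base R sketch computes exact Shapley values by brute-force enumeration of all feature subsets. It is an illustration only, not the package's implementation: the helper names value_fun() and exact_shap() are hypothetical, and the model is restricted to three numeric features to keep the code short.

```r
# Illustrative brute-force Shapley values (hypothetical helpers, not kernelshap code)
features <- c("Sepal.Width", "Petal.Length", "Petal.Width")
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
x <- iris[1, ]                        # single row to explain
bg_X <- iris[seq(1, 150, by = 3), ]   # background data (every third row)

# Average prediction when the features in `on` are taken from x and all
# remaining features keep their values from the background data.
value_fun <- function(fit, x, bg_X, on) {
  Z <- bg_X
  for (v in on) Z[[v]] <- x[[v]]
  mean(predict(fit, Z))
}

exact_shap <- function(fit, x, bg_X, features) {
  p <- length(features)
  phi <- setNames(numeric(p), features)
  for (j in seq_len(p)) {
    others <- features[-j]
    # Enumerate all 2^(p - 1) subsets of the other features via bit masks
    for (mask in 0:(2^(p - 1) - 1)) {
      on <- others[as.logical(intToBits(mask))[seq_len(p - 1)]]
      s <- length(on)
      # Shapley weight of a subset of size s
      w <- factorial(s) * factorial(p - s - 1) / factorial(p)
      gain <- value_fun(fit, x, bg_X, c(on, features[j])) -
        value_fun(fit, x, bg_X, on)
      phi[j] <- phi[j] + w * gain
    }
  }
  phi
}

phi <- exact_shap(fit, x, bg_X, features)
phi

# Efficiency property: SHAP values sum to prediction minus average background prediction
all.equal(sum(phi), unname(predict(fit, x)) - mean(predict(fit, bg_X)))
```

The cost of this enumeration doubles with every added feature, which gives some intuition for why an exact algorithm is limited to a small number of features.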

Usage

permshap(object, ...)

## Default S3 method:
permshap(
  object,
  X,
  bg_X,
  pred_fun = stats::predict,
  feature_names = colnames(X),
  bg_w = NULL,
  parallel = FALSE,
  parallel_args = NULL,
  verbose = TRUE,
  ...
)

## S3 method for class 'ranger'
permshap(
  object,
  X,
  bg_X,
  pred_fun = function(m, X, ...) stats::predict(m, X, ...)$predictions,
  feature_names = colnames(X),
  bg_w = NULL,
  parallel = FALSE,
  parallel_args = NULL,
  verbose = TRUE,
  ...
)

## S3 method for class 'Learner'
permshap(
  object,
  X,
  bg_X,
  pred_fun = NULL,
  feature_names = colnames(X),
  bg_w = NULL,
  parallel = FALSE,
  parallel_args = NULL,
  verbose = TRUE,
  ...
)

Arguments

object

Fitted model object.

...

Additional arguments passed to pred_fun(object, X, ...).

X

(n x p) matrix or data.frame with rows to be explained. The columns should only represent model features, not the response (but see feature_names on how to override this).

bg_X

Background data used to integrate out "switched off" features, often a subset of the training data (typically 50 to 500 rows). It should contain the same columns as X. In cases with a natural "off" value (like MNIST digits), this can also be a single row with all values set to the off value.

pred_fun

Prediction function of the form function(object, X, ...), providing K >= 1 predictions per row. Its first argument represents the model object, its second argument a data structure like X. Additional (named) arguments are passed via .... The default, stats::predict(), will work in most cases.

feature_names

Optional vector of column names in X used to calculate SHAP values. By default, this equals colnames(X). Not supported if X is a matrix.

bg_w

Optional vector of case weights for each row of bg_X.

parallel

If TRUE, use parallel foreach::foreach() to loop over rows to be explained. A parallel backend must be registered beforehand, e.g., via the 'doFuture' package; see the README for an example. Parallelization automatically disables the progress bar.

parallel_args

Named list of arguments passed to foreach::foreach(). Ideally, this is NULL (default). Only relevant if parallel = TRUE. Example on Windows: if object is a GAM fitted with package 'mgcv', then one might need to set parallel_args = list(.packages = "mgcv").

verbose

Set to FALSE to suppress messages and the progress bar.
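As an illustration of the pred_fun argument, the sketch below explains a logistic regression on the probability scale. The binary target virginica and the helper pred_prob() are made up for this example. The call to permshap() only runs if 'kernelshap' is installed.

```r
# Hypothetical binary target, for illustration only
iris2 <- transform(iris, virginica = as.integer(Species == "virginica"))
features <- c("Sepal.Length", "Sepal.Width")
fit_glm <- glm(virginica ~ Sepal.Length + Sepal.Width, data = iris2, family = binomial())

# Custom prediction function: model first, data second, extras via ...
pred_prob <- function(object, X, ...) {
  predict(object, newdata = X, type = "response", ...)
}

X_explain <- iris2[1:2, features]
bg_X <- iris2[seq(1, 150, by = 3), features]

# One probability per row of X_explain
p_hat <- pred_prob(fit_glm, X_explain)
p_hat

if (requireNamespace("kernelshap", quietly = TRUE)) {
  s <- kernelshap::permshap(
    fit_glm, X = X_explain, bg_X = bg_X, pred_fun = pred_prob, verbose = FALSE
  )
  print(s)
}
```

Since additional arguments are forwarded to pred_fun, keeping the default prediction function and passing type = "response" directly to permshap() should give the same result.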

Value

An object of class "permshap" with the following components:

Methods (by class)

References

  1. Erik Strumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems 41, 2014.

Examples

# MODEL ONE: Linear regression
fit <- lm(Sepal.Length ~ ., data = iris)

# Select rows to explain (only feature columns)
X_explain <- iris[1:2, -1]

# Select small background dataset (could use all rows here because iris is small)
set.seed(1)
bg_X <- iris[sample(nrow(iris), 100), ]

# Calculate SHAP values
s <- permshap(fit, X_explain, bg_X = bg_X)
s

# MODEL TWO: Multi-response linear regression
fit <- lm(as.matrix(iris[, 1:2]) ~ Petal.Length + Petal.Width + Species, data = iris)
s <- permshap(fit, iris[1:4, 3:5], bg_X = bg_X)
s

# Non-feature columns can be dropped via 'feature_names'
s <- permshap(
  fit,
  iris[1:4, ],
  bg_X = bg_X,
  feature_names = c("Petal.Length", "Petal.Width", "Species")
)
s
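The parallel argument relies on a registered foreach backend. The following sketch shows one possible setup with the 'doFuture' package; the number of workers is an arbitrary choice, and the code only runs if both 'kernelshap' and 'doFuture' are installed. With so few rows to explain, parallelization is unlikely to pay off, so this only demonstrates the mechanics.

```r
if (requireNamespace("kernelshap", quietly = TRUE) &&
    requireNamespace("doFuture", quietly = TRUE)) {
  # Register the future-based backend for foreach and start two workers
  doFuture::registerDoFuture()
  future::plan(future::multisession, workers = 2)

  fit <- lm(Sepal.Length ~ ., data = iris)
  s_par <- kernelshap::permshap(
    fit,
    X = iris[1:4, -1],
    bg_X = iris[seq(1, 150, by = 3), ],
    parallel = TRUE,
    verbose = FALSE
  )
  print(s_par)

  # Switch back to sequential processing
  future::plan(future::sequential)
}
```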

[Package kernelshap version 0.4.1 Index]