kernelshap {kernelshap}    R Documentation
Kernel SHAP
Description
Efficient implementation of Kernel SHAP, see Lundberg and Lee (2017) and
Covert and Lee (2021), the latter abbreviated as CL21.
For up to p = 8 features, the resulting Kernel SHAP values are exact with
respect to the selected background data. For larger p, an almost exact
hybrid algorithm involving iterative sampling is used; see Details.
Usage
kernelshap(object, ...)
## Default S3 method:
kernelshap(
object,
X,
bg_X,
pred_fun = stats::predict,
feature_names = colnames(X),
bg_w = NULL,
exact = length(feature_names) <= 8L,
hybrid_degree = 1L + length(feature_names) %in% 4:16,
paired_sampling = TRUE,
m = 2L * length(feature_names) * (1L + 3L * (hybrid_degree == 0L)),
tol = 0.005,
max_iter = 100L,
parallel = FALSE,
parallel_args = NULL,
verbose = TRUE,
...
)
## S3 method for class 'ranger'
kernelshap(
object,
X,
bg_X,
pred_fun = function(m, X, ...) stats::predict(m, X, ...)$predictions,
feature_names = colnames(X),
bg_w = NULL,
exact = length(feature_names) <= 8L,
hybrid_degree = 1L + length(feature_names) %in% 4:16,
paired_sampling = TRUE,
m = 2L * length(feature_names) * (1L + 3L * (hybrid_degree == 0L)),
tol = 0.005,
max_iter = 100L,
parallel = FALSE,
parallel_args = NULL,
verbose = TRUE,
...
)
Arguments
object: Fitted model object.

...: Additional arguments passed to pred_fun(object, X, ...).

X: (n \times p) matrix or data.frame with rows to be explained. The columns should only represent model features, not the response (but see feature_names on how to overrule this).

bg_X: Background data used to integrate out "switched off" features, often a subset of the training data (typically 50 to 500 rows). It should contain the same columns as X.

pred_fun: Prediction function of the form function(object, X, ...), providing K \ge 1 numeric predictions per row. Its first argument represents the model object, its second argument a data structure like X. Additional (named) arguments are passed via .... The default, stats::predict(), will work in most cases. A sketch of a custom prediction function follows this argument list.

feature_names: Optional vector of column names in X used to calculate SHAP values. By default, this equals colnames(X).

bg_w: Optional vector of case weights for each row of bg_X.

exact: If TRUE, the algorithm produces exact Kernel SHAP values with respect to the background data. In this case, the arguments hybrid_degree, m, paired_sampling, tol, and max_iter are ignored. The default is TRUE up to eight features, and FALSE otherwise.

hybrid_degree: Integer controlling the exactness of the hybrid strategy (see Details). The default is 2 for 4 \le p \le 16 and 1 otherwise. Degree 0 corresponds to the pure sampling strategy, degree 1 handles all on-off vectors z with \sum z \in \{1, p-1\} exactly, and degree 2 additionally covers \sum z \in \{2, p-2\}.

paired_sampling: Logical flag indicating whether to do the sampling in a paired manner. This means that with every on-off vector z, also 1 - z is considered. CL21 shows the superiority of paired sampling, so the default TRUE should usually be kept.

m: Even number of on-off vectors sampled during one iteration. The default is 2p, except for the pure sampling strategy (hybrid_degree = 0), where it is 8p.

tol: Tolerance determining when to stop. Following CL21, the algorithm keeps iterating until the largest standard error of a row's SHAP values is small relative to their range; for multidimensional predictions, the criterion must be satisfied for each dimension separately. The default is 0.005.

max_iter: If the stopping criterion (see tol) is not reached after max_iter iterations, the algorithm stops. The default is 100.

parallel: If TRUE, use foreach::foreach() to loop in parallel over the rows to be explained; a parallel backend must be registered beforehand.

parallel_args: Named list of arguments passed to foreach::foreach().

verbose: Set to FALSE to suppress messages and the progress bar. The default is TRUE.
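As announced above, here is a sketch of a custom pred_fun. The model fit_glm and the wrapper prob_fun are hypothetical names chosen for illustration; any model whose predictions can be turned into K numeric values per row works the same way.

# Sketch: a custom pred_fun returning probabilities from a logistic model
fit_glm <- glm(
  I(Species == "virginica") ~ Petal.Length + Petal.Width,
  data = iris, family = binomial
)
prob_fun <- function(object, X, ...) {
  predict(object, X, type = "response", ...)  # one probability per row
}
s <- kernelshap(fit_glm, iris[1:4, 3:4], bg_X = iris[, 3:4], pred_fun = prob_fun)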
Details
Pure iterative Kernel SHAP sampling as in Covert and Lee (2021) works like this
(a toy implementation is sketched after the list):

1. A binary "on-off" vector z is drawn from \{0, 1\}^p such that its sum follows
the SHAP Kernel weight distribution (normalized to the range \{1, \dots, p-1\}).

2. For each j with z_j = 1, the j-th column of the original background data is
replaced by the corresponding feature value x_j of the observation to be explained.

3. The average prediction v_z on the data of Step 2 is calculated, and the
average prediction v_0 on the background data is subtracted.

4. Steps 1 to 3 are repeated m times. This produces a binary m \times p matrix Z
(each row equals one of the z) and a vector v of shifted predictions.

5. v is regressed onto Z under the constraint that the sum of the coefficients
equals v_1 - v_0, where v_1 is the prediction of the observation to be explained.
The resulting coefficients are the Kernel SHAP values.

This is repeated multiple times until convergence; see CL21 for details.
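To make Steps 1 to 5 concrete, here is a minimal, self-contained R sketch for a toy linear model. It is illustrative only and not the package's internal implementation (which, e.g., uses paired sampling); following CL21, z is drawn with probability proportional to its Shapley kernel weight, so its sum s has probability proportional to 1/(s(p-s)).

# Toy version of Steps 1 to 5 (sketch, not package code)
set.seed(1)
p <- 5
X_bg <- matrix(rnorm(100 * p), ncol = p)   # background data (N = 100)
x <- rnorm(p)                              # observation to be explained
f <- function(X) X %*% (1:p)               # toy linear model

# Step 1: coalition size follows 1 / (s * (p - s)); subset of that size uniform
s_grid <- 1:(p - 1)
draw_z <- function() {
  s <- sample(s_grid, 1, prob = 1 / (s_grid * (p - s_grid)))
  z <- numeric(p)
  z[sample(p, s)] <- 1
  z
}

# Step 4: repeat Steps 1 to 3 m times
m <- 1000
Z <- t(replicate(m, draw_z()))
v0 <- mean(f(X_bg))                        # average prediction on background
v1 <- as.numeric(f(matrix(x, nrow = 1)))   # prediction of x
v <- apply(Z, 1, function(z) {             # Steps 2 and 3
  X_mix <- X_bg
  X_mix[, z == 1] <- matrix(x[z == 1], nrow(X_bg), sum(z), byrow = TRUE)
  mean(f(X_mix)) - v0
})

# Step 5: least squares with constraint sum(beta) = v1 - v0 (Lagrange system)
K <- rbind(cbind(2 * crossprod(Z), 1), c(rep(1, p), 0))
beta <- solve(K, c(2 * crossprod(Z, v), v1 - v0))[1:p]
cbind(kernel_shap = beta, exact = (1:p) * (x - colMeans(X_bg)))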
A drawback of this strategy is that many (at least 75%) of the z vectors will
have \sum z \in \{1, p-1\}, producing many duplicates. Similarly, at least 92%
of the mass will be used for the p(p+1) possible vectors with
\sum z \in \{1, 2, p-2, p-1\}.
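These proportions can be checked numerically. Note that they refer to a size distribution proportional to the per-coalition kernel weight (p-1)/(\binom{p}{s} s (p-s)), normalized over s \in \{1, \dots, p-1\}; this normalization is our reading, not stated explicitly on this page.

# Mass of the Kernel weight distribution at the extreme coalition sizes
p <- 10
s <- 1:(p - 1)
w <- (p - 1) / (choose(p, s) * s * (p - s))  # kernel weight per size
w <- w / sum(w)
sum(w[s %in% c(1, p - 1)])            # about 0.84 for p = 10 (>= 0.75)
sum(w[s %in% c(1, 2, p - 2, p - 1)])  # about 0.95 for p = 10 (>= 0.92)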
This inefficiency can be fixed by a hybrid strategy, combining exact calculations
with sampling.
The hybrid algorithm has two steps:
Step 1 (exact part): There are 2p different on-off vectors z with
\sum z \in \{1, p-1\}, covering a large proportion of the Kernel SHAP
distribution. The degree 1 hybrid will list those vectors and use them according
to their weights in the upcoming calculations. Depending on p, we can also go a
step further to a degree 2 hybrid by adding all p(p-1) vectors with
\sum z \in \{2, p-2\} to the process etc. The necessary predictions are obtained
along with other calculations similar to those described in CL21.

Step 2 (sampling part): The remaining weight is filled by sampling vectors z
according to Kernel SHAP weights renormalized to the values not yet covered by
Step 1. Together with the results from Step 1 (correctly weighted), this now
forms a complete iteration as in CL21. The difference is that most mass is
covered by exact calculations. Afterwards, the algorithm iterates until
convergence. The output of Step 1 is reused in every iteration, leading to an
extremely efficient strategy.
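As a sketch (not the package's internal code), the exact on-off vectors of the two hybrid degrees can be enumerated directly:

# Degree 1: all 2p vectors with sum(z) in {1, p-1}
p <- 5
Z1 <- rbind(diag(p), 1 - diag(p))
nrow(Z1)  # 2p = 10

# Degree 2 additionally uses all p(p-1) vectors with sum(z) in {2, p-2}
idx <- combn(p, 2)
Z2 <- t(apply(idx, 2, function(j) {z <- numeric(p); z[j] <- 1; z}))
Z2 <- rbind(Z2, 1 - Z2)
nrow(Z2)  # p * (p - 1) = 20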
If p is sufficiently small, all possible 2^p - 2 on-off vectors z can be
evaluated. In this case, no sampling is required and the algorithm returns exact
Kernel SHAP values with respect to the given background data.
Since kernelshap() calculates predictions on data with MN rows
(N is the background data size and M the number of z vectors), p should not be
much higher than 10 for exact calculations.
For similar reasons, degree 2 hybrids should not use p much larger than 40.
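A quick back-of-the-envelope check of these sizes:

# Number of predictions per explained row in exact mode
N <- 100       # background data size
p <- 10        # number of features
M <- 2^p - 2   # number of on-off vectors z
M * N          # 102200 predictions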
Value
An object of class "kernelshap" with the following components:
- S: (n \times p) matrix with SHAP values or, if the model output has dimension K > 1, a list of K such matrices.
- X: Same as input argument X.
- baseline: Vector of length K representing the average prediction on the background data.
- SE: Standard errors corresponding to S (and organized like S).
- n_iter: Integer vector of length n providing the number of iterations per row of X.
- converged: Logical vector of length n indicating convergence per row of X.
- m: Integer providing the effective number of sampled on-off vectors used per iteration.
- m_exact: Integer providing the effective number of exact on-off vectors used per iteration.
- prop_exact: Proportion of the Kernel SHAP weight distribution covered by exact calculations.
- exact: Logical flag indicating whether calculations are exact or not.
- txt: Summary text.
- predictions: (n \times K) matrix with predictions of X.
- algorithm: "kernelshap".
Methods (by class)
- kernelshap(default): Default Kernel SHAP method.
- kernelshap(ranger): Kernel SHAP method for "ranger" models, see README for an example.
References
Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017.
Ian Covert and Su-In Lee. Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3457-3465, 2021.
Examples
# MODEL ONE: Linear regression
fit <- lm(Sepal.Length ~ ., data = iris)
# Select rows to explain (only feature columns)
X_explain <- iris[1:2, -1]
# Select small background dataset (could use all rows here because iris is small)
set.seed(1)
bg_X <- iris[sample(nrow(iris), 100), ]
# Calculate SHAP values
s <- kernelshap(fit, X_explain, bg_X = bg_X)
s
# MODEL TWO: Multi-response linear regression
fit <- lm(as.matrix(iris[, 1:2]) ~ Petal.Length + Petal.Width + Species, data = iris)
s <- kernelshap(fit, iris[1:4, 3:5], bg_X = bg_X)
summary(s)
# Non-feature columns can be dropped via 'feature_names'
s <- kernelshap(
fit,
iris[1:4, ],
bg_X = bg_X,
feature_names = c("Petal.Length", "Petal.Width", "Species")
)
s
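# The components listed under Value can be extracted from the result:
s$S         # SHAP values (a list of matrices here, since the response is bivariate)
s$SE        # standard errors, organized like s$S
s$baseline  # average predictions on the background data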