light_breakdown {flashlight} | R Documentation |
Variable Contribution Breakdown for Single Observation
Description
Calculates sequential additive variable contributions (approximate SHAP) to the prediction of a single observation; see Gosiewska and Biecek (2019) in the References and the Details below.
Usage
light_breakdown(x, ...)
## Default S3 method:
light_breakdown(x, ...)
## S3 method for class 'flashlight'
light_breakdown(
x,
new_obs,
data = x$data,
by = x$by,
v = NULL,
visit_strategy = c("importance", "permutation", "v"),
n_max = Inf,
n_perm = 20,
seed = NULL,
use_linkinv = FALSE,
description = TRUE,
digits = 2,
...
)
## S3 method for class 'multiflashlight'
light_breakdown(x, ...)
Arguments
x |
An object of class "flashlight" or "multiflashlight". |
... |
Further arguments passed to |
new_obs |
One single new observation to calculate variable attribution for.
Needs to be a |
data |
An optional |
by |
An optional vector of column names used to filter |
v |
Vector of variable names to assess contribution for. Defaults to all except those specified by "y", "w" and "by". |
visit_strategy |
In what sequence should variables be visited?
By "importance", by |
n_max |
Maximum number of rows in |
n_perm |
Number of permutations of random visit sequences.
Only used if |
seed |
An integer random seed used to shuffle rows if |
use_linkinv |
Should retransformation function be applied? Default is |
description |
Should descriptions be added? Default is |
digits |
Passed to |
Details
The breakdown algorithm works as follows: First, the visit order
(x_1, ..., x_m) of the variables v is specified. Then, in the query data,
the column x_1 is set to the value of x_1 of the single observation new_obs
to be explained. The change in the (weighted) average prediction on data
measures the contribution of x_1 to the prediction of new_obs. This procedure
is iterated over all x_i until eventually, all rows in data are identical to
new_obs.
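The steps above can be sketched in base R without the package. This is only an illustration under stated assumptions, not the package's implementation: the helper name breakdown_sketch() and its arguments are hypothetical, and the model is assumed to work with predict(fit, newdata = ...).

```r
# Hypothetical helper illustrating the sequential breakdown idea.
# Assumes `fit` supports predict(fit, newdata = ...) and returns a numeric vector.
breakdown_sketch <- function(fit, data, new_obs, visit_order, w = NULL) {
  if (is.null(w)) w <- rep(1, nrow(data))  # unweighted by default
  avg_pred <- function(d) weighted.mean(predict(fit, newdata = d), w)

  baseline <- avg_pred(data)               # average prediction on the query data
  current <- baseline
  contrib <- setNames(numeric(length(visit_order)), visit_order)

  for (v in visit_order) {
    # Set column v to the value of the observation to be explained
    data[[v]] <- rep(new_obs[[v]], nrow(data))
    updated <- avg_pred(data)
    contrib[v] <- updated - current        # change in average prediction
    current <- updated
  }
  list(baseline = baseline, contributions = contrib, prediction = current)
}

fit <- lm(Sepal.Length ~ . + Petal.Length:Species, data = iris)
vars <- setdiff(names(iris), "Sepal.Length")
res <- breakdown_sketch(fit, iris, iris[1, ], visit_order = vars)
res$contributions
```

By construction, the baseline plus the sum of all contributions equals the prediction for new_obs, because after the last step every row carries the covariate values of new_obs.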
A complication with this approach is that the visit order is relevant,
at least for non-additive models. Ideally, the algorithm could be repeated
for all possible permutations of v and its results averaged per variable.
This is basically what SHAP values do; see the reference below for an explanation.
Unfortunately, there is no efficient way to do this in a model-agnostic way.
We offer two visit strategies to approximate SHAP:
- "importance": Using the short-cut described in the reference below: The variables are sorted by the size of their contribution in the same way as the breakdown algorithm but without iteration, i.e., starting from the original query data for each variable x_i.
- "permutation": Averages contributions from a small number of random permutations of v.
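The two strategies can be sketched in base R as follows. This is an illustration only, not the package's code: contributions_along() is a hypothetical helper, the average is unweighted, and sorting by decreasing absolute contribution is an assumption about what "size" means here.

```r
# Hypothetical helper: one breakdown pass along a given visit order.
contributions_along <- function(fit, data, new_obs, visit_order) {
  avg_pred <- function(d) mean(predict(fit, newdata = d))
  current <- avg_pred(data)
  out <- setNames(numeric(length(visit_order)), visit_order)
  for (v in visit_order) {
    data[[v]] <- rep(new_obs[[v]], nrow(data))
    updated <- avg_pred(data)
    out[v] <- updated - current
    current <- updated
  }
  out
}

fit <- lm(Sepal.Length ~ . + Petal.Length:Species, data = iris)
vars <- setdiff(names(iris), "Sepal.Length")
obs <- iris[1, ]

# "importance": one-step contributions, each starting from the original data,
# define the visit order (assumed: decreasing absolute size).
single <- vapply(
  vars,
  function(v) contributions_along(fit, iris, obs, v),
  numeric(1)
)
importance_order <- vars[order(abs(single), decreasing = TRUE)]
by_importance <- contributions_along(fit, iris, obs, importance_order)

# "permutation": average contributions over a few random visit orders.
set.seed(1)
n_perm <- 20
perm_runs <- replicate(
  n_perm,
  contributions_along(fit, iris, obs, sample(vars))[vars]
)
by_permutation <- rowMeans(perm_runs)

by_importance
by_permutation
```

Both strategies distribute the same total (prediction for the observation minus the average prediction) across the variables; they differ only in how that total is split when the model is non-additive.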
Note that the minimum required elements in the (multi-)flashlight are a
"predict_function", "model", and "data". The latter can also directly be
passed to light_breakdown(). Note that by default, no retransformation
function is applied.
Value
An object of class "light_breakdown" with the following elements:
- data: A tibble with results. Can be used to build fully customized visualizations. Column names can be controlled by options(flashlight.column_name).
- by: Same as input by.
Methods (by class)
- light_breakdown(default): Default method not implemented yet.
- light_breakdown(flashlight): Variable attribution to single observation for a flashlight.
- light_breakdown(multiflashlight): Variable attribution to single observation for a multiflashlight.
References
A. Gosiewska and P. Biecek (2019). IBREAKDOWN: Uncertainty of model explanations for non-additive predictive models. ArXiv.
See Also
Examples
fit <- lm(Sepal.Length ~ . + Petal.Length:Species, data = iris)
fl <- flashlight(model = fit, label = "lm", data = iris, y = "Sepal.Length")
light_breakdown(fl, new_obs = iris[1, ])
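
A further call using the "permutation" visit strategy might look as follows. It uses only arguments listed under Usage; the values for n_perm and seed are arbitrary choices for illustration, and the flashlight package is assumed to be installed.

```r
library(flashlight)

fit <- lm(Sepal.Length ~ . + Petal.Length:Species, data = iris)
fl <- flashlight(model = fit, label = "lm", data = iris, y = "Sepal.Length")

# Average contributions over 20 random visit orders; seed fixed for reproducibility
bd <- light_breakdown(
  fl,
  new_obs = iris[1, ],
  visit_strategy = "permutation",
  n_perm = 20,
  seed = 1
)
bd$data
```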