light_profile {flashlight} | R Documentation |
Partial Dependence and other Profiles
Description
Calculates different types of profiles across covariable values. By default, partial dependence profiles are calculated (see Friedman). Other options are profiles of ALE (accumulated local effects, see Apley), response, predicted values ("M plots" or "marginal plots", see Apley), residuals, and shap. The results are aggregated either by (weighted) means or by (weighted) quartiles.
Note that ALE profiles are calibrated by (weighted) average predictions. In contrast to the suggestions in Apley, we calculate ALE profiles of factors in the same order as the factor levels. They are not reordered based on similarity of other variables.
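To make the idea of a partial dependence profile concrete, the following base R sketch computes one by hand for a numeric covariable. The helper partial_dependence() is hypothetical and not part of flashlight; it only illustrates the principle (fix v at a grid value for all rows, predict, then take the (weighted) mean) and ignores link functions, grouping, and sampling of rows.

```r
# Illustrative sketch only: hand-rolled partial dependence in base R.
fit <- lm(Sepal.Length ~ ., data = iris)

# Hypothetical helper (not a flashlight function), numeric v only
partial_dependence <- function(model, data, v, grid, w = NULL) {
  if (is.null(w)) w <- rep(1, nrow(data))   # unweighted by default
  vapply(grid, function(value) {
    data_mod <- data
    data_mod[[v]] <- value                  # fix v at the grid value for all rows
    weighted.mean(predict(model, newdata = data_mod), w)
  }, numeric(1))
}

# Evaluate at five equally spaced points across the observed range
grid <- seq(min(iris$Petal.Width), max(iris$Petal.Width), length.out = 5)
pd <- partial_dependence(fit, iris, "Petal.Width", grid)
data.frame(Petal.Width = grid, partial_dependence = pd)
```

Because the model in this sketch is linear without interactions, the resulting profile is a straight line whose slope equals the fitted coefficient of Petal.Width.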
Usage
light_profile(x, ...)
## Default S3 method:
light_profile(x, ...)
## S3 method for class 'flashlight'
light_profile(
x,
v = NULL,
data = NULL,
by = x$by,
type = c("partial dependence", "ale", "predicted", "response", "residual", "shap"),
stats = c("mean", "quartiles"),
breaks = NULL,
n_bins = 11L,
cut_type = c("equal", "quantile"),
use_linkinv = TRUE,
counts = TRUE,
counts_weighted = FALSE,
v_labels = TRUE,
pred = NULL,
pd_evaluate_at = NULL,
pd_grid = NULL,
pd_indices = NULL,
pd_n_max = 1000L,
pd_seed = NULL,
pd_center = c("no", "first", "middle", "last", "mean", "0"),
ale_two_sided = FALSE,
...
)
## S3 method for class 'multiflashlight'
light_profile(
x,
v = NULL,
data = NULL,
type = c("partial dependence", "ale", "predicted", "response", "residual", "shap"),
breaks = NULL,
n_bins = 11L,
cut_type = c("equal", "quantile"),
pd_evaluate_at = NULL,
pd_grid = NULL,
...
)
Arguments
x |
An object of class "flashlight" or "multiflashlight". |
... |
Further arguments passed to |
v |
The variable name to be profiled. |
data |
An optional |
by |
An optional vector of column names used to additionally group the results. |
type |
Type of the profile: Either "partial dependence", "ale", "predicted", "response", "residual", or "shap". |
stats |
Statistic to calculate: "mean" or "quartiles". For ALE profiles, only "mean" makes sense. |
breaks |
Cut breaks for a numeric |
n_bins |
Approximate number of unique values to evaluate for numeric |
cut_type |
Should a numeric |
use_linkinv |
Should retransformation function be applied? Default is |
counts |
Should observation counts be added? |
counts_weighted |
If |
v_labels |
If |
pred |
Optional vector with predictions (after application of inverse link).
Can be used to avoid recalculating predictions when the function
is called repeatedly for different |
pd_evaluate_at |
Vector with values of |
pd_grid |
A |
pd_indices |
A vector of row numbers to consider in calculating partial dependence profiles and "ale". |
pd_n_max |
Maximum number of ICE profiles to calculate (will be randomly
picked from |
pd_seed |
Integer random seed used to select ICE profiles for partial dependence and ALE. |
pd_center |
How should ICE curves be centered?
|
ale_two_sided |
If |
Details
Numeric covariables v with more than n_bins distinct values are binned into n_bins bins. Alternatively, breaks can be provided to specify the binning. For partial dependence profiles (and partly also ALE profiles), this behaviour can be overridden either by providing a vector of evaluation points (pd_evaluate_at) or an evaluation pd_grid. By the latter we mean a data frame with column name(s) with a (multi-)variate evaluation grid.
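The difference between equal-width and quantile-based binning can be sketched in base R as follows. This is only an illustration of the two strategies named by cut_type; how flashlight computes its breaks internally may differ, and the variable names below are chosen for this example.

```r
# Illustrative sketch only: two ways to bin a numeric covariable in base R.
x <- iris$Petal.Width
n_bins <- 4

# Equal-width breaks spanning the observed range of x
breaks_equal <- seq(min(x), max(x), length.out = n_bins + 1)

# Quantile-based breaks; unique() guards against duplicated breaks from ties
breaks_quantile <- unique(
  quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE)
)

# include.lowest = TRUE keeps the minimum of x inside the first bin
bins_equal <- cut(x, breaks = breaks_equal, include.lowest = TRUE)
bins_quantile <- cut(x, breaks = breaks_quantile, include.lowest = TRUE)

# Equal-width bins can be unevenly populated; quantile bins are more balanced
table(bins_equal)
table(bins_quantile)
```

A vector such as breaks_quantile could then be supplied through the breaks argument to control the binning explicitly.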
For partial dependence, ALE, and prediction profiles, "model", "predict_function", "linkinv" and "data" are required. For response profiles, it is "y", "linkinv" and "data", and for shap profiles it is just "shap". "data" can be passed on the fly.
Value
An object of class "light_profile" with the following elements:
- data: A tibble containing results. Can be used to build fully customized visualizations. Column names can be controlled by options(flashlight.column_name).
- by: Names of group by variable.
- v: The variable(s) evaluated.
- type: Same as input type. For information only.
- stats: Same as input stats.
Methods (by class)
- light_profile(default): Default method not implemented yet.
- light_profile(flashlight): Profiles for flashlight.
- light_profile(multiflashlight): Profiles for multiflashlight.
References
Friedman J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29:1189–1232.
Apley D. W. (2016). Visualizing the effects of predictor variables in black box supervised learning models.
See Also
light_effects(), plot.light_profile()
Examples
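library(flashlight)
# Fit a linear model and wrap it in a flashlight object
fit <- lm(Sepal.Length ~ ., data = iris)
fl <- flashlight(model = fit, label = "iris", data = iris, y = "Sepal.Length")
# Partial dependence profile (the default type) for a factor
light_profile(fl, v = "Species")
# Residual profile for a numeric covariable
light_profile(fl, v = "Petal.Width", type = "residual")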