predict_contributions.H2OModel {h2o} | R Documentation |
Predict feature contributions - SHAP values on an H2O Model (only DRF, GBM, XGBoost models and equivalent imported MOJOs).
Description
Default implemntation return H2OFrame shape (#rows, #features + 1) - there is a feature contribution column for each input feature, the last column is the model bias (same value for each row). The sum of the feature contributions and the bias term is equal to the raw prediction of the model. Raw prediction of tree-based model is the sum of the predictions of the individual trees before the inverse link function is applied to get the actual prediction. For Gaussian distribution the sum of the contributions is equal to the model prediction.
Usage
predict_contributions.H2OModel(
object,
newdata,
output_format = c("compact", "original"),
top_n = 0,
bottom_n = 0,
compare_abs = FALSE,
background_frame = NULL,
output_space = FALSE,
output_per_reference = FALSE,
...
)
h2o.predict_contributions(
object,
newdata,
output_format = c("compact", "original"),
top_n = 0,
bottom_n = 0,
compare_abs = FALSE,
background_frame = NULL,
output_space = FALSE,
output_per_reference = FALSE,
...
)
Arguments
object |
a fitted H2OModel object for which prediction is desired |
newdata |
An H2OFrame object in which to look for variables with which to predict. |
output_format |
Specify how to output feature contributions in XGBoost - XGBoost by default outputs contributions for 1-hot encoded features, specifying a compact output format will produce a per-feature contribution. Defaults to original. |
top_n |
Return only #top_n highest contributions + bias If top_n<0 then sort all SHAP values in descending order If top_n<0 && bottom_n<0 then sort all SHAP values in descending order |
bottom_n |
Return only #bottom_n lowest contributions + bias If top_n and bottom_n are defined together then return array of #top_n + #bottom_n + bias If bottom_n<0 then sort all SHAP values in ascending order If top_n<0 && bottom_n<0 then sort all SHAP values in descending order |
compare_abs |
True to compare absolute values of contributions |
background_frame |
Optional frame, that is used as the source of baselines for the baseline SHAP (when output_per_reference == TRUE) or for the marginal SHAP (when output_per_reference == FALSE). |
output_space |
If TRUE, linearly scale the contributions so that they sum up to the prediction. NOTE: This will result only in approximate SHAP values even if the model supports exact SHAP calculation. NOTE: This will not have any effect if the estimator doesn't use a link function. |
output_per_reference |
If TRUE, return baseline SHAP, i.e., contribution for each data point for each reference from the background_frame. If FALSE, return TreeSHAP if no background_frame is provided, or marginal SHAP if background frame is provided. Can be used only with background_frame. |
... |
additional arguments to pass on. |
Details
Note: Multinomial classification models are currently not supported.
Value
Returns an H2OFrame contain feature contributions for each input row.
See Also
h2o.gbm
and h2o.randomForest
for model
generation in h2o.
Examples
## Not run:
library(h2o)
h2o.init()
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)
prostate_gbm <- h2o.gbm(3:9, "AGE", prostate)
h2o.predict(prostate_gbm, prostate)
# Compute SHAP
h2o.predict_contributions(prostate_gbm, prostate)
# Compute SHAP and pick the top two highest
h2o.predict_contributions(prostate_gbm, prostate, top_n=2)
# Compute SHAP and pick the top two lowest
h2o.predict_contributions(prostate_gbm, prostate, bottom_n=2)
# Compute SHAP and pick the top two highest regardless of the sign
h2o.predict_contributions(prostate_gbm, prostate, top_n=2, compare_abs=TRUE)
# Compute SHAP and pick the top two lowest regardless of the sign
h2o.predict_contributions(prostate_gbm, prostate, bottom_n=2, compare_abs=TRUE)
# Compute SHAP values and show them all in descending order
h2o.predict_contributions(prostate_gbm, prostate, top_n=-1)
# Compute SHAP and pick the top two highest and top two lowest
h2o.predict_contributions(prostate_gbm, prostate, top_n=2, bottom_n=2)
# Compute Marginal SHAP, this enables looking at the contributions against different
# baselines, e.g., older people in the following example
h2o.predict_contributions(prostate_gbm, prostate, background_frame=prostate[prostate$AGE > 75, ])
## End(Not run)