predict_contributions.H2OModel {h2o}    R Documentation

Predict feature contributions (SHAP values) on an H2O model (DRF, GBM, and XGBoost models and equivalent imported MOJOs only).

Description

By default, returns an H2OFrame of shape (#rows, #features + 1). There is one feature contribution column for each input feature, and the last column holds the model bias (the same value for every row). The feature contributions plus the bias term equal the raw prediction of the model. The raw prediction of a tree-based model is the sum of the predictions of the individual trees, before the inverse link function is applied to obtain the actual prediction. For the Gaussian distribution, which uses the identity link, the contributions plus the bias equal the model prediction.
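
The additivity property above can be checked with a few lines of base R. This is a minimal sketch, not output from h2o: the contribution values and the bias are made up, and it assumes a logit link (such as a binary classifier would use), so that plogis() plays the role of the inverse link function.

```r
# Made-up contributions for one row of a binary classifier (not real h2o output)
contribs <- c(AGE = 0.12, PSA = 0.85, GLEASON = -0.30)
bias <- -0.40  # model bias: the same value for every row

# Additivity: feature contributions plus the bias give the raw prediction
raw_prediction <- sum(contribs) + bias

# Assumed logit link: the inverse link (logistic function) maps the raw
# prediction to the actual prediction, here a probability
probability <- plogis(raw_prediction)

print(c(raw = raw_prediction, probability = probability))
```

With an identity link (Gaussian), no inverse link is applied, so raw_prediction is already the model prediction.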

Usage

predict_contributions.H2OModel(
  object,
  newdata,
  output_format = c("compact", "original"),
  top_n = 0,
  bottom_n = 0,
  compare_abs = FALSE,
  background_frame = NULL,
  output_space = FALSE,
  output_per_reference = FALSE,
  ...
)

h2o.predict_contributions(
  object,
  newdata,
  output_format = c("compact", "original"),
  top_n = 0,
  bottom_n = 0,
  compare_abs = FALSE,
  background_frame = NULL,
  output_space = FALSE,
  output_per_reference = FALSE,
  ...
)

Arguments

object

a fitted H2OModel object for which prediction is desired

newdata

An H2OFrame object in which to look for variables with which to predict.

output_format

Specify how to output feature contributions for XGBoost models. By default, XGBoost outputs contributions for one-hot encoded features; the compact output format produces a single contribution per original feature. Defaults to original.

top_n

Return only the top_n highest contributions plus the bias. If top_n < 0, sort all SHAP values in descending order. If top_n < 0 and bottom_n < 0, sort all SHAP values in descending order.

bottom_n

Return only the bottom_n lowest contributions plus the bias. If top_n and bottom_n are set together, return top_n + bottom_n contributions plus the bias. If bottom_n < 0, sort all SHAP values in ascending order. If top_n < 0 and bottom_n < 0, sort all SHAP values in descending order.

compare_abs

If TRUE, compare the absolute values of the contributions when selecting the top_n and bottom_n contributions.

background_frame

Optional frame used as the source of baselines for baseline SHAP (when output_per_reference == TRUE) or for marginal SHAP (when output_per_reference == FALSE).

output_space

If TRUE, linearly scale the contributions so that they sum to the prediction. NOTE: This yields only approximate SHAP values, even if the model supports exact SHAP calculation. NOTE: This has no effect if the estimator does not use a link function.

output_per_reference

If TRUE, return baseline SHAP, i.e., the contributions of each data point against each reference row in background_frame. If FALSE, return TreeSHAP when no background_frame is provided, or marginal SHAP when one is provided. Setting this to TRUE requires background_frame.

...

additional arguments to pass on.
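
The selection rules for top_n, bottom_n, and compare_abs can be mirrored in base R on a single row of contributions. This is a minimal sketch, not the h2o implementation: select_contributions() is a hypothetical helper, the bias is left out, and it returns a named vector rather than the H2OFrame that h2o.predict_contributions produces. For sign combinations the argument descriptions do not cover (for example top_n < 0 with bottom_n > 0), the precedence chosen here is arbitrary.

```r
# Hypothetical helper mirroring the documented top_n / bottom_n / compare_abs
# rules on a named vector of contributions for one row (bias excluded)
select_contributions <- function(contribs, top_n = 0, bottom_n = 0,
                                 compare_abs = FALSE) {
  # Rank by absolute value when compare_abs is TRUE, otherwise by signed value
  key <- if (compare_abs) abs(contribs) else contribs
  desc <- contribs[order(key, decreasing = TRUE)]
  asc <- contribs[order(key, decreasing = FALSE)]
  if (top_n < 0) return(desc)    # also covers top_n < 0 and bottom_n < 0
  if (bottom_n < 0) return(asc)  # bottom_n < 0: everything, ascending
  if (top_n == 0 && bottom_n == 0) return(contribs)  # default: all, unsorted
  # This sketch assumes the two selections do not overlap
  stopifnot(top_n + bottom_n <= length(contribs))
  c(head(desc, top_n), head(asc, bottom_n))
}

# Made-up contributions for illustration
contribs <- c(AGE = 0.12, PSA = 0.85, GLEASON = -0.30, VOL = -0.05)
select_contributions(contribs, top_n = 2)                      # PSA, AGE
select_contributions(contribs, bottom_n = 2)                   # GLEASON, VOL
select_contributions(contribs, top_n = 2, compare_abs = TRUE)  # PSA, GLEASON
```

With compare_abs = TRUE, GLEASON (-0.30) outranks AGE (0.12), because only the magnitude of a contribution is compared.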

Details

Note: Multinomial classification models are currently not supported.

Value

Returns an H2OFrame containing the feature contributions for each input row.

See Also

h2o.gbm and h2o.randomForest for model generation in h2o.

Examples

## Not run: 
library(h2o)
h2o.init()
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)
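# Predict AGE from columns 4-9 (RACE, DPROS, DCAPS, PSA, VOL, GLEASON), excluding the response itself
prostate_gbm <- h2o.gbm(4:9, "AGE", prostate)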
h2o.predict(prostate_gbm, prostate)
# Compute SHAP
h2o.predict_contributions(prostate_gbm, prostate)
# Compute SHAP and pick the two highest contributions
h2o.predict_contributions(prostate_gbm, prostate, top_n=2)
# Compute SHAP and pick the two lowest contributions
h2o.predict_contributions(prostate_gbm, prostate, bottom_n=2)
# Compute SHAP and pick the two highest regardless of sign (by absolute value)
h2o.predict_contributions(prostate_gbm, prostate, top_n=2, compare_abs=TRUE)
# Compute SHAP and pick the two lowest regardless of sign (by absolute value)
h2o.predict_contributions(prostate_gbm, prostate, bottom_n=2, compare_abs=TRUE)
# Compute SHAP values and show them all in descending order
h2o.predict_contributions(prostate_gbm, prostate, top_n=-1)
# Compute SHAP and pick the two highest and the two lowest
h2o.predict_contributions(prostate_gbm, prostate, top_n=2, bottom_n=2)

# Compute marginal SHAP. This lets you examine the contributions against a different
# baseline, e.g., patients older than 75 in the following example
h2o.predict_contributions(prostate_gbm, prostate, background_frame=prostate[prostate$AGE > 75, ])

## End(Not run)
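
The relation between the two background_frame modes (output_per_reference = TRUE versus FALSE) can be sketched in base R. This assumes, as in the usual definition of marginal (interventional) SHAP, that marginal SHAP is the average of the baseline SHAP values over all references in background_frame. The numbers are made up. For each reference, the bias is taken to be the model's raw prediction for that reference, so every row sums to the same raw prediction for the explained data point.

```r
# Made-up baseline SHAP values for ONE data point explained against three
# references (rows); columns are feature contributions plus the bias
baseline <- rbind(
  ref1 = c(PSA = 0.20,  GLEASON = 0.50, bias = 60.30),
  ref2 = c(PSA = -0.10, GLEASON = 0.90, bias = 60.20),
  ref3 = c(PSA = 0.05,  GLEASON = 0.40, bias = 60.55)
)

# Each reference row sums to the same raw prediction for the data point
rowSums(baseline)

# Assumed relation: marginal SHAP is the per-column average over references
marginal <- colMeans(baseline)
marginal
sum(marginal)  # still equals the raw prediction of the data point
```

Averaging preserves additivity: because every reference row sums to the same value, the averaged contributions and the averaged bias sum to that value too.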

[Package h2o version 3.44.0.3 Index]