R: Create explainer from your h2o model

explain_h2o {DALEXtra}

R Documentation

Create explainer from your h2o model

Description

DALEX is designed to work with various black-box models such as tree ensembles, linear models, and neural networks. Unfortunately, R packages that create such models are very inconsistent: different tools use different interfaces to train, validate, and use models. One of the tools we would like to make more accessible is H2O.

Usage

explain_h2o(
  model,
  data = NULL,
  y = NULL,
  weights = NULL,
  predict_function = NULL,
  predict_function_target_column = NULL,
  residual_function = NULL,
  ...,
  label = NULL,
  verbose = TRUE,
  precalculate = TRUE,
  colorize = !isTRUE(getOption("knitr.in.progress")),
  model_info = NULL,
  type = NULL
)

Arguments

`model`	object - a model to be explained
`data`	data.frame or matrix - data which will be used to calculate the explanations. If not provided, then it will be extracted from the model. Data should be passed without a target column (this shall be provided as the `y` argument). NOTE: If the target variable is present in the `data`, some of the functionalities may not work properly.
`y`	numeric vector with outputs/scores. If provided, it should have as many elements as `data` has rows.
`weights`	numeric vector with sampling weights. By default it's `NULL`. If provided, it should have as many elements as `data` has rows.
`predict_function`	function that takes two arguments: model and new data and returns a numeric vector with predictions. By default it is `yhat`.
`predict_function_target_column`	character or numeric containing either the column name or the column number in the model prediction object of the class that should be considered as positive (i.e. the class that is associated with probability 1). If `NULL`, the second column of the output will be taken for binary classification. For multiclass classification, setting this parameter switches the explainer to binary classification mode with one-vs-others probabilities.
`residual_function`	function that takes four arguments: model, data, target vector y and predict function (optionally). It should return a numeric vector with model residuals for given data. If not provided, response residuals (`y-\hat{y}`) are calculated. By default it is `residual_function_default`.
`...`	other parameters
`label`	character - the name of the model. By default it's extracted from the 'class' attribute of the model
`verbose`	logical. If TRUE (default) then diagnostic messages will be printed
`precalculate`	logical. If TRUE (default) then `predicted_values` and `residual` are calculated when the explainer is created. This also happens if `verbose` is TRUE. Set both `verbose` and `precalculate` to FALSE to omit calculations.
`colorize`	logical. If TRUE then `WARNINGS`, `ERRORS` and `NOTES` are colorized. Works only in the R console. By default it is `FALSE` while knitting and `TRUE` otherwise.
`model_info`	a named list (`package`, `version`, `type`) containing information about the model. If `NULL`, `DALEX` will look for this information on its own.
`type`	type of a model, either `classification` or `regression`. If not specified then `type` will be extracted from `model_info`.
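
The contracts described above for `predict_function` and `residual_function` can be sketched as follows. This is a minimal illustration, not the package's own implementation: the names `custom_predict` and `custom_residual` are hypothetical, and a base R `lm()` fit on the built-in `mtcars` data stands in for an H2O model so that the sketch runs without a cluster.

```r
# A base R linear model stands in for an H2O model (assumption for illustration).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# predict_function: takes a model and new data, returns a numeric vector.
# `custom_predict` is a hypothetical name, not part of DALEX or DALEXtra.
custom_predict <- function(model, newdata) {
  as.numeric(predict(model, newdata = newdata))
}

# residual_function: takes a model, data, a target vector y and (optionally)
# a predict function; returns a numeric vector of response residuals y - y_hat.
# `custom_residual` is likewise a hypothetical name.
custom_residual <- function(model, data, y, predict_function = custom_predict) {
  y - predict_function(model, data)
}

# Data is passed without the target column; the target goes in separately.
X <- mtcars[, c("wt", "hp")]
preds <- custom_predict(fit, X)
res <- custom_residual(fit, X, mtcars$mpg)
```

Functions of this shape could then be supplied through the `predict_function` and `residual_function` arguments; when they are left as `NULL`, the defaults described above are used.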

Value

explainer object (explain) ready to work with DALEX

Examples




# load packages and data
library(h2o)
library(DALEXtra)

# data <- DALEX::titanic_imputed

# init h2o
cluster <- try(h2o::h2o.init())
if (!inherits(cluster, "try-error")) {
  # stop h2o progress printing
  h2o.no_progress()

  # split the data
  # h2o_split <- h2o.splitFrame(as.h2o(data))
  # train <- h2o_split[[1]]
  # test <- as.data.frame(h2o_split[[2]])
  # h2o automl takes target as factor
  # train$survived <- as.factor(train$survived)

  # fit a model
  # automl <- h2o.automl(y = "survived",
  #                      training_frame = train,
  #                      max_runtime_secs = 30)

  # create an explainer for the model
  # explainer <- explain_h2o(automl,
  #                          data = test,
  #                          y = test$survived,
  #                          label = "h2o")

  # load the train and test sets shipped with DALEXtra
  titanic_test <- read.csv(system.file("extdata", "titanic_test.csv", package = "DALEXtra"))
  titanic_train <- read.csv(system.file("extdata", "titanic_train.csv", package = "DALEXtra"))
  titanic_h2o <- h2o::as.h2o(titanic_train)
  # the target must be a factor for classification
  titanic_h2o["survived"] <- h2o::as.factor(titanic_h2o["survived"])
  titanic_test_h2o <- h2o::as.h2o(titanic_test)

  # fit a gradient boosting model
  model <- h2o::h2o.gbm(
    training_frame = titanic_h2o,
    y = "survived",
    distribution = "bernoulli",
    ntrees = 500,
    max_depth = 4,
    min_rows = 12,
    learn_rate = 0.001
  )

  # create the explainer: data without the target column, target passed as y
  explain_h2o(model, data = titanic_test[, 1:17], y = titanic_test[, 18])

  try(h2o.shutdown(prompt = FALSE))
}

[Package DALEXtra version 2.3.0 Index]