R: Interactive Studio for Explanatory Model Analysis

modelStudio {modelStudio}

R Documentation

Interactive Studio for Explanatory Model Analysis

Description

This function computes various instance-level and dataset-level model explanations and produces a customisable dashboard consisting of multiple panels for plots with short descriptions. The dashboard can be saved and shared with others. Tools for Explanatory Model Analysis unite with tools for Exploratory Data Analysis to give a broad overview of the model behavior.

The extensive documentation covers:

Function parameters description - perks and features
Framework and model compatibility - R & Python examples
Theoretical introduction to the plots - Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models

The displayed variable can be changed by clicking on the bars of the plots or with the first dropdown list, and the observation can be changed with the second dropdown list. The dashboard gathers useful, but not sensitive, information about how it is being used (e.g. computation length, package version, dashboard dimensions). This is for development purposes only and can be blocked by setting `telemetry` to `FALSE`.

Usage

modelStudio(explainer, ...)

## S3 method for class 'explainer'
modelStudio(
  explainer,
  new_observation = NULL,
  new_observation_y = NULL,
  new_observation_n = 3,
  facet_dim = c(2, 2),
  time = 500,
  max_features = 10,
  max_features_fi = NULL,
  N = 300,
  N_fi = N * 10,
  N_sv = N * 3,
  B = 10,
  B_fi = B,
  eda = TRUE,
  open_plots = c("fi"),
  show_info = TRUE,
  parallel = FALSE,
  options = ms_options(),
  viewer = "external",
  widget_id = NULL,
  license = NULL,
  telemetry = TRUE,
  max_vars = NULL,
  verbose = NULL,
  ...
)

Arguments

`explainer`	An `explainer` created with `DALEX::explain()`.
`...`	Other parameters.
`new_observation`	New observations with columns that correspond to variables used in the model.
`new_observation_y`	True label for `new_observation` (optional).
`new_observation_n`	Number of observations to be taken from `explainer$data` if `new_observation = NULL`. See vignette.
`facet_dim`	Dimensions of the grid. Default is `c(2,2)`.
`time`	Animation length in milliseconds. Default is `500`.
`max_features`	Maximum number of features to be included in Break Down (BD), Shapley Values (SV), and Feature Importance (FI) plots. Default is `10`.
`max_features_fi`	Maximum number of features to be included in FI plot. Default is `max_features`.
`N`	Number of observations used for the calculation of Partial Dependence (PD) and Accumulated Dependence (AD). Default is `300`. See vignette.
`N_fi`	Number of observations used for the calculation of FI. Default is `10*N`.
`N_sv`	Number of observations used for the calculation of SV. Default is `3*N`.
`B`	Number of permutation rounds used for calculation of SV. Default is `10`. See vignette.
`B_fi`	Number of permutation rounds used for calculation of FI. Default is `B`.
`eda`	Compute EDA plots and Residuals vs Feature plot, which adds the data to the dashboard. Default is `TRUE`.
`open_plots`	A vector listing plots to be initially opened (and on which positions). Default is `c("fi")`.
`show_info`	Print progress to the console. Default is `TRUE`.
`parallel`	Speed up the computation using `parallelMap::parallelMap()`. See vignette. This might interfere with showing progress using `show_info`.
`options`	Customize `modelStudio`. See `ms_options` and vignette.
`viewer`	Default is `"external"`, which displays the dashboard in an external RStudio window. Use `"browser"` to display it in an external browser or `"internal"` to use the RStudio internal viewer pane.
`widget_id`	Use an explicit element ID for the widget (rather than an automatically generated one). Useful e.g. when using `modelStudio` with Shiny. See vignette.
`license`	Path to the file containing the license (`con` parameter passed to `readLines()`). It can be used e.g. to include the license for `explainer$data` as a comment in the source of `.html` output file.
`telemetry`	The dashboard gathers useful, but not sensitive, information about how it is being used (e.g. computation length, package version, dashboard dimensions). This is for development purposes only and can be blocked by setting `telemetry` to `FALSE`.
`max_vars`	An alias for `max_features`. If provided, it overrides the value of `max_features`.
`verbose`	An alias for `show_info`. If provided, it overrides the value of `show_info`.
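
As a rough illustration of how the sampling arguments interact, the sketch below builds a smaller, faster studio. It assumes the `titanic_imputed` data shipped with DALEX and uses only the arguments documented above; the object names (`model`, `explainer`, `ms`) are illustrative and not part of the package.

```r
library("DALEX")
library("modelStudio")

# illustrative model and explainer (names are assumptions for this sketch)
model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
explainer <- explain(model,
                     data = titanic_imputed,
                     y = titanic_imputed$survived,
                     label = "Titanic GLM",
                     verbose = FALSE)

# N = 100 implies N_fi = 1000 and N_sv = 300 through the documented defaults;
# B = 5 implies B_fi = 5
ms <- modelStudio(explainer,
                  new_observation_n = 2,  # take 2 observations from explainer$data
                  N = 100,
                  B = 5,
                  max_features = 5,
                  eda = FALSE,            # skip the EDA plots
                  show_info = FALSE,      # no progress output on the console
                  telemetry = FALSE)      # do not gather usage information
```

Lowering `N` and `B` trades precision of the explanations for computation time, so values this small are best suited to quick checks.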

Value

An object of classes `r2d3`, `htmlwidget`, and `modelStudio`.
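
Because the returned object is an r2d3 htmlwidget, one possible way to save the dashboard to a file is `r2d3::save_d3_html()`. The sketch below is a minimal example under that assumption; it rebuilds the Titanic explainer so that the block is self-contained, and the output path (`out_file`) is illustrative.

```r
library("DALEX")
library("modelStudio")

# rebuild a small explainer so the block runs on its own
model_titanic <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
explainer_titanic <- explain(model_titanic,
                             data = titanic_imputed,
                             y = titanic_imputed$survived,
                             label = "Titanic GLM",
                             verbose = FALSE)

# keep the object instead of printing it
ms <- modelStudio(explainer_titanic,
                  N = 200, B = 5,
                  show_info = FALSE,
                  telemetry = FALSE)

# write the dashboard to an HTML file (illustrative path in a temporary directory)
out_file <- file.path(tempdir(), "modelStudio.html")
r2d3::save_d3_html(ms, file = out_file)
```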

References

The input object is implemented in the DALEX package.
Feature Importance, Ceteris Paribus, Partial Dependence and Accumulated Dependence explanations are implemented in the ingredients package.
Break Down and Shapley Values explanations are implemented in the iBreakDown package.

Examples

library("DALEX")
library("modelStudio")

#:# ex1 classification on 'titanic' data

# fit a model
model_titanic <- glm(survived ~ ., data = titanic_imputed, family = "binomial")

# create an explainer for the model
explainer_titanic <- explain(model_titanic,
                             data = titanic_imputed,
                             y = titanic_imputed$survived,
                             label = "Titanic GLM")

# pick observations
new_observations <- titanic_imputed[1:2,]
rownames(new_observations) <- c("Lucas","James")

# make a studio for the model
modelStudio(explainer_titanic,
            new_observations,
            N = 200, B = 5) # faster example



#:# ex2 regression on 'apartments' data
if (requireNamespace("ranger", quietly = TRUE)) {
  library("ranger")
  model_apartments <- ranger(m2.price ~ ., data = apartments)

  explainer_apartments <- explain(model_apartments,
                                  data = apartments,
                                  y = apartments$m2.price)

  new_apartments <- apartments[1:2,]
  rownames(new_apartments) <- c("ap1","ap2")

  # change dashboard dimensions and animation length
  modelStudio(explainer_apartments,
              new_apartments,
              facet_dim = c(2, 3),
              time = 800)

  # add information about true labels
  modelStudio(explainer_apartments,
              new_apartments,
              new_observation_y = new_apartments$m2.price)

  # don't compute EDA plots
  modelStudio(explainer_apartments,
              eda = FALSE)
}

#:# ex3 xgboost model on 'HR' dataset
if (requireNamespace("xgboost", quietly = TRUE)) {
  library("xgboost")
  HR_matrix <- model.matrix(status == "fired" ~ . -1, HR)

  # fit a model
  xgb_matrix <- xgb.DMatrix(HR_matrix, label = HR$status == "fired")
  params <- list(max_depth = 3, objective = "binary:logistic", eval_metric = "auc")
  model_HR <- xgb.train(params, xgb_matrix, nrounds = 300)

  # create an explainer for the model
  explainer_HR <- explain(model_HR,
                          data = HR_matrix,
                          y = HR$status == "fired",
                          type = "classification",
                          label = "xgboost")

  # pick observations
  new_observation <- HR_matrix[1:2, , drop = FALSE]
  rownames(new_observation) <- c("id1", "id2")

  # make a studio for the model
  modelStudio(explainer_HR,
              new_observation)
}

[Package modelStudio version 3.1.2]