shapley.plot {shapley} | R Documentation |
Plot weighted SHAP contributions
Description
This function applies different criteria to visualize SHAP contributions
Usage
shapley.plot(
shapley,
plot = "bar",
method = "lowerCI",
cutoff = 0,
top_n_features = NULL,
features = NULL,
legendstyle = "continuous",
scale_colour_gradient = NULL
)
Arguments
shapley |
object of class 'shapley', as returned by the 'shapley' function |
plot |
character, specifying the type of the plot, which can be either 'bar', 'waffle', or 'shap'. The default is 'bar'. |
method |
character, specifying the method used for identifying the most important features according to their weighted SHAP values. The default selection method is "lowerCI", which includes features whose lower weighted confidence interval exceeds the predefined 'cutoff' value (default is relative SHAP of 1 Alternatively, the "mean" option can be specified, indicating any feature with normalized weighted mean SHAP contribution above the specified 'cutoff' should be selected. Another alternative options is "shapratio", a method that filters for features where the proportion of their relative weighted SHAP value exceeds the 'cutoff'. This approach calculates the relative contribution of each feature's weighted SHAP value against the aggregate of all features, with those surpassing the 'cutoff' being selected as top feature. |
cutoff |
numeric, specifying the cutoff for the method used for selecting the top features. |
top_n_features |
integer. if specified, the top n features with the highest weighted SHAP values will be selected, overrullung the 'cutoff' and 'method' arguments. |
features |
character vector, specifying the feature to be plotted. |
legendstyle |
character, specifying the style of the plot legend, which can be either 'continuous' (default) or 'discrete'. the continuous legend is only applicable to 'shap' plots and other plots only use 'discrete' legend. |
scale_colour_gradient |
character vector for specifying the color gradients for the plot. |
Value
ggplot object
Author(s)
E. F. Haghish
Examples
## Not run:
# load the required libraries for building the base-learners and the ensemble models
library(h2o) #shapley supports h2o models
library(shapley)
# initiate the h2o server
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
# upload data to h2o cloud
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)
### H2O provides 2 types of grid search for tuning the models, which are
### AutoML and Grid. Below, I demonstrate how weighted mean shapley values
### can be computed for both types.
set.seed(10)
#######################################################
### PREPARE AutoML Grid (takes a couple of minutes)
#######################################################
# run AutoML to tune various models (GBM) for 60 seconds
y <- "CAPSULE"
prostate[,y] <- as.factor(prostate[,y]) #convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 120,
include_algos=c("GBM"),
# this setting ensures the models are comparable for building a meta learner
seed = 2023, nfolds = 10,
keep_cross_validation_predictions = TRUE)
### call 'shapley' function to compute the weighted mean and weighted confidence intervals
### of SHAP values across all trained models.
### Note that the 'newdata' should be the testing dataset!
result <- shapley(models = aml, newdata = prostate, plot = TRUE)
#######################################################
### PLOT THE WEIGHTED MEAN SHAP VALUES
#######################################################
shapley.plot(result, plot = "bar")
shapley.plot(result, plot = "waffle")
## End(Not run)