safely_transform_categorical {rSAFE}    R Documentation

Calculating a Transformation of a Categorical Feature Using Hierarchical Clustering

Description

The safely_transform_categorical() function calculates a transformation function for a categorical variable, using predictions obtained from a black-box model together with hierarchical clustering. The gap statistic criterion is used to determine the optimal number of clusters.
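The idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data frame, the made-up prediction values, and the hand-picked k = 2 are all assumptions for illustration, whereas the documented function selects the number of clusters with the gap statistic.

```r
# Toy stand-in for model predictions (values are made up); in practice
# they would come from the black-box model wrapped in the explainer.
toy <- data.frame(
  district = rep(c("A", "B", "C", "D"), each = 3),
  pred     = c(10, 11, 12,  10.5, 11.5, 12.5,  30, 31, 32,  29, 30, 31)
)

# Mean prediction for each level of the categorical feature.
level_means <- sapply(split(toy$pred, toy$district), mean)

# Hierarchical clustering of the levels based on their mean predictions.
tree <- hclust(dist(level_means), method = "complete")

# Cut the tree into k groups. k = 2 is fixed by hand here (an assumption);
# the documented function chooses it via the gap statistic instead.
groups <- cutree(tree, k = 2)
print(groups)
```

Levels whose mean predictions are close (here A with B, and C with D) end up in the same group, which is what makes merging them a reasonable transformation.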

Usage

safely_transform_categorical(
  explainer,
  variable,
  method = "complete",
  B = 500,
  collapse = "_"
)

Arguments

explainer

a DALEX explainer created with the explain() function

variable

the name of the categorical feature for which the transformation function is to be computed

method

the agglomeration method to be used in hierarchical clustering, one of: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"

B

the number of reference datasets used to calculate the gap statistic

collapse

a character string used to separate the original levels when they are combined into a new level
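To make the roles of method, B and collapse more concrete, here is a hedged sketch of how such arguments could be used. It relies on clusGap() and maxSE() from the cluster package; whether rSAFE calls these functions internally is an assumption, and the level means, the helper hc_cut, and K.max = 4 are hypothetical choices made only for this example.

```r
library(cluster)  # provides clusGap() and maxSE()

set.seed(1)
# Made-up mean predictions for eight levels of a categorical feature.
level_means <- c(a = 1.0, b = 1.2, c = 0.9, d = 5.0,
                 e = 5.3, f = 4.8, g = 9.1, h = 9.4)
x <- matrix(level_means, ncol = 1,
            dimnames = list(names(level_means), NULL))

# Hypothetical helper in the form clusGap() expects: it takes the data
# and k, and returns a list with a `cluster` component.
hc_cut <- function(x, k) {
  list(cluster = cutree(hclust(dist(x), method = "complete"), k = k))
}

# B reference datasets are simulated to estimate the gap statistic.
gap <- clusGap(x, FUNcluster = hc_cut, K.max = 4, B = 50, verbose = FALSE)
k <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "firstSEmax")

# Merge the level names within each cluster, separated by `collapse`.
groups <- hc_cut(x, k)$cluster
new_levels <- tapply(names(level_means), groups, paste, collapse = "_")
print(new_levels)
```

A larger B gives a more stable estimate of the gap statistic at the cost of more computation, since every reference dataset is clustered for each candidate number of clusters.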

Value

a list with information on the transformation of the given variable

See Also

safe_extraction

Examples


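library(DALEX)
library(randomForest)
library(rSAFE)

# Use the first 500 rows of the apartments data shipped with DALEX
data <- apartments[1:500,]
set.seed(111)
# Fit the black-box model: a random forest predicting price per square metre
model_rf <- randomForest(m2.price ~ construction.year + surface + floor +
                           no.rooms + district, data = data)
# Wrap the model in a DALEX explainer (columns 2:6 are predictors, column 1 is the target)
explainer_rf <- explain(model_rf, data = data[,2:6], y = data[,1])
# Compute the transformation for the categorical variable "district"
safely_transform_categorical(explainer_rf, "district")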


[Package rSAFE version 0.1.4 Index]