safely_select_variables {rSAFE}    R Documentation

Performing Feature Selection on the Dataset with Transformed Variables

Description

The safely_select_variables() function selects variables from the dataset returned by the safely_transform_data() function. For each original variable, exactly one variable is chosen: either the original one or one of its transformed versions.

Usage

safely_select_variables(
  safe_extractor,
  data,
  y = NULL,
  which_y = NULL,
  class_pred = NULL,
  verbose = TRUE
)

Arguments

safe_extractor

object containing information about variable transformations, created with the safe_extraction() function

data

dataset, either the original one or the one returned by the safely_transform_data() function. If the data does not contain transformed variables, the transformation is performed inside this function using the 'safe_extractor' argument. The data may or may not contain the response variable: if it does, the 'which_y' argument must be given; otherwise, the 'y' argument must be provided.

y

vector of responses; must be given if data does not contain the response variable

which_y

numeric or character, the position or name of the response column in data; optional in general, but must be given if data contains the response values

class_pred

numeric or character, used only in multiclass classification problems. If the response vector has more than two levels, 'class_pred' should indicate the class of interest, which will denote failure; all other classes will stand for success.

verbose

logical, whether a progress bar should be printed

Value

vector of variable names, selected based on AIC values

See Also

safely_transform_data

Examples


library(DALEX)
library(randomForest)
library(rSAFE)

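# Use the first 500 rows of the apartments dataset from DALEX
data <- apartments[1:500,]
set.seed(111)
# Fit the black-box model whose behaviour guides the transformations
model_rf <- randomForest(m2.price ~ construction.year + surface + floor +
                           no.rooms + district, data = data)
# Column 1 is the response (m2.price); columns 2-6 are the predictors
explainer_rf <- explain(model_rf, data = data[,2:6], y = data[,1])
safe_extractor <- safe_extraction(explainer_rf, verbose = FALSE)
# data still contains the response column, so it is identified with which_y
safely_select_variables(safe_extractor, data, which_y = "m2.price", verbose = FALSE)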
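
The call above lets the function transform the data internally and identifies the response with 'which_y'. The sketch below shows the two other usages described in the Arguments section. It first passes data already transformed with safely_transform_data(), then supplies the response separately through 'y'. The object names (data_transformed, selected, data_selected, selected_y) are illustrative and not part of the package. The second call follows the argument descriptions and is not a verified run.

```r
library(DALEX)
library(randomForest)
library(rSAFE)

# Same setup as in the example above
data <- apartments[1:500, ]
set.seed(111)
model_rf <- randomForest(m2.price ~ construction.year + surface + floor +
                           no.rooms + district, data = data)
explainer_rf <- explain(model_rf, data = data[, 2:6], y = data[, 1])
safe_extractor <- safe_extraction(explainer_rf, verbose = FALSE)

# Usage 1: transform the data explicitly, then select variables.
# The response column is still present, so it is identified with which_y.
data_transformed <- safely_transform_data(safe_extractor, data, verbose = FALSE)
selected <- safely_select_variables(safe_extractor, data_transformed,
                                    which_y = "m2.price", verbose = FALSE)

# Keep only the response and the selected variables
data_selected <- data_transformed[, c("m2.price", selected)]

# Usage 2 (assumed from the argument descriptions): pass predictors only
# and supply the response through y; the transformation is done internally.
selected_y <- safely_select_variables(safe_extractor, data[, 2:6],
                                      y = data[, 1], verbose = FALSE)
```

The returned vector contains column names only, so subsetting the transformed data with it, as in data_selected, is one way to prepare a dataset for fitting a simpler model.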


[Package rSAFE version 0.1.4 Index]