safely_select_variables {rSAFE} | R Documentation |
Performing Feature Selection on the Dataset with Transformed Variables
Description
The safely_select_variables() function selects variables from the dataset returned by the safely_transform_data() function. For each original variable, exactly one variable is chosen: either the original one or its transformed version. The choice is based on the AIC value of a linear model (regression) or a logistic regression model (classification).
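The AIC-based choice described above can be illustrated with a minimal base R sketch. It is not the package's internal code: the simulated data, the variable name `x_binned` and the cut point are all assumptions made for illustration, and only the idea of keeping the version with the lower AIC comes from the description.

```r
# Illustrative sketch only; rSAFE's internal procedure may differ in detail.
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
# Simulated response with a step at x = 5, so a binned variable should fit better
y <- ifelse(x > 5, 3, 0) + rnorm(n, sd = 0.5)

# Hypothetical transformed variable: x cut into two intervals
x_binned <- cut(x, breaks = c(-Inf, 5, Inf))

# AIC of a linear model built on each version of the variable
aic_original <- AIC(lm(y ~ x))
aic_transformed <- AIC(lm(y ~ x_binned))

# Keep whichever version yields the lower AIC
chosen <- if (aic_transformed < aic_original) "x_binned" else "x"
chosen
```

For a classification problem the same comparison would use `glm(..., family = binomial)` in place of `lm()`.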
Usage
safely_select_variables(
safe_extractor,
data,
y = NULL,
which_y = NULL,
class_pred = NULL,
verbose = TRUE
)
Arguments
safe_extractor |
object containing information about variable transformations, created with the safe_extraction() function |
data |
data, either the original dataset or the one returned by the safely_transform_data() function. If data does not contain transformed variables, the transformation is done inside this function using the 'safe_extractor' argument. Data may or may not contain the response variable: if it does, the 'which_y' argument must be given; otherwise the 'y' argument should be provided. |
y |
vector of responses, must be given if data does not contain it |
which_y |
numeric or character (optional), index or name of the response column; must be given if data contains response values |
class_pred |
numeric or character, used only in multiclass classification problems. If the response vector has more than two levels, 'class_pred' should indicate the class of interest, which will denote failure; all other classes will stand for success. |
verbose |
logical, whether a progress bar should be printed |
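The effect of 'class_pred' on a multiclass response can be sketched in base R as follows. The response values are made up, and the 0/1 coding (0 for failure, 1 for success) is an assumption made for illustration; the package may represent the two groups differently.

```r
# Hypothetical three-level response vector
y <- factor(c("low", "medium", "high", "low", "high", "medium"))

# Class of interest, as would be passed in 'class_pred'
class_pred <- "high"

# The class of interest denotes failure (coded 0 here, by assumption);
# all other classes stand for success (coded 1)
y_binary <- as.numeric(y != class_pred)
y_binary
```

The resulting two-level response is what a logistic regression can then be fitted to.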
Value
vector of variable names, selected based on AIC values
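One way the returned vector might be used is to subset the transformed dataset before fitting a simpler model. The sketch below uses base R only; the data frame, its column names (including the `_new` suffix) and the contents of `selected` are hypothetical stand-ins for real output, not results produced by the package.

```r
# Hypothetical transformed dataset: each variable appears in an original
# and a transformed (binned) version
surface <- c(30, 45, 25, 80, 40, 70)
floor_no <- c(1, 3, 2, 5, 4, 2)
data_trans <- data.frame(
  price = c(10, 12, 9, 15, 11, 14),
  surface = surface,
  surface_new = cut(surface, breaks = c(-Inf, 50, Inf)),
  floor = floor_no,
  floor_new = cut(floor_no, breaks = c(-Inf, 2, Inf))
)

# Hypothetical selection result: one version kept per original variable
selected <- c("surface_new", "floor")

# Keep the response and the selected variables, then fit a linear model
data_sel <- data_trans[, c("price", selected)]
fit <- lm(price ~ ., data = data_sel)
names(data_sel)
```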
Examples
library(DALEX)
library(randomForest)
library(rSAFE)
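# Use the first 500 rows of the apartments dataset from DALEX
data <- apartments[1:500,]
set.seed(111)
# Fit the black-box model that guides the transformations
model_rf <- randomForest(m2.price ~ construction.year + surface + floor +
                           no.rooms + district, data = data)
# Column 1 is the response (m2.price); columns 2:6 are the predictors
explainer_rf <- explain(model_rf, data = data[,2:6], y = data[,1])
safe_extractor <- safe_extraction(explainer_rf, verbose = FALSE)
# The response is in 'data', so it is identified with 'which_y'
safely_select_variables(safe_extractor, data, which_y = "m2.price", verbose = FALSE)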