rfeTerminator {FeatureTerminatoR}    R Documentation

Recursive Feature Engineering SelectoR

Description

This function removes redundant features and automatically selects the best combination of features to remove. By default it uses the random forest mean decrease in accuracy method from the caret package; see Kuhn (2021). The function is a wrapper for the caret rfe() function.
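
As a rough illustration of what "mean decrease in accuracy" measures, the sketch below fits a random forest directly with the randomForest package and prints the permutation-based importance of each predictor. Calling randomForest directly is an assumption made for illustration; it is not confirmed to be the exact call made inside rfFuncs.

```r
library(randomForest)

set.seed(42)  # random forests are stochastic; fix the seed for repeatability

# importance = TRUE is required to compute permutation-based importance
rf_fit <- randomForest(Species ~ ., data = iris, importance = TRUE)

# type = 1 requests the mean decrease in accuracy for each predictor
mda <- importance(rf_fit, type = 1)

# Show the predictors from most to least important
print(mda[order(mda[, 1], decreasing = TRUE), , drop = FALSE])
```

Predictors with a small mean decrease in accuracy contribute little to the model, which is the kind of feature a recursive elimination search would be expected to drop first.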

Usage

rfeTerminator(
  df,
  x_cols,
  y_cols,
  method = "cv",
  kfolds = 10,
  sizes = c(1:100),
  alter_df = TRUE,
  eval_funcs = rfFuncs,
  ...
)

Arguments

df

data frame to fit the recursive feature elimination algorithm to

x_cols

the independent variables to be used for the recursive feature elimination algorithm

y_cols

the dependent variables to be used in the prediction

method

Default = "cv" - the resampling method (cross validation); other options include "repeatedcv"

kfolds

Default = 10 - the number of folds (train / test splits) to compute when resampling

sizes

Default = c(1:100) - the candidate subset sizes (numbers of features to retain) that the search evaluates

alter_df

Default = TRUE - removes the redundant features, i.e. those with a lesser effect on the mean decrease in accuracy or on another importance measure.

eval_funcs

Default = rfFuncs (Random Forest Mean Decrease Accuracy method). Other caret function sets: lmFuncs, treebagFuncs, nbFuncs. The related caret helpers pickSizeBest and pickSizeTolerance choose the subset size and are not function sets themselves.

...

Additional arguments forwarded to the underlying caret::rfe() function, for passing parameters native to caret
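
The arguments above plausibly map onto caret along the following lines. This is a minimal sketch of the assumed mapping using caret directly, not the package's actual source: that method and kfolds feed rfeControl(), and that eval_funcs becomes its functions argument, are assumptions.

```r
library(caret)  # rfFuncs also needs the randomForest package installed

set.seed(42)  # resampling is random; fix the seed for repeatable folds

x <- iris[, 1:4]  # predictors, as would be selected by x_cols
y <- iris[[5]]    # outcome, as would be selected by y_cols

# Assumed mapping: eval_funcs -> functions, method -> method, kfolds -> number
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 10)

# sizes lists the candidate numbers of features to retain
fit <- rfe(x = x, y = y, sizes = 1:4, rfeControl = ctrl)

# Names of the predictors in the best-performing subset
print(fit$optVariables)
```

With only four predictors, sizes = 1:4 already covers every possible subset size, so a wider range adds nothing on this data set.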

Details

With alter_df set to TRUE, the chosen recursive feature elimination algorithm automatically removes the less important features from the tibble returned inside the list.
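
The reduction step can be pictured as a column subset. The sketch below uses base R with a hypothetical vector of selected predictors (selected) standing in for the variables chosen by the fit; keeping the outcome column in the reduced data is also an assumption.

```r
# Hypothetical result of the feature search: the predictors worth keeping
selected <- c("Petal.Width", "Petal.Length")

# Name of the outcome column (assumed to be kept alongside the predictors)
outcome <- "Species"

# Keep only the selected predictors plus the outcome
reduced <- iris[, c(selected, outcome), drop = FALSE]

print(head(reduced))
```

The function itself is documented as returning a tibble; a plain data frame is used here only to keep the sketch free of extra dependencies.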

Value

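A list containing the following elements, as used in the Examples:

rfe_model_fit_results

the results of the recursive feature elimination fit, including the selected variables in optVariables

rfe_original_data

the original data passed to the function

rfe_reduced_data

the data with the less important features removed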

References

Kuhn (2021) Recursive Feature Elimination. https://topepo.github.io/caret/recursive-feature-elimination.html

Examples

library(caret)
library(tibble)
library(FeatureTerminatoR)
library(dplyr)
df <- iris
# Pass the columns by index: predictors in columns 1:4, outcome in column 5
rfe_fit <- rfeTerminator(df, x_cols = 1:4, y_cols = 5, alter_df = TRUE, eval_funcs = rfFuncs)
# Explore the optimal model results
print(rfe_fit$rfe_model_fit_results)
# Explore the optimal variables selected
print(rfe_fit$rfe_model_fit_results$optVariables)
# Explore the original data passed to the function
print(head(rfe_fit$rfe_original_data))
# Explore the reduced data, with the less important features removed
print(head(rfe_fit$rfe_reduced_data))

[Package FeatureTerminatoR version 1.0.0 Index]