mutlicol_terminator {FeatureTerminatoR} | R Documentation
Multicollinearity TerminatoR - Feature Selection to remove highly correlated features
Description
This function identifies highly correlated features and lets you set a correlation cut-off. Its outputs allow correlation and covariance matrices to be created, alongside visuals, and give you the ability to remove highly correlated features from your statistical pipeline.
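The idea of a correlation cut-off can be sketched in base R. This is a minimal illustration only, not the package's implementation: the helper name `drop_correlated`, its return fields, and the rule of dropping the later column of each offending pair are all assumptions made for the example.

```r
# Hypothetical helper (not part of FeatureTerminatoR): for every pair of
# features whose absolute correlation exceeds `cutoff`, drop the later column.
drop_correlated <- function(df, x_cols, cutoff = 0.9) {
  x <- df[, x_cols, drop = FALSE]
  full_cm <- cor(x)                       # pairwise Pearson correlations
  cm <- full_cm
  cm[lower.tri(cm, diag = TRUE)] <- 0     # keep each pair once (upper triangle)
  flagged <- apply(abs(cm) > cutoff, 2, any)
  to_drop <- colnames(cm)[flagged]
  list(
    correlations = full_cm,
    dropped = to_drop,
    reduced = df[, !(names(df) %in% to_drop), drop = FALSE]
  )
}

res <- drop_correlated(iris, 1:4, cutoff = 0.9)
res$dropped        # features removed by the sketch
names(res$reduced) # remaining columns, including the dependent variable
```

With the iris data, only Petal.Length and Petal.Width correlate above 0.9, so the sketch drops a single feature and leaves the dependent variable untouched.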
Usage
mutlicol_terminator(df, x_cols, y_cols, alter_df = TRUE, cor_sig = 0.9)
Arguments
df
The data frame containing the x and y variables
x_cols
The independent variables to analyse for multicollinearity
y_cols
The dependent variable(s) in your predictive model
alter_df
Default=TRUE - Determines whether the highly correlated features are removed from the data frame.
cor_sig
Default=0.9 - The correlation threshold used as the cut-off for inter-feature correlation
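To get a feel for how the choice of cut-off affects the result, the following base R sketch counts how many feature pairs in iris exceed a few candidate thresholds. It assumes the cut-off is compared against the absolute Pearson correlation; `count_pairs` is a hypothetical helper and not part of the package.

```r
# Absolute pairwise correlations between the four numeric iris features
cm <- abs(cor(iris[, 1:4]))

# Hypothetical helper: number of distinct feature pairs above a cut-off
count_pairs <- function(cutoff) sum(cm[upper.tri(cm)] > cutoff)

pair_counts <- sapply(c(0.99, 0.9, 0.8), count_pairs)
pair_counts
```

Lowering the threshold flags more pairs, so a smaller cor_sig would generally lead to more features being considered for removal.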
Value
A list containing the outputs highlighted hereunder:
det
"rfe_model_fit_results" a list of the model fit results, including the optimal features
"rfe_reduced_features" a data.frame object with the reduced variables and data
"rfe_original_data" a data.frame object with the original data passed, for manual exclusion based on fit outputs
"rfe_reduced_data" the data.frame with the features / independent variables removed, returned when alter_df=TRUE
Examples
library(caret)
library(FeatureTerminatoR)
library(tibble)
library(dplyr)
df <- iris
mc_fit <- mutlicol_terminator(df, 1:4, 5, cor_sig = 0.90, alter_df = TRUE)
# View the correlation matrix
mc_fit$corr_matrix
# View the reduced data
head(mc_fit$feature_removed_df, 10)