cv_filter {scorecardModelUtils} | R Documentation |
Variable reduction based on Cramer's V filter
Description
The function returns a list of variables that can be dropped because of high association with another variable, based on Cramer's V and information value (IV). If two variables V1 and V2 have a Cramer's V value above a user-defined threshold, the function recommends dropping the one with the lower IV. A variable that has already been dropped is not used to drop any further variables.
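The rule described above can be illustrated with a minimal sketch in base R. This is not the package's implementation: the helper name sketch_cv_filter, the toy tables, the order in which pairs are examined (highest Cramer's V first) and the tie-breaking rule are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of scorecardModelUtils: a greedy sketch of
# the drop rule described in the Description.
sketch_cv_filter <- function(cv_tab, iv_tab, threshold) {
  # Named lookup of IV by variable name
  iv <- setNames(iv_tab$iv, as.character(iv_tab$Variable_name))
  # Keep only pairs whose Cramer's V exceeds the threshold
  pairs <- cv_tab[cv_tab$cv_value > threshold, , drop = FALSE]
  # Assumption: the strongest associations are examined first
  pairs <- pairs[order(-pairs$cv_value), , drop = FALSE]
  dropped <- character(0)
  for (i in seq_len(nrow(pairs))) {
    v1 <- as.character(pairs$var_1[i])
    v2 <- as.character(pairs$var_2[i])
    # Ignore self-pairs, if the table happens to contain any
    if (v1 == v2) next
    # An already dropped variable cannot cause another variable to be dropped
    if (v1 %in% dropped || v2 %in% dropped) next
    # Drop the variable with the lower IV (assumption: ties drop var_2)
    loser <- if (iv[[v1]] < iv[[v2]]) v1 else v2
    dropped <- c(dropped, loser)
  }
  list(retain = setdiff(names(iv), dropped), dropped = dropped)
}

# Toy inputs using the column names listed under Arguments
cv_tab <- data.frame(
  var_1 = c("A", "A", "B"),
  var_2 = c("B", "C", "C"),
  cv_value = c(0.9, 0.2, 0.7),
  stringsAsFactors = FALSE
)
iv_tab <- data.frame(
  Variable_name = c("A", "B", "C"),
  iv = c(0.30, 0.10, 0.05),
  stringsAsFactors = FALSE
)

res <- sketch_cv_filter(cv_tab, iv_tab, threshold = 0.5)
res$dropped  # "B"
res$retain   # "A" "C"
```

In this toy case B is dropped because its IV is lower than that of A. C is retained even though its Cramer's V with B exceeds the threshold, because B has already been dropped by the time that pair is examined.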
Usage
cv_filter(cv_table, iv_table, threshold)
Arguments
cv_table |
dataframe of class cv_table with three columns - var_1, var_2, cv_value |
iv_table |
dataframe of class iv_table with two columns - Variable_name, iv |
threshold |
Cramer's V value above which one of the two variables in a pair will be recommended to be dropped |
Value
An object of class "cv_filter" is a list containing the following components:
retain_var_list |
list of variables remaining post CV filter |
dropped_var_list |
list of variables that can be dropped based on CV filter |
dropped_var_tab |
CV correlation value for dropped variables as a dataframe |
threshold |
threshold CV value used as input parameter |
Author(s)
Arya Poddar <aryapoddar290990@gmail.com>
Examples
# Toy dataset: iris plus a random binary target Y
data <- iris
# Use the pre-3.6.0 sampling behaviour so the seed gives reproducible results
suppressWarnings(RNGversion('3.5.0'))
set.seed(11)
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)
# Cramer's V table for the selected columns
cv_tab_list <- cv_table(data, c("Species", "Sepal.Length"))
cv_tab <- cv_tab_list$cv_val_tab
# IV table for the predictors against the target Y
x <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
iv_table_list <- iv_table(base = data, target = "Y", num_var_name = x, cat_var_name = "Species")
iv_tab <- iv_table_list$iv_table
# Apply the filter and inspect each component of the result
cv_filter_list <- cv_filter(cv_table = cv_tab, iv_table = iv_tab, threshold = 0.5)
cv_filter_list$retain_var_list
cv_filter_list$dropped_var_list
cv_filter_list$dropped_var_tab
cv_filter_list$threshold