cv_filter {scorecardModelUtils}    R Documentation

Variable reduction based on Cramer's V filter

Description

The function returns a list of variables that can be dropped because they are highly correlated with another variable, based on Cramer's V and information value (IV). If two variables V1 and V2 have a Cramer's V above a user-defined threshold, the one with the lower IV is recommended to be dropped. A variable that has already been dropped is not used to drop any further variables.
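The rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name sketch_cv_filter is hypothetical, pairs are assumed to be visited from the highest Cramer's V downwards, and ties in IV are resolved by dropping the second variable of the pair.

```r
# Hypothetical helper, not part of scorecardModelUtils.
sketch_cv_filter <- function(cv_tab, iv_tab, threshold) {
  # Named lookup: variable name -> IV
  iv <- setNames(iv_tab$iv, as.character(iv_tab$Variable_name))

  # Keep only pairs above the threshold, strongest first (assumed order)
  hits <- cv_tab[cv_tab$cv_value > threshold, , drop = FALSE]
  hits <- hits[order(-hits$cv_value), , drop = FALSE]

  dropped <- character(0)
  for (i in seq_len(nrow(hits))) {
    a <- as.character(hits$var_1[i])
    b <- as.character(hits$var_2[i])
    # Skip self-pairs, and pairs where one side is already dropped:
    # a dropped variable cannot cause another variable to be dropped
    if (a == b || a %in% dropped || b %in% dropped) next
    # Drop the variable with the lower IV (ties drop the second one)
    dropped <- c(dropped, if (iv[[a]] < iv[[b]]) a else b)
  }

  list(retain_var_list  = setdiff(names(iv), dropped),
       dropped_var_list = dropped)
}

# Toy inputs with made-up values, using the documented column names
toy_cv <- data.frame(var_1 = c("A", "B", "A"),
                     var_2 = c("B", "C", "C"),
                     cv_value = c(0.9, 0.8, 0.3),
                     stringsAsFactors = FALSE)
toy_iv <- data.frame(Variable_name = c("A", "B", "C"),
                     iv = c(0.5, 0.2, 0.1),
                     stringsAsFactors = FALSE)
toy_res <- sketch_cv_filter(toy_cv, toy_iv, threshold = 0.5)
```

In this toy input B is dropped in favour of A. C is then retained even though its Cramer's V with B exceeds the threshold, because B has already been dropped.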

Usage

cv_filter(cv_table, iv_table, threshold)

Arguments

cv_table

data frame of class cv_table with three columns: var_1, var_2 and cv_value

iv_table

data frame of class iv_table with two columns: Variable_name and iv

threshold

Cramer's V value above which one variable of a pair is recommended to be dropped
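For orientation on the scale of cv_value and threshold, the standard Cramer's V statistic for two categorical vectors can be computed in base R as below. The helper name cramers_v is hypothetical, and this page does not state how cv_table itself computes the value (for example, how numeric variables are handled), so treat this as a sketch of the textbook formula only.

```r
# Illustrative helper, not a package function.
# V = sqrt(chi2 / (n * (min(rows, cols) - 1))), ranging from 0 to 1.
cramers_v <- function(x, y) {
  tab <- table(x, y)
  # Pearson chi-squared statistic without continuity correction;
  # warnings about small expected counts are silenced for the sketch
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  as.numeric(sqrt(chi2 / (n * k)))
}

g <- rep(c("a", "b"), each = 50)
v_same  <- cramers_v(g, g)                         # perfectly associated
v_indep <- cramers_v(g, rep(c("u", "v"), 50))      # balanced, no association
```

A value near 1 indicates that one variable almost determines the other, so a threshold such as 0.5 flags only fairly strongly associated pairs.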

Value

An object of class "cv_filter" is a list containing the following components:

retain_var_list

list of variables remaining after the CV filter

dropped_var_list

list of variables that can be dropped based on CV filter

dropped_var_tab

data frame of Cramer's V values for the dropped variables

threshold

threshold Cramer's V value supplied as the input parameter
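A typical next step is to keep only the retained variables in the modelling data. The sketch below uses a hand-built stand-in for the returned object (the variable names in it are made up, not real output), and assumes retain_var_list can be coerced to a character vector of column names.

```r
# Stand-in for a cv_filter() result; values are illustrative only
res <- list(retain_var_list  = c("Sepal.Length", "Species"),
            dropped_var_list = c("Petal.Width"))

# Keep the retained columns that actually exist in the data
keep <- intersect(names(iris), as.character(unlist(res$retain_var_list)))
reduced <- iris[, keep, drop = FALSE]
```

Using intersect() guards against names in the list that are absent from the data, and drop = FALSE keeps the result a data frame even when a single column remains.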

Author(s)

Arya Poddar <aryapoddar290990@gmail.com>

Examples

# Toy data: iris with a random binary target Y
data <- iris
suppressWarnings(RNGversion('3.5.0'))
set.seed(11)
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)

# Pairwise Cramer's V table
cv_tab_list <- cv_table(data, c("Species", "Sepal.Length"))
cv_tab <- cv_tab_list$cv_val_tab

# Information value for each candidate variable
x <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
iv_table_list <- iv_table(base = data, target = "Y", num_var_name = x, cat_var_name = "Species")
iv_tab <- iv_table_list$iv_table

# Apply the filter and inspect each component of the result
cv_filter_list <- cv_filter(cv_table = cv_tab, iv_table = iv_tab, threshold = 0.5)
cv_filter_list$retain_var_list
cv_filter_list$dropped_var_list
cv_filter_list$dropped_var_tab
cv_filter_list$threshold

[Package scorecardModelUtils version 0.0.1.0 Index]