cat_new_class {scorecardModelUtils}    R Documentation

Clubbing classes of categorical variables that have a low population percentage with another class of similar event rate

Description

The function groups classes of categorical variables whose population percentage is below a threshold with another class of similar event rate. If no class with exactly the same event rate is available, the class is clubbed with the one whose event rate is higher than, and closest to, its own.
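The clubbing rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: `club_low_classes()` is a hypothetical helper written for illustration, it handles a single variable, and it assumes that only classes meeting the threshold are merge targets and that a low-population class with no equal-or-higher neighbour falls back to the nearest event rate overall. The source does not specify either of these cases.

```r
# Hypothetical helper for illustration only; not part of scorecardModelUtils.
club_low_classes <- function(x, y, threshold, event = 1) {
  x <- as.character(x)
  # Population share and event rate per class, as plain named vectors
  pop_pct <- table(x) / length(x)
  classes <- names(pop_pct)
  event_rate <- vapply(split(y == event, x), mean, numeric(1))

  low  <- classes[as.vector(pop_pct) < threshold]   # classes to be clubbed
  keep <- setdiff(classes, low)                     # candidate merge targets
  mapping <- setNames(classes, classes)             # original -> new class

  for (cls in low) {
    if (length(keep) == 0) break
    diffs  <- event_rate[keep] - event_rate[[cls]]
    higher <- diffs[diffs >= 0]
    # Assumption: prefer the closest equal-or-higher event rate,
    # otherwise fall back to the closest event rate overall.
    target <- if (length(higher) > 0) {
      names(which.min(higher))
    } else {
      names(which.min(abs(diffs)))
    }
    mapping[cls] <- target
  }
  list(x_new = unname(mapping[x]), mapping = mapping)
}

# Toy data: class "c" holds 5% of the rows with an event rate of 0.4;
# "a" has an event rate of 0.2 and "b" has 0.6.
x <- c(rep("a", 50), rep("b", 45), rep("c", 5))
y <- c(rep(c(1, 0), c(10, 40)), rep(c(1, 0), c(27, 18)), rep(c(1, 0), c(2, 3)))
res <- club_low_classes(x, y, threshold = 0.1)
print(res$mapping)
```

In this toy data, class "c" is below the threshold and is mapped to "b", the class with the closest higher event rate.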

Usage

cat_new_class(base, target, cat_var_name, threshold, event = 1)

Arguments

base

input dataframe

target

column / field name for the target variable to be passed as string (must be 0/1 type)

cat_var_name

column name, or array of column names, of the categorical variable(s) on which the operation is to be done, to be passed as string

threshold

threshold population percentage below which the class will be considered to be clubbed with another class, to be provided as decimal/fraction (for example, 0.1 for 10%)

event

(optional) the event class, to be passed as 0 or 1 (default is 1)
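As a rough guide to choosing threshold, the population share of each class can be checked in base R before calling the function. The sketch below uses only `table()` and `prop.table()`. The variable names `pop_share` and `low_classes` are illustrative, and it is an assumption that the package computes the percentage in the same way.

```r
# Same subset of iris as in the Examples section: 50 setosa,
# 50 versicolor, and 10 virginica rows.
data <- iris[1:110, ]
data$Species <- as.character(data$Species)

# Share of rows in each class, as a fraction of all rows
pop_share <- prop.table(table(data$Species))
print(round(pop_share, 3))

# Classes whose share falls below the chosen threshold
threshold <- 0.1
low_classes <- names(pop_share)[as.vector(pop_share) < threshold]
print(low_classes)
```

Here only virginica (10 of 110 rows) falls below a threshold of 0.1, so it is the only class that would be a candidate for clubbing.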

Value

The function returns an object of class "cat_new_class" which is a list containing the following components:

base_new

a dataframe after clubbing low percentage classes with another class of similar or closest but higher event rate

cat_class_new

a dataframe with mapping between original classes and new clubbed classes (if any)

Author(s)

Arya Poddar <aryapoddar290990@gmail.com>

Kanishk Dogar <Kanishkd4@gmail.com>

Examples

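# Keep 50 setosa, 50 versicolor, and 10 virginica rows, so virginica
# makes up about 9% of the data, below the 0.1 threshold used here
data <- iris[1:110, ]
data$Species <- as.character(data$Species)
# Random 0/1 target, so results vary between runs
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)
data_newclass <- cat_new_class(base = data, target = "Y", cat_var_name = "Species", threshold = 0.1)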

[Package scorecardModelUtils version 0.0.1.0 Index]