club_cat_class {scorecardModelUtils}R Documentation

Clubbing class of a categorical variable with low population percentage with another class of similar event rate

Description

The function groups classes of categorical variable, which have population percentage less than a threshold, with another class of similar event rate. If a class of exactly same event rate is not available, it is clubbed with the one having a higher event rate closest to it.

Usage

club_cat_class(base, target, variable, threshold, event = 1)

Arguments

base

input dataframe

target

column / field name for the target variable to be passed as string (must be 0/1 type)

variable

column name of categorical variable on which the operation is to be done, to be passed as string

threshold

threshold population percentage below which the class will be considered to be be clubbed with another class, to be provided as decimal/fraction

event

(optional) the event class, to be passed as 0 or 1 (default is 1)

Value

The function returns a dataframe after clubbing low percentage classes with another class of similar or closest but higher event rate.

Author(s)

Arya Poddar <aryapoddar290990@gmail.com>

Kanishk Dogar <kanishkd4@gmail.com>

Examples

data <- iris[1:110,]
data$Species <- as.character(data$Species)
data$Y <- sample(0:1,size=nrow(data),replace=TRUE)
data_clubclass <- club_cat_class(base = data,target = "Y",variable = "Species",threshold = 0.2)

[Package scorecardModelUtils version 0.0.1.0 Index]