cat_new_class {scorecardModelUtils} | R Documentation |
Clubbing classes of categorical variables with low population percentage with another class of similar event rate
Description
The function takes each class of a categorical variable whose population percentage is below a threshold and clubs it with another class that has a similar event rate. If no class has exactly the same event rate, the low-population class is clubbed with the class whose event rate is higher than, and closest to, its own.
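As a rough illustration of the rule above, the following base-R sketch decides which class each low-population class is clubbed into, working from per-class summaries. The function `pick_club_target` and its `tol` argument are hypothetical and not part of scorecardModelUtils. The sketch also makes two assumptions that the description does not settle: only classes at or above the threshold can absorb others, and when no such class has an equal or higher event rate, the class falls back to the closest lower event rate.

```r
# Hypothetical helper, not part of scorecardModelUtils.
# classes    : class labels
# pop_pct    : share of rows in each class, as a fraction
# event_rate : share of rows in each class where the target equals the event
pick_club_target <- function(classes, pop_pct, event_rate, threshold, tol = 1e-9) {
  new_class <- classes  # by default every class keeps its own label
  # Assumption: only classes at or above the threshold can absorb others
  candidates <- which(pop_pct >= threshold)
  for (i in which(pop_pct < threshold)) {
    if (length(candidates) == 0) break  # no class to club with
    gap <- event_rate[candidates] - event_rate[i]
    same <- candidates[abs(gap) < tol]  # (numerically) identical event rate
    higher <- candidates[gap >= tol]    # strictly higher event rate
    if (length(same) > 0) {
      j <- same[1]
    } else if (length(higher) > 0) {
      j <- higher[which.min(event_rate[higher])]  # closest higher event rate
    } else {
      # Assumption: case not covered by the description; use closest lower rate
      j <- candidates[which.max(event_rate[candidates])]
    }
    new_class[i] <- classes[j]
  }
  setNames(new_class, classes)
}

# d matches a exactly; c has no exact match and goes to b (closest higher rate)
pick_club_target(
  classes = c("a", "b", "c", "d"),
  pop_pct = c(0.45, 0.40, 0.10, 0.05),
  event_rate = c(0.20, 0.50, 0.30, 0.20),
  threshold = 0.12
)
```

Working from precomputed summaries keeps the decision rule separate from how population shares and event rates are derived from the data.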
Usage
cat_new_class(base, target, cat_var_name, threshold, event = 1)
Arguments
base | input data frame |
target | name of the target variable column, passed as a string (must be of 0/1 type) |
cat_var_name | name of the categorical variable column, or a character vector of such names, on which the operation is to be performed |
threshold | population percentage below which a class is clubbed with another class, provided as a decimal fraction (e.g. 0.1 for 10%) |
event | (optional) the event class, passed as 0 or 1 (default is 1) |
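To make the arguments concrete, the base-R sketch below computes, for one categorical column, the two quantities the threshold and event arguments act on: each class's population share (a fraction, as `threshold` expects) and its event rate, where `event` decides which target value counts as an event. The helper `class_summary` is hypothetical and is not the package's internal code.

```r
# Hypothetical helper, not part of scorecardModelUtils.
class_summary <- function(base, target, var, event = 1) {
  y <- base[[target]]
  stopifnot(all(y %in% c(0, 1)))  # target must be of 0/1 type
  is_event <- y == event          # event = 0 counts zeros as events
  cls <- as.character(base[[var]])
  tab <- table(cls)
  data.frame(
    class = names(tab),
    pop_pct = as.numeric(tab) / length(cls),  # fraction, compared with threshold
    event_rate = as.numeric(tapply(is_event, cls, mean)[names(tab)]),
    stringsAsFactors = FALSE
  )
}

df <- data.frame(grp = c("a", "a", "a", "b", "b", "c"), y = c(1, 0, 1, 0, 0, 1))
class_summary(df, target = "y", var = "grp")             # event rate of a is 2/3
class_summary(df, target = "y", var = "grp", event = 0)  # event rate of a is 1/3
```

With `threshold = 0.2`, only class `c` (1 of 6 rows, about 17%) would fall below the threshold in this toy data.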
Value
The function returns an object of class "cat_new_class", which is a list containing the following components:
base_new | the input data frame after each low-percentage class has been clubbed with the class of the same, or closest higher, event rate |
cat_class_new | a data frame mapping the original classes to the new clubbed classes (if any) |
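The relationship between the two components can be sketched in base R: given a mapping from original to new class labels, relabel the column to get a `base_new`-like data frame and tabulate the mapping to get a `cat_class_new`-like table. The function `apply_class_map` and the column names `original_class` and `new_class` are hypothetical; the package's actual column names and row selection may differ (this sketch lists every mapped class, not only the clubbed ones).

```r
# Hypothetical helper, not part of scorecardModelUtils.
# class_map: named character vector, names = original classes, values = new classes
apply_class_map <- function(base, var, class_map) {
  base_new <- base
  old <- as.character(base[[var]])
  mapped <- unname(class_map[old])
  base_new[[var]] <- ifelse(is.na(mapped), old, mapped)  # unmapped classes kept
  mapping <- data.frame(original_class = names(class_map),
                        new_class = unname(class_map),
                        stringsAsFactors = FALSE)
  list(base_new = base_new, cat_class_new = mapping)
}

df <- data.frame(grp = c("a", "b", "c", "d"), y = c(1, 0, 1, 0))
out <- apply_class_map(df, "grp", c(a = "a", b = "b", c = "b", d = "a"))
out$base_new$grp  # "a" "b" "b" "a"
out$cat_class_new
```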
Author(s)
Arya Poddar <aryapoddar290990@gmail.com>
Kanishk Dogar <Kanishkd4@gmail.com>
Examples
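library(scorecardModelUtils)
set.seed(1)  # make the random target reproducible
data <- iris[1:110, ]  # 50 setosa, 50 versicolor, 10 virginica rows
data$Species <- as.character(data$Species)
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)  # random 0/1 target
# virginica (10/110, about 9% of rows) falls below the 10% threshold
data_newclass <- cat_new_class(base = data, target = "Y", cat_var_name = "Species", threshold = 0.1)
data_newclass$cat_class_new  # mapping of original to clubbed classes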