others_class {scorecardModelUtils} | R Documentation |
Clubbing classes of a categorical variable with low population percentage into one class
Description
The function groups the classes of a categorical variable whose population percentage is below a threshold into a single class labelled "Low_pop_perc". The user can choose whether to club the missing class with the others or keep it as a separate class. By default, the missing class is not treated separately.
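The grouping described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name club_low_pop, its arguments and the shape of its return value are assumptions made for this example.

```r
# Hypothetical helper (not part of scorecardModelUtils): pools every class
# whose share of rows is below `threshold` into one label.
club_low_pop <- function(x, threshold, label = "Low_pop_perc") {
  x <- as.character(x)
  # Share of rows in each class; NA, if present, is counted as its own class
  share <- prop.table(table(x, useNA = "ifany"))
  low <- names(share)[share < threshold]
  # Replace every low-share class with the pooled label
  x[x %in% low] <- label
  # Record which original classes were pooled
  mapping <- data.frame(original = low,
                        clubbed = rep(label, length(low)),
                        stringsAsFactors = FALSE)
  list(values = x, mapping = mapping)
}

# Same class sizes as the Examples section: 50, 50 and 10 rows
species <- rep(c("setosa", "versicolor", "virginica"), times = c(50, 50, 10))
res <- club_low_pop(species, threshold = 0.15)
table(res$values)
res$mapping
```

Only virginica falls below the 0.15 threshold here (10 of 110 rows), so it is the only class that gets relabelled.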
Usage
others_class(base, target, column_name, threshold, char_missing = NA)
Arguments
base |
input dataframe |
target |
name of the target column, passed as a string (the target must be binary, coded 0/1) |
column_name |
column name or array of column names of the dataframe on which the operation is to be done |
threshold |
threshold population percentage below which a class is grouped into "Low_pop_perc", given as a decimal fraction (e.g. 0.15 for 15%) |
char_missing |
(optional) imputed missing value for the categorical variable, if it is to be kept as a separate class (default value is NA) |
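One way to read the char_missing argument is that a placeholder used to impute missing values is excluded from clubbing even when its share is below the threshold. The base R sketch below illustrates that reading only. The placeholder "MISSING", the toy data and the logic itself are assumptions, not the package's confirmed behaviour.

```r
# Toy data: "MISSING" is an assumed placeholder for imputed missing values
x <- c(rep("A", 60), rep("B", 30), rep("C", 5), rep("MISSING", 5))
char_missing <- "MISSING"
threshold <- 0.10

share <- prop.table(table(x))
low <- names(share)[share < threshold]
# Assumed behaviour: drop the placeholder from the classes to be pooled,
# so that it survives as its own class
low <- setdiff(low, char_missing)
x[x %in% low] <- "Low_pop_perc"
table(x)
```

Both C and MISSING fall below the threshold in this toy data, but only C is relabelled because MISSING is explicitly protected.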
Value
base |
a dataframe after converting all low percentage classes into "Low_pop_perc" class |
mapping_table |
a dataframe listing the original classes that were merged into the "Low_pop_perc" class (if any) |
Author(s)
Arya Poddar <aryapoddar290990@gmail.com>
Examples
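# Keep the first 110 rows so that one species is rare
data <- iris[c(1:110), ]
# Add a random binary (0/1) target
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)
data$Species <- as.character(data$Species)
# Club species with less than 15% of rows into "Low_pop_perc"
data_otherclass <- others_class(base = data, target = "Y", column_name = "Species", threshold = 0.15)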
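To see which classes in the example data fall below the 0.15 threshold, the class shares can be checked with base R alone. This is a small supplementary sketch and does not call the package. The variable names shares and below are chosen for illustration.

```r
# Rebuild the example subset and compute the share of rows per species
data <- iris[c(1:110), ]
shares <- prop.table(table(as.character(data$Species)))
round(shares, 3)

# Classes whose share is below the threshold used in the example
below <- names(shares)[shares < 0.15]
below
```

With 50, 50 and 10 rows per species, only virginica is below the threshold, so it is the class one would expect to be relabelled.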