others_class {scorecardModelUtils} R Documentation

Clubbing classes of a categorical variable with low population percentage into one class

Description

The function groups the classes of a categorical variable whose population percentage is below a threshold into a single class, "Low_pop_perc". The user can choose whether the missing class is clubbed with the others or kept as a separate class. By default, the missing class is not treated separately. To keep it separate, pass the value used to impute missing entries as char_missing.
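The grouping rule described above can be sketched in base R as follows. This is a minimal illustration and not the package's own implementation. The helper `club_low_pop` and its `keep` argument are hypothetical stand-ins for `others_class` and `char_missing`. The sketch also assumes that genuine NA values are left untouched and are excluded when the percentages are computed.

```r
# Hypothetical helper, not part of scorecardModelUtils: recodes the classes of a
# categorical vector whose share of rows is below `threshold` as "Low_pop_perc".
club_low_pop <- function(x, threshold, keep = NA) {
  x <- as.character(x)
  # Share of rows in each class; table() drops NA values by default.
  pop_perc <- prop.table(table(x))
  low <- names(pop_perc)[pop_perc < threshold]
  # If a placeholder for imputed missing values is given, never club it.
  if (!is.na(keep)) low <- setdiff(low, keep)
  x[x %in% low] <- "Low_pop_perc"
  x
}

# Same subset as in the Examples section: virginica is the small class.
species <- as.character(iris$Species[1:110])
clubbed <- club_low_pop(species, threshold = 0.15)
table(clubbed)
```

With the default `keep = NA`, a small imputed-missing class would be clubbed like any other class. Supplying its placeholder keeps it as a class of its own.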

Usage

others_class(base, target, column_name, threshold, char_missing = NA)

Arguments

base

input dataframe

target

name of the target variable column, passed as a string (the target must be coded 0/1)

column_name

name of the column, or a vector of column names, of the dataframe on which the operation is to be performed

threshold

population percentage below which a class is classified as others (that is, clubbed into "Low_pop_perc"), provided as a decimal fraction (for example, 0.15 for 15%)

char_missing

(optional) the value used to impute missing entries of the categorical variable, if the missing class is to be kept separate (default value is NA)
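The threshold is a fraction that is compared against each class's share of rows. The base R arithmetic below shows this comparison for the subset used in the Examples section. The second part shows how missing entries might be imputed before the call. The label "Missing" is an arbitrary, hypothetical placeholder that would then be passed as char_missing. It is not a value the package requires.

```r
# Share of each class in the first 110 rows of iris, as a fraction.
species <- as.character(iris$Species[1:110])
pop_perc <- prop.table(table(species))
# setosa and versicolor are about 0.455 each, virginica about 0.091
round(pop_perc, 3)
# Classes that fall below a threshold of 0.15 (15%).
names(pop_perc)[pop_perc < 0.15]

# Impute missing entries with a placeholder label (hypothetical choice);
# that same label would be supplied as char_missing to keep it separate.
colour <- c("red", NA, "blue", "red", NA)
colour[is.na(colour)] <- "Missing"
table(colour)
```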

Value

base

the input dataframe with all low-percentage classes converted into the "Low_pop_perc" class

mapping_table

a dataframe mapping the original classes that have been converted into the "Low_pop_perc" class (if any)
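The sketch below shows, under stated assumptions, what the two outputs might look like when several columns are processed. It is written in base R and is not the package's code. The function `club_columns` is hypothetical. So are the mapping-table column names `variable` and `class`, and the list component names `base` and `mapping_table`, which are borrowed from the Value section. The exact layout that others_class returns may differ.

```r
# Hypothetical sketch of the two outputs described under Value.
club_columns <- function(base, column_name, threshold) {
  vars <- character(0)
  classes <- character(0)
  for (col in column_name) {
    x <- as.character(base[[col]])
    pop_perc <- prop.table(table(x))
    low <- names(pop_perc)[pop_perc < threshold]
    x[x %in% low] <- "Low_pop_perc"
    base[[col]] <- x
    # Record which original classes of this column were clubbed.
    vars <- c(vars, rep(col, length(low)))
    classes <- c(classes, low)
  }
  mapping_table <- data.frame(variable = vars, class = classes,
                              stringsAsFactors = FALSE)
  list(base = base, mapping_table = mapping_table)
}

df <- data.frame(
  colour = c(rep("red", 9), "blue"),
  size   = c(rep("S", 5), rep("M", 4), "L"),
  stringsAsFactors = FALSE
)
res <- club_columns(df, column_name = c("colour", "size"), threshold = 0.15)
res$mapping_table
```

Here "blue" and "L" each cover 10% of the rows, so both are clubbed and listed in the mapping table.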

Author(s)

Arya Poddar <aryapoddar290990@gmail.com>

Examples

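library(scorecardModelUtils)
# First 110 rows of iris: 50 setosa, 50 versicolor and 10 virginica
data <- iris[1:110, ]
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)
data$Species <- as.character(data$Species)
# virginica covers 10/110 (about 9%) of the rows, below the 15% threshold
data_otherclass <- others_class(base = data, target = "Y", column_name = "Species", threshold = 0.15)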

[Package scorecardModelUtils version 0.0.1.0 Index]