woe.binning.deploy {woeBinning}    R Documentation

Deployment of Binning

Description

woe.binning.deploy applies the binning solution generated and saved via the woe.binning or woe.tree.binning function to (new) data.

Usage

woe.binning.deploy(df, binning, min.iv.total, add.woe.or.dum.var)

Arguments

df

Data frame to which the binning solution (generated via the function woe.binning or woe.tree.binning) should be applied. The variable names and types (numerical or factor) need to be identical to the ones used when the binning solution was generated.

binning

Binning information generated by the woe.binning or woe.tree.binning function. It contains the names of the input predictor variables and the corresponding binning, WOE and IV information, which is used to add binned variables to a copy of the input data.

min.iv.total

If the total IV (information value) of a binned variable falls below this limit (e.g. 0.1), the variable will not be added to the data. Omit this parameter to add all binned variables (default).

add.woe.or.dum.var

add.woe.or.dum.var="woe" adds an additional variable with WOE scores; add.woe.or.dum.var="dum" adds dummy variables, one for each (aggregated) level of the binned variable. When using dummy variables, make sure that you have set an appropriate abbrev.fact.levels parameter in the woe.binning or woe.tree.binning function to avoid overly long variable names. In principle, only alphanumeric characters and dots (.) will be used for variable names. Omit this parameter if you do not need additional variables.

General Procedure

woe.binning.deploy applies the binning information that was generated by the woe.binning or woe.tree.binning function to a data frame. In this data frame, the names of the variables to be binned need to be identical to the ones used with the woe.binning or woe.tree.binning function. For each variable, a binned version will be added. Optionally, a variable with the associated weight of evidence (WOE) values or corresponding dummy variables (one dummy variable for each final bin) is added as well.
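
As a conceptual illustration of this procedure, the following base R sketch assigns new values to stored bins and looks up one WOE value per bin. It is not the package's implementation: the cut points, the WOE values and the column names are hypothetical, and right-closed intervals (the default of cut) are assumed.

```r
# Conceptual sketch only. All numbers below are hypothetical and are
# not taken from any real binning solution.
cut.points  <- c(-Inf, 12, 24, Inf)   # hypothetical bin boundaries
woe.per.bin <- c(-30.5, 4.2, 41.0)    # hypothetical WOE value of each bin

new.data <- data.frame(duration = c(6, 18, 36, 12))

# 1) Assign each value to a bin using the stored boundaries
#    (intervals are right-closed, so 12 falls into the first bin)
new.data$duration.binned <- cut(new.data$duration, breaks = cut.points)

# 2) Look up the WOE value of each bin via the integer code of the bin
new.data$woe.duration.binned <-
  woe.per.bin[as.integer(new.data$duration.binned)]

print(new.data)
```

The point of the sketch is that deployment needs no target variable: only the stored boundaries and the per-bin WOE values are required to score new data.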

Handling of Missing Data

If NAs already occurred during the woe.binning or woe.tree.binning binning process, the level ‘Missing’ is displayed and a corresponding WOE value can be computed. If NAs occur only in the deployment scenario, ‘Missing’ is displayed for numeric variables and ‘unknown’ for factors, and the corresponding WOE values will be NA.
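
The deployment-only case can be tried out with a sketch like the following. It assumes that the woeBinning package is installed and that the germancredit columns used here contain no NAs, so the NAs appear only in the deployment data. The new columns are located with setdiff rather than by name, because the naming of the added columns is not spelled out here.

```r
library(woeBinning)

# Generate a binning solution on data that is assumed to be free of NAs
data(germancredit)
df <- germancredit[, c('creditability', 'credit.amount', 'purpose')]
binning <- woe.binning(df, 'creditability', df)

# Build a small deployment set and inject NAs into one numeric
# variable and one factor
new.df <- df[1:10, ]
new.df$credit.amount[1] <- NA
new.df$purpose[2] <- NA

# Deploy and request WOE variables in addition to the binned variables
deployed <- woe.binning.deploy(new.df, binning, add.woe.or.dum.var = 'woe')

# Inspect only the added columns for the rows that contain NAs
added <- setdiff(names(deployed), names(new.df))
na.rows <- is.na(deployed$credit.amount) | is.na(deployed$purpose)
print(deployed[na.rows, added])
```

Going by the description above, the affected rows should show ‘Missing’ for the numeric variable, ‘unknown’ for the factor, and NA in the corresponding WOE columns.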

Handling of Unknown Factor Levels

For factor levels that were not present when the binning solution was generated via the woe.binning or woe.tree.binning function, a new factor level ‘unknown’ is displayed and the corresponding WOE value will be NA.
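
A minimal sketch of this situation, assuming the woeBinning package is installed: the level name 'hypothetical.new.purpose' is invented for illustration and does not occur in the germancredit data.

```r
library(woeBinning)

# Generate a binning solution on the original factor levels
data(germancredit)
df <- germancredit[, c('creditability', 'purpose')]
binning <- woe.binning(df, 'creditability', df)

# Build a deployment set whose factor carries one additional level
# that was not available when the binning solution was generated
new.df <- df[1:10, ]
new.df$purpose <- factor(as.character(new.df$purpose),
                         levels = c(levels(df$purpose),
                                    'hypothetical.new.purpose'))
new.df$purpose[1] <- 'hypothetical.new.purpose'

# Deploy with WOE variables and inspect the row with the new level
deployed <- woe.binning.deploy(new.df, binning, add.woe.or.dum.var = 'woe')
added <- setdiff(names(deployed), names(new.df))
print(deployed[deployed$purpose == 'hypothetical.new.purpose', added,
               drop = FALSE])
```

Rows with NA in a WOE column cannot be scored directly by a downstream model, so it is worth checking for them after every deployment to new data.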

Examples

# Load the package, then the German credit data, and create a subset
library(woeBinning)
data(germancredit)
df <- germancredit[, c('creditability', 'credit.amount', 'duration.in.month',
                  'savings.account.and.bonds', 'purpose')]

# Bin all variables of the data frame (apart from the target variable)
# with default parameter settings
binning <- woe.binning(df, 'creditability', df)

# Deploy the binning solution to the data frame
# (add all binned variables and corresponding WOE variables)
df.with.binned.vars.added <- woe.binning.deploy(df, binning,
                                                add.woe.or.dum.var='woe')

# Deploy the binning solution to the data frame
# (add binned variables with IV >= 0.1 and corresponding dummy variables)
df.with.binned.vars.added <- woe.binning.deploy(df, binning,
                                                min.iv.total=0.1,
                                                add.woe.or.dum.var='dum')


[Package woeBinning version 0.1.6 Index]