smbinning {smbinning}    R Documentation

Optimal Binning for Scoring Modeling

Description

Optimal Binning categorizes a numeric characteristic into bins for later use in scoring modeling. This process, also known as supervised discretization, uses Recursive Partitioning to categorize the numeric characteristic.
The specific algorithm is Conditional Inference Trees. It initially excludes missing values (NA) when computing the cutpoints and adds them back later in the process for the calculation of the Information Value.
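
The role of the missing values in the Information Value can be illustrated with a small base R sketch. It assumes the standard Weight of Evidence and Information Value definitions (WoE = ln(%good / %bad), IV = sum((%good - %bad) * WoE)); the counts are made up for illustration, and this is not the package's internal code.

```r
# Toy counts per bin (made-up numbers). The last row stands for the
# records with a missing value (NA), which get a bin of their own.
bins <- data.frame(
  bin  = c("<= 600", "<= 700", "> 700", "Missing"),
  good = c(100, 300, 500, 100),
  bad  = c(150, 100, 40, 10)
)

# Share of all goods and of all bads that falls in each bin
pct_good <- bins$good / sum(bins$good)
pct_bad  <- bins$bad / sum(bins$bad)

# Weight of Evidence per bin and each bin's contribution to the IV
bins$woe <- log(pct_good / pct_bad)
bins$iv  <- (pct_good - pct_bad) * bins$woe

# Information Value of the characteristic, including the missing bin
total_iv <- sum(bins$iv)

print(bins)
print(total_iv)
```

Because the missing bin contributes its own term to the sum, records excluded while searching for cutpoints still influence the final Information Value.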

Usage

smbinning(df, y, x, p = 0.05)

Arguments

df

A data frame.

y

Binary response variable (0,1). Integer (int) is required. Name of y must not have a dot. Name "default" is not allowed.

x

Continuous characteristic with at least 5 different values. The value Inf is not allowed. The name of x must not contain a dot.

p

Percentage of records per bin. Default 5% (0.05). This parameter only accepts values greater than 0.00 (0%) and lower than 0.50 (50%).
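
The constraints listed above can be checked before calling smbinning. The following is a minimal base R sketch; check_binning_args is a hypothetical helper written for illustration, not a function of the package, and the package's own checks may differ.

```r
# Hypothetical helper (not part of smbinning): returns a character vector
# describing every documented constraint that the inputs violate.
check_binning_args <- function(df, y, x, p = 0.05) {
  stopifnot(is.data.frame(df), y %in% names(df), x %in% names(df))
  problems <- character(0)

  # y: integer column holding only 0 and 1, no dot in the name, not "default"
  if (!is.integer(df[[y]]) || !all(na.omit(df[[y]]) %in% c(0L, 1L))) {
    problems <- c(problems, "y must be an integer column with values 0 and 1")
  }
  if (grepl(".", y, fixed = TRUE) || y == "default") {
    problems <- c(problems, "name of y must not contain a dot or be 'default'")
  }

  # x: numeric, at least 5 different values, no Inf, no dot in the name
  if (!is.numeric(df[[x]]) || length(unique(na.omit(df[[x]]))) < 5) {
    problems <- c(problems, "x must be numeric with at least 5 different values")
  }
  if (any(is.infinite(df[[x]]))) {
    problems <- c(problems, "x must not contain Inf")
  }
  if (grepl(".", x, fixed = TRUE)) {
    problems <- c(problems, "name of x must not contain a dot")
  }

  # p: strictly between 0 and 0.5
  if (!(is.numeric(p) && length(p) == 1 && p > 0 && p < 0.5)) {
    problems <- c(problems, "p must be greater than 0 and lower than 0.5")
  }
  problems
}

# Made-up data for illustration
toy <- data.frame(
  fgood = c(1L, 0L, 1L, 1L, 0L, 1L),
  score = c(510, 620, 700, 655, 580, 730)
)
check_binning_args(toy, "fgood", "score", p = 0.05)  # valid inputs
check_binning_args(toy, "fgood", "score", p = 0.50)  # p out of range
```

Collecting the problems in a vector, rather than stopping at the first one, lets the caller see every violated constraint at once.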

Value

The command smbinning generates an object containing the necessary information and utilities for binning. The user should save the output so that it can be used with smbinning.plot, smbinning.sql, and smbinning.gen.
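
A short sketch of how a saved result might be passed to these companion functions is shown here. The argument names (option, chrname) and the new column name "gcbs1" are assumptions based on the companion functions' own help pages; check those pages for the version installed. The code only runs if the package is available.

```r
# Sketch: reuse a saved smbinning result with the companion functions.
if (requireNamespace("smbinning", quietly = TRUE)) {
  library(smbinning)

  # Run the binning once and keep the result
  result <- smbinning(df = smbsimdf1, y = "fgood", x = "cbs1", p = 0.05)

  # Plot the binned characteristic (assumed option: distribution per bin)
  smbinning.plot(result, option = "dist")

  # Produce SQL code for the bins
  smbinning.sql(result)

  # Add the binned characteristic to the data frame as a new column
  # ("gcbs1" is an illustrative column name)
  binned <- smbinning.gen(smbsimdf1, result, chrname = "gcbs1")
}
```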

Examples

# Load package and its data
library(smbinning)

# Example: Optimal binning
result <- smbinning(df = smbsimdf1, y = "fgood", x = "cbs1") # Run and save result
result$ivtable # Tabulation and Information Value
result$iv      # Information Value
result$bands   # Bins or bands
result$ctree   # Decision tree

[Package smbinning version 0.9 Index]