smbinning {smbinning} | R Documentation |
Optimal Binning for Scoring Modeling
Description
Optimal Binning categorizes a numeric characteristic into bins for later use in scoring modeling.
This process, also known as supervised discretization,
utilizes Recursive Partitioning to categorize
the numeric characteristic.
The specific algorithm is Conditional Inference Trees,
which initially excludes missing values (NA) to compute the cutpoints and adds them back later in the
process for the calculation of the Information Value.
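To make the last step concrete, the sketch below computes Weight of Evidence and Information Value from per-bin counts in base R. It assumes the commonly used definitions (WoE as the log ratio of the share of goods to the share of bads, IV as the sum of the share difference times WoE); the package's internal code is not shown in this page, and the counts are hypothetical, not taken from smbsimdf1. The "Missing" row illustrates how NA records can be added back as their own bin after the cutpoints are found.

```r
# Hypothetical good/bad counts per bin (illustrative numbers only)
bins <- data.frame(
  bin  = c("<= 600", "<= 700", "> 700", "Missing"),
  good = c(100, 300, 500, 100),
  bad  = c(80, 60, 40, 20)
)

# Share of all goods and of all bads that falls in each bin
pct_good <- bins$good / sum(bins$good)
pct_bad  <- bins$bad / sum(bins$bad)

# Assumed definitions: WoE = ln(pct_good / pct_bad), IV = sum((pct_good - pct_bad) * WoE)
bins$woe <- log(pct_good / pct_bad)
bins$iv  <- (pct_good - pct_bad) * bins$woe
total_iv <- sum(bins$iv)

print(bins)
print(total_iv)
```

A bin with a zero good or bad count would make the logarithm infinite, so a real implementation needs a rule for that case; this sketch does not handle it.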
Usage
smbinning(df, y, x, p = 0.05)
Arguments
df |
A data frame. |
y |
Binary response variable (0,1). Integer ( |
x |
Continuous characteristic. At least 5 different values. Value |
p |
Percentage of records per bin. Default 5% (0.05). This parameter only accepts values greater than 0.00 (0%) and lower than 0.50 (50%). |
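As a short illustration of the p argument, the call below asks for 10% of the records per bin instead of the default 5%. It uses the smbsimdf1 dataset and the column names that appear in the Examples section. The is.list() check is a defensive assumption, in case the function returns something other than a result list when no binning is produced.

```r
library(smbinning) # Load package and its data

# Same call as in the Examples section, with p raised from 0.05 to 0.10
result10 <- smbinning(df = smbsimdf1, y = "fgood", x = "cbs1", p = 0.10)

# Defensive check (assumption): only index the result if it is a list
if (is.list(result10)) {
  print(result10$ivtable) # Tabulation and Information Value
} else {
  print(result10)         # Show whatever was returned instead
}
```

A larger p generally leaves room for fewer bins, so comparing the tabulations for several values of p is a simple way to choose one.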
Value
The command smbinning generates an object containing the necessary information and utilities for binning.
The user should save the output so it can be used with smbinning.plot, smbinning.sql, and smbinning.gen.
Examples
# Load package and its data
library(smbinning)
# Example: Optimal binning
result <- smbinning(df = smbsimdf1, y = "fgood", x = "cbs1") # Run and save result
result$ivtable # Tabulation and Information Value
result$iv # Information value
result$bands # Bins or bands
result$ctree # Decision tree
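The saved result can then be passed to the three companion functions named in the Value section. The sketch below is hedged: the argument names (option, sub, chrname) follow the package's documentation for those functions as best recalled and are not confirmed by this page, and the new column name gcbs1 is a hypothetical choice.

```r
library(smbinning) # Load package and its data

# Run and save the binning result
result <- smbinning(df = smbsimdf1, y = "fgood", x = "cbs1")

# Plot the Weight of Evidence per bin (option and sub are assumed argument names)
smbinning.plot(result, option = "WoE", sub = "cbs1")

# Print SQL code that reproduces the bins
smbinning.sql(result)

# Add the binned characteristic to a copy of the data as a new column
# ("gcbs1" is a hypothetical column name)
df_binned <- smbinning.gen(smbsimdf1, result, chrname = "gcbs1")
table(df_binned$gcbs1, useNA = "ifany") # Records per generated bin
```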