cat.bin {PDtoolkit} | R Documentation |
Categorical risk factor binning
Description
cat.bin implements a three-stage binning procedure for categorical risk factors.
The first stage is an optional correction for the minimum percentage of observations.
The second stage is an optional correction for the target rate (default rate), and the third is
an optional correction for the maximum number of bins. The last stage implements a procedure known as the
adjacent pooling algorithm (APA), which aims to minimize information loss while iteratively merging bins.
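The first two stages can be pictured as checks on a per-category summary table. The base R sketch below, run on simulated data, builds such a table and flags bins that fall short of a minimum share of observations or a minimum default rate. The data, the helper name bin_summary and its column names are hypothetical and not part of PDtoolkit. The sketch only flags bins; how cat.bin merges them is not reproduced here.

```r
# Simulated data (hypothetical): a categorical risk factor and a binary target.
set.seed(1)
x <- sample(c("A", "B", "C", "D"), size = 1000, replace = TRUE,
            prob = c(0.50, 0.30, 0.17, 0.03))
y <- rbinom(1000, size = 1, prob = 0.10)

# Hypothetical helper: one row per category with counts, shares and default rate.
bin_summary <- function(x, y, min.pct.obs = 0.05, min.avg.rate = 0.01) {
  no <- tapply(y, x, length)   # number of observations per bin
  nb <- tapply(y, x, sum)      # number of bad cases per bin
  tbl <- data.frame(bin = names(no),
                    no  = as.vector(no),
                    nb  = as.vector(nb),
                    stringsAsFactors = FALSE)
  tbl$pct.obs <- tbl$no / sum(tbl$no)               # share of observations
  tbl$dr <- tbl$nb / tbl$no                         # default rate
  tbl$below.min.pct <- tbl$pct.obs < min.pct.obs    # stage 1 flag
  tbl$below.min.rate <- tbl$dr < min.avg.rate       # stage 2 flag
  tbl
}

smry <- bin_summary(x, y)
smry
```

Bins flagged in either column are the candidates that a correction step would merge with another bin.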
Usage
cat.bin(
  x,
  y,
  sc = NA,
  sc.merge = "none",
  min.pct.obs = 0.05,
  min.avg.rate = 0.01,
  max.groups = NA,
  force.trend = "modalities"
)
Arguments
x |
Categorical risk factor. |
y |
Numeric target vector (binary). |
sc |
Special case elements. Default value is NA. |
sc.merge |
Define how special cases will be treated. Available options are: |
min.pct.obs |
Minimum percentage of observations per bin. Default is 0.05 or minimum 30 observations. |
min.avg.rate |
Minimum default rate. Default is 0.01 or minimum 1 bad case for |
max.groups |
Maximum number of bins (groups) allowed for analyzed risk factor. If in the first two stages
number of bins is less or equal to selected |
force.trend |
Defines how initial summary table will be ordered. Possible options are: |
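To give a feel for what a cap on the number of bins involves, the sketch below orders bins by default rate and repeatedly merges the adjacent pair with the closest default rates until at most max.groups bins remain. This is a simplified illustration, not the adjacent pooling algorithm as implemented in cat.bin. The merge criterion, the helper name merge_closest and the counts are all assumptions made for the example.

```r
# Hypothetical helper: merge adjacent bins (ordered by default rate) until
# at most max.groups bins remain. The merge criterion (smallest gap in
# default rate) is an assumption made for illustration only.
merge_closest <- function(tbl, max.groups) {
  stopifnot(max.groups >= 1)
  tbl <- tbl[order(tbl$nb / tbl$no), ]
  while (nrow(tbl) > max.groups) {
    dr <- tbl$nb / tbl$no
    i <- which.min(diff(dr))   # adjacent pair (i, i + 1) with the closest rates
    tbl$bin[i] <- paste(tbl$bin[i], tbl$bin[i + 1], sep = "+")
    tbl$no[i] <- tbl$no[i] + tbl$no[i + 1]
    tbl$nb[i] <- tbl$nb[i] + tbl$nb[i + 1]
    tbl <- tbl[-(i + 1), ]
  }
  tbl$dr <- tbl$nb / tbl$no
  rownames(tbl) <- NULL
  tbl
}

# Made-up counts: observations (no) and bad cases (nb) per bin.
bins <- data.frame(bin = c("a", "b", "c", "d", "e"),
                   no  = c(200, 150, 300, 250, 100),
                   nb  = c(4, 6, 30, 28, 25),
                   stringsAsFactors = FALSE)
merged <- merge_closest(bins, max.groups = 3)
merged
```

With these counts the sketch first merges c and d, then a and b, leaving the three bins a+b, c+d and e.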
Value
cat.bin returns a list of two objects. The first, the data frame summary.tbl,
presents a summary table of the final binning; the second, x.trans,
is a vector of new grouping values.
References
Anderson, R. (2007). The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation. Oxford University Press.
Examples
suppressMessages(library(PDtoolkit))
data(loans)
#prepare risk factor Purpose for the analysis
loans$Purpose <- ifelse(nchar(loans$Purpose) == 2, loans$Purpose, paste0("0", loans$Purpose))
#artificially add missing values in order to show the function's features
loans$Purpose[1:6] <- NA
#run binning procedure
res <- cat.bin(x = loans$Purpose,
               y = loans$Creditability,
               sc = NA,
               sc.merge = "none",
               min.pct.obs = 0.05,
               min.avg.rate = 0.05,
               max.groups = NA,
               force.trend = "modalities")
res[[1]]
#check new risk factor against the original
table(loans$Purpose, res[[2]], useNA = "always")
#repeat the same process, setting max.groups to 4 and force.trend to "dr"
res <- cat.bin(x = loans$Purpose,
               y = loans$Creditability,
               sc = NA,
               sc.merge = "none",
               min.pct.obs = 0.05,
               min.avg.rate = 0.05,
               max.groups = 4,
               force.trend = "dr")
res[[1]]
#check new risk factor against the original
table(loans$Purpose, res[[2]], useNA = "always")
#example of reducing the number of groups for a numeric risk factor
#copy an existing numeric risk factor to a new one called maturity
loans$maturity <- loans$"Duration of Credit (month)"
#artificially add missing values in order to show the function's features
loans$maturity[1:10] <- NA
#categorize maturity with the MAPA algorithm from the monobin package
loans$maturity.bin <- cum.bin(x = loans$maturity,
                              y = loans$Creditability, g = 50)[[2]]
table(loans$maturity.bin)
#run binning procedure to decrease number of bins from the previous step
res <- cat.bin(x = loans$maturity.bin,
               y = loans$Creditability,
               sc = "SC",
               sc.merge = "closest",
               min.pct.obs = 0.05,
               min.avg.rate = 0.01,
               max.groups = 5,
               force.trend = "modalities")
res[[1]]
#check new risk factor against the original
table(loans$maturity.bin, res[[2]], useNA = "always")