select_best_class {creditmodel}  R Documentation 
select_best_class & select_best_breaks merge the initial breaks of a variable using chi-square, odds ratio, PSI, G/B index and so on.
get_breaks is a simpler wrapper for select_best_class & select_best_breaks.
select_best_class(
dat,
x,
target,
breaks = NULL,
occur_time = NULL,
oot_pct = 0.7,
pos_flag = NULL,
bins_control = NULL,
sp_values = NULL,
...
)
select_best_breaks(
dat,
x,
target,
breaks = NULL,
pos_flag = NULL,
sp_values = NULL,
occur_time = NULL,
oot_pct = 0.7,
bins_control = NULL,
...
)
dat 
A data frame with x and target. 
x 
The name of variable to process. 
target 
The name of target variable. 
breaks 
Splitting points for an independent variable. Default is NULL. 
occur_time 
The name of the variable that represents the time at which each observation takes place. 
oot_pct 
The proportion used to split the data into the Expected and Actual sets for the PSI calculation. Default is 0.7. 
pos_flag 
The value of positive class of target variable, default: "1". 
bins_control 
A list of parameters that control the binning (for example bins_num, bins_pct and b_chi, as used in the example). 
sp_values 
A list of special values. 
... 
Other parameters. 
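As an illustration of how occur_time and oot_pct could work together, the following base R sketch orders observations by time and splits them into two sets for a PSI comparison. The data frame, the column names and the choice of the earliest 70% as the Expected set are assumptions made for this example; the split performed inside creditmodel may differ.

```r
# Hypothetical data: one row per observation with an application date.
set.seed(1)
dat <- data.frame(
  apply_date = sample(as.Date("2020-01-01") + 0:99),
  x = rnorm(100)
)
oot_pct <- 0.7

# Order by time, then take the earliest share as the Expected set
# and the remainder as the Actual set (assumed convention).
dat <- dat[order(dat$apply_date), ]
n_expected <- round(nrow(dat) * oot_pct)
expected_set <- dat[seq_len(n_expected), ]
actual_set <- dat[-seq_len(n_expected), ]
```

Splitting on time rather than at random means the PSI compares an earlier population with a later one, which is what a stability check is meant to capture.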
The following is the list of reference principles:
1. The increasing or decreasing trend of a variable is consistent with actual business experience. (The share of non-monotonic intervals, not counting the first and last intervals, is less than 0.35.)
2. Maximum 10 intervals for a single variable.
3. Each interval should cover more than 2
4. Each interval needs at least 30 or 1
5. Blank, missing and other special values are combined into a single interval called missing.
6. The difference in chi-square effect size between intervals should be at least 0.02.
7. The difference in absolute odds ratio between intervals should be at least 0.1.
8. The difference in positive rate between intervals should be at least 1/10 of the total positive rate.
9. The difference in G/B index between intervals should be at least 15.
10. The PSI of each interval should be less than 0.1.
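The PSI criterion in the last principle can be sketched in base R as follows. This uses the usual per-interval PSI term, (actual share - expected share) * log(actual share / expected share). The helper psi_by_interval, the toy vectors and the 1e-6 floor for empty intervals are hypothetical and are not taken from creditmodel, whose internal calculation may differ.

```r
# Hypothetical helper, not part of creditmodel: PSI contribution per interval.
psi_by_interval <- function(expected, actual, breaks) {
  cuts <- unique(c(-Inf, sort(breaks), Inf))
  # Share of observations falling into each interval, per set.
  e_pct <- as.numeric(prop.table(table(cut(expected, cuts))))
  a_pct <- as.numeric(prop.table(table(cut(actual, cuts))))
  # Small floor avoids log(0) for empty intervals (assumed value).
  e_pct <- pmax(e_pct, 1e-6)
  a_pct <- pmax(a_pct, 1e-6)
  (a_pct - e_pct) * log(a_pct / e_pct)
}

# Toy Expected and Actual samples with shares 50/30/20 and 40/30/30.
expected <- c(rep(1, 50), rep(5, 30), rep(9, 20))
actual <- c(rep(1, 40), rep(5, 30), rep(9, 30))
psi <- psi_by_interval(expected, actual, breaks = c(3, 7))

# Check every interval against the 0.1 threshold from the list above.
all(psi < 0.1)
```

Summing the vector gives the overall PSI of the variable; checking each element separately corresponds to the per-interval rule.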
A list of breaks for x.
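A returned set of breaks can then be used to bin the variable. The sketch below does this with base R's cut and, following principle 5, puts missing values into their own level. The break points and data are hypothetical, and the right-closed interval convention is cut's default, assumed here rather than confirmed for creditmodel.

```r
# Hypothetical values and break points for a numeric variable.
x <- c(0, 150, 2000, 5000, NA, 12000)
breaks <- c(1000, 5000)

# Extend the breaks to cover the whole real line, then bin.
cuts <- c(-Inf, breaks, Inf)
bins <- cut(x, cuts)

# Keep missing values as a separate level named "missing".
bins <- addNA(bins)
levels(bins)[is.na(levels(bins))] <- "missing"

table(bins)
```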
get_tree_breaks, cut_equal, get_breaks
library(creditmodel)
# equal sample size breaks
equ_breaks = cut_equal(dat = UCICreditCard[, "PAY_AMT2"], g = 10)
# select best bins
bins_control = list(bins_num = 10, bins_pct = 0.02, b_chi = 0.02,
                    b_odds = 0.1, b_psi = 0.05, b_or = 0.15, mono = 0.3,
                    odds_psi = 0.1, kc = 1)
select_best_breaks(dat = UCICreditCard, x = "PAY_AMT2", breaks = equ_breaks,
                   target = "default.payment.next.month",
                   occur_time = "apply_date",
                   sp_values = NULL, bins_control = bins_control)