select_best_class {creditmodel} | R Documentation |
Generates Best Binning Breaks
Description
select_best_class and select_best_breaks merge the initial breaks of a variable using chi-square, odds ratio, PSI, G/B index, and similar criteria.
get_breaks is a simpler wrapper for select_best_class and select_best_breaks.
Usage
select_best_class(
dat,
x,
target,
breaks = NULL,
occur_time = NULL,
oot_pct = 0.7,
pos_flag = NULL,
bins_control = NULL,
sp_values = NULL,
...
)
select_best_breaks(
dat,
x,
target,
breaks = NULL,
pos_flag = NULL,
sp_values = NULL,
occur_time = NULL,
oot_pct = 0.7,
bins_control = NULL,
...
)
Arguments
dat |
A data frame with x and target. |
x |
The name of variable to process. |
target |
The name of target variable. |
breaks |
Splitting points for an independent variable. Default is NULL. |
occur_time |
The name of the variable that represents the time at which each observation takes place. |
oot_pct |
The proportion used to split the data into Expected and Actual sets for the PSI calculation. Default is 0.7. |
pos_flag |
The value of positive class of target variable, default: "1". |
bins_control |
A list of parameters that control the binning. |
sp_values |
A list of special values. |
... |
Other parameters. |
Details
The following reference principles are applied:
1. The increasing or decreasing trend of a variable should be consistent with actual business experience. (The proportion of non-monotonic intervals, excluding the first and last intervals, should be less than 0.35.)
2. A single variable should have at most 10 intervals.
3. Each interval should cover more than 2
4. Each interval needs at least 30 or 1
5. Blank, missing, and other special values are combined into a single interval called missing.
6. The difference in Chi effect size between intervals should be at least 0.02.
7. The difference in absolute odds ratio between intervals should be at least 0.1.
8. The difference in positive rate between intervals should be at least 1/10 of the total positive rate.
9. The difference in G/B index between intervals should be at least 15.
10. The PSI of each interval should be less than 0.1.
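Principle 10 can be illustrated with a minimal base R sketch. It assumes the common definition of PSI, where each interval contributes (actual share - expected share) * log(actual share / expected share). The helper psi_by_bin, the toy vectors, and the floor of 1e-6 are hypothetical and are not part of creditmodel; the package's own calculation may differ in detail.

```r
# Hypothetical helper (not a creditmodel function): per-interval PSI contributions.
psi_by_bin <- function(expected, actual, breaks) {
  # Pad the breaks with -Inf and Inf so every value falls into an interval.
  cuts <- unique(c(-Inf, sort(breaks), Inf))
  # Share of observations per interval in each set.
  e_pct <- as.numeric(prop.table(table(cut(expected, cuts))))
  a_pct <- as.numeric(prop.table(table(cut(actual, cuts))))
  # Floor the shares to avoid log(0) for empty intervals (assumption of this sketch).
  e_pct <- pmax(e_pct, 1e-6)
  a_pct <- pmax(a_pct, 1e-6)
  (a_pct - e_pct) * log(a_pct / e_pct)
}

# Toy data: an "expected" (earlier) set and an "actual" (later) set.
expected <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
actual   <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

psi <- psi_by_bin(expected, actual, breaks = c(5))
psi        # one contribution per interval
sum(psi)   # total PSI for the variable
all(psi < 0.1)  # check each interval against the 0.1 threshold
```

Each contribution is non-negative, so an interval whose share shifts strongly between the two sets stands out directly in the vector.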
Value
A list of breaks for x.
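As a rough illustration of how a vector of breaks can be turned into intervals, the sketch below uses base R's cut. The break points and data are hypothetical, and the right-closed intervals are base R's default, not necessarily the convention creditmodel uses internally.

```r
# Hypothetical break points, standing in for the returned breaks.
breaks <- c(10, 20)
x <- c(5, 10, 15, 25, NA)

# Pad with -Inf and Inf so values outside the breaks still get an interval.
# cut() builds right-closed intervals by default, so 10 falls in the first one.
bins <- cut(x, breaks = c(-Inf, breaks, Inf))

# Count observations per interval; the NA stays separate, like a "missing" group.
table(bins, useNA = "ifany")
```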
See Also
get_tree_breaks, cut_equal, get_breaks
Examples
# equal sample size breaks
equ_breaks = cut_equal(dat = UCICreditCard[, "PAY_AMT2"], g = 10)

# select best bins
bins_control = list(bins_num = 10, bins_pct = 0.02, b_chi = 0.02,
                    b_odds = 0.1, b_psi = 0.05, b_or = 0.15, mono = 0.3,
                    odds_psi = 0.1, kc = 1)
select_best_breaks(dat = UCICreditCard, x = "PAY_AMT2", breaks = equ_breaks,
                   target = "default.payment.next.month",
                   occur_time = "apply_date",
                   sp_values = NULL, bins_control = bins_control)