get_breaks_all {creditmodel}    R Documentation
Generates Best Breaks for Binning
Description
get_breaks generates optimal binning breaks for numerical and nominal variables.
get_breaks_all is a simpler wrapper for get_breaks that processes multiple variables at once.
Usage
get_breaks_all(
dat,
target = NULL,
x_list = NULL,
ex_cols = NULL,
pos_flag = NULL,
occur_time = NULL,
oot_pct = 0.7,
best = TRUE,
equal_bins = FALSE,
cut_bin = "equal_depth",
g = 10,
sp_values = NULL,
tree_control = list(p = 0.05, cp = 1e-06, xval = 5, maxdepth = 10),
bins_control = list(bins_num = 10, bins_pct = 0.05, b_chi = 0.05, b_odds = 0.1,
b_psi = 0.05, b_or = 0.15, mono = 0.3, odds_psi = 0.2, kc = 1),
parallel = FALSE,
note = FALSE,
save_data = FALSE,
file_name = NULL,
dir_path = tempdir(),
...
)
get_breaks(
dat,
x,
target = NULL,
pos_flag = NULL,
best = TRUE,
equal_bins = FALSE,
cut_bin = "equal_depth",
g = 10,
sp_values = NULL,
occur_time = NULL,
oot_pct = 0.7,
tree_control = NULL,
bins_control = NULL,
note = FALSE,
...
)
Arguments
dat |
A data frame with x and target. |
target |
The name of the target variable. |
x_list |
Names of the x (independent) variables to bin. |
ex_cols |
Names of variables to exclude. Default is NULL. |
pos_flag |
The value of the positive class of the target variable, default: "1". |
occur_time |
The name of the variable that represents the time at which each observation takes place. |
oot_pct |
Percentage of observations retained for the out-of-time (OOT) test, used especially to calculate PSI. Default is 0.7. |
best |
Logical. If TRUE, initial breaks are merged to obtain optimal breaks for binning. |
equal_bins |
Logical. If TRUE, initial breaks are generated by equal binning (see cut_bin). If FALSE, initial breaks are generated using a decision tree. |
cut_bin |
A string used when equal_bins is TRUE: 'equal_depth' or 'equal_width'. Default is 'equal_depth'. |
g |
Integer, the number of initial bins when equal_bins is TRUE. |
sp_values |
A list of values to be treated as missing, e.g. list(-1, "missing"). |
tree_control |
A list of decision tree parameters: p, cp, xval and maxdepth. See Usage for the get_breaks_all defaults. |
bins_control |
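A list of binning parameters: bins_num, bins_pct, b_chi, b_odds, b_psi, b_or, mono, odds_psi and kc. See Usage for the get_breaks_all defaults. |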
parallel |
Logical, parallel computing or not. Default is FALSE. |
note |
Logical, whether to print progress information. Default is FALSE. |
save_data |
Logical, whether to save results in a locally specified folder. Default is FALSE. |
file_name |
Name of the file used to save results in the specified folder. Default is "breaks_list". |
dir_path |
Path to save results. Default is tempdir(). |
... |
Additional parameters. |
x |
The name of an independent variable. |
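The difference between the two equal binning methods selected by cut_bin can be illustrated in base R. The following is a minimal sketch, not the creditmodel implementation: equal_depth_breaks() and equal_width_breaks() are hypothetical helpers, and the sketch assumes that g bins are produced by g - 1 interior cut points.

```r
# Hypothetical helpers (not part of creditmodel): a sketch of equal-depth
# and equal-width initial breaks for a numeric vector and g target bins.
equal_depth_breaks <- function(x, g = 10) {
  x <- x[!is.na(x)]
  # Interior quantiles at probabilities 1/g, 2/g, ..., (g - 1)/g,
  # so each bin holds roughly the same number of observations.
  probs <- seq_len(g - 1) / g
  unique(unname(quantile(x, probs = probs, type = 7)))
}

equal_width_breaks <- function(x, g = 10) {
  x <- x[!is.na(x)]
  rng <- range(x)
  # g - 1 interior cut points evenly spaced between min and max.
  step <- (rng[2] - rng[1]) / g
  rng[1] + step * seq_len(g - 1)
}

x <- c(1, 2, 2, 3, 4, 5, 8, 13, 21, 34)
equal_depth_breaks(x, g = 4)  # cut points follow the data density
equal_width_breaks(x, g = 4)  # cut points follow the value range
```

On skewed data like this, equal-depth breaks cluster where observations are dense, while equal-width breaks leave most observations in the first bin.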
Value
A table containing a list of splitting points for each independent variable.
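Once split points are available, they can be used to assign observations to bins. The sketch below assumes that the breaks for one numeric variable have been extracted as a plain numeric vector and that bins are right-closed intervals; both are assumptions, since the exact layout of the returned table is not described here. apply_breaks() is a hypothetical helper.

```r
# Hypothetical example: split points for one numeric variable, assumed to
# be available as a plain numeric vector.
breaks <- c(2.25, 4.5, 11.75)

apply_breaks <- function(x, breaks) {
  # Add open-ended outer limits so every finite value falls into a bin;
  # right = TRUE gives intervals of the form (a, b].
  cut(x, breaks = c(-Inf, sort(unique(breaks)), Inf), right = TRUE)
}

x <- c(1, 2, 2, 3, 4, 5, 8, 13, 21, 34)
bins <- apply_breaks(x, breaks)
table(bins)  # number of observations per bin
```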
See Also
get_tree_breaks, cut_equal, select_best_class, select_best_breaks
Examples
library(creditmodel)
# controls
tree_control = list(p = 0.02, cp = 0.000001, xval = 5, maxdepth = 10)
bins_control = list(bins_num = 10, bins_pct = 0.02, b_chi = 0.02, b_odds = 0.1,
b_psi = 0.05, b_or = 15, mono = 0.2, odds_psi = 0.1, kc = 5)
# get breaks for a categorical variable
b = get_breaks(dat = UCICreditCard[1:1000,], x = "MARRIAGE",
target = "default.payment.next.month",
occur_time = "apply_date",
sp_values = list(-1, "missing"),
tree_control = tree_control, bins_control = bins_control)
# get numeric variable breaks
b2 = get_breaks(dat = UCICreditCard[1:1000,], x = "PAY_2",
target = "default.payment.next.month",
occur_time = "apply_date",
sp_values = list(-1, "missing"),
tree_control = tree_control, bins_control = bins_control)
# get breaks of all predictive variables
b3 = get_breaks_all(dat = UCICreditCard[1:1000,], target = "default.payment.next.month",
x_list = c("MARRIAGE","PAY_2"),
occur_time = "apply_date", ex_cols = "ID",
sp_values = list(-1, "missing"),
tree_control = tree_control, bins_control = bins_control,
save_data = FALSE)
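The oot_pct and b_psi arguments refer to the population stability index (PSI), which compares the share of observations in each bin between a reference sample (shares e) and a comparison sample (shares a) as PSI = sum((a - e) * log(a / e)). The psi() helper below is a hypothetical base R sketch, not the function creditmodel uses internally; the small eps floor for empty bins is an assumption of this sketch.

```r
# Hypothetical helper (not part of creditmodel): population stability index
# between a reference sample and a comparison sample, using shared breaks.
psi <- function(expected, actual, breaks, eps = 1e-6) {
  edges <- c(-Inf, sort(unique(breaks)), Inf)
  # Bin both samples with the same cut points and convert counts to shares.
  e <- as.vector(table(cut(expected, edges))) / length(expected)
  a <- as.vector(table(cut(actual, edges))) / length(actual)
  # Guard against empty bins, which would make log(a / e) undefined.
  e <- pmax(e, eps)
  a <- pmax(a, eps)
  sum((a - e) * log(a / e))
}

ref <- c(1, 2, 2, 3, 4, 5, 8, 13, 21, 34)
new <- c(1, 3, 3, 4, 6, 9, 12, 14, 22, 40)
psi(ref, new, breaks = c(2.25, 4.5, 11.75))
```

Identical samples give a PSI of 0; larger values indicate a larger shift in the binned distribution between the two samples.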