get_breaks_all {creditmodel}        R Documentation

Generates Best Breaks for Binning


Description

get_breaks generates optimal bins for numerical and nominal variables. get_breaks_all is a simple wrapper that applies get_breaks to a list of variables.


Usage

get_breaks_all(
  dat,
  target = NULL,
  x_list = NULL,
  ex_cols = NULL,
  pos_flag = NULL,
  occur_time = NULL,
  oot_pct = 0.7,
  best = TRUE,
  equal_bins = FALSE,
  cut_bin = "equal_depth",
  g = 10,
  sp_values = NULL,
  tree_control = list(p = 0.05, cp = 1e-06, xval = 5, maxdepth = 10),
  bins_control = list(bins_num = 10, bins_pct = 0.05, b_chi = 0.05, b_odds = 0.1,
    b_psi = 0.05, b_or = 0.15, mono = 0.3, odds_psi = 0.2, kc = 1),
  parallel = FALSE,
  note = FALSE,
  save_data = FALSE,
  file_name = NULL,
  dir_path = tempdir(),
  ...
)

get_breaks(
  dat,
  x,
  target = NULL,
  pos_flag = NULL,
  best = TRUE,
  equal_bins = FALSE,
  cut_bin = "equal_depth",
  g = 10,
  sp_values = NULL,
  occur_time = NULL,
  oot_pct = 0.7,
  tree_control = NULL,
  bins_control = NULL,
  note = FALSE,
  ...
)



Arguments

dat
A data frame containing the x variables and the target.


target
The name of the target variable.


x_list
A list of x variables.


ex_cols
A list of excluded variables. Default is NULL.


pos_flag
The value of the positive class of the target variable. Default: "1".


occur_time
The name of the variable that records the time at which each observation occurred.


oot_pct
Percentage of observations retained for the over-time test (especially to calculate PSI). Default is 0.7.


best
Logical. If TRUE, merge the initial breaks to obtain optimal breaks for binning.


equal_bins
Logical. If TRUE, generate equal sample size initial breaks. If FALSE, generate the initial breaks with a decision tree.


cut_bin
A string, used when equal_bins is TRUE: 'equal_depth' or 'equal_width'. Default is 'equal_depth'.
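The difference between the two cut_bin options can be illustrated in base R. This is a minimal sketch on simulated data, not the package's internal code; the variable names are hypothetical and g mirrors the argument of the same name.

```r
# Hypothetical skewed numeric predictor, so the two schemes differ visibly.
set.seed(42)
x <- rexp(1000, rate = 1)
g <- 10  # number of initial bins

# Equal depth: cut points at quantiles, so each bin holds roughly the same
# number of observations.
depth_breaks <- unique(quantile(x, probs = seq(0, 1, length.out = g + 1),
                                names = FALSE))

# Equal width: cut points evenly spaced over the observed range, so bin
# counts follow the shape of the distribution.
width_breaks <- seq(min(x), max(x), length.out = g + 1)

depth_counts <- table(cut(x, breaks = depth_breaks, include.lowest = TRUE))
width_counts <- table(cut(x, breaks = width_breaks, include.lowest = TRUE))
```

On skewed data, equal-width bins in the tail contain few observations, which is one reason equal depth is a common starting point before merging.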


g
Integer. Number of initial bins when equal_bins is TRUE.


sp_values
A list of special values to be treated as missing, e.g. list(-1, "missing").


tree_control
The list of tree parameters.

  • p The minimum percentage of observations in any terminal (leaf) node. 0 < p < 1; 0.01 to 0.1 usually works.

  • cp Complexity parameter. The larger it is, the more conservative the algorithm. 0 < cp < 1; 0.0000001 to 0.0001 usually works.

  • xval Number of cross-validations. Default: 5

  • maxdepth Maximum depth of the tree. Default: 10
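To show how tree parameters of this kind produce candidate breaks, here is a minimal sketch using the rpart package on simulated data. It assumes, without confirmation from this page, that p corresponds to a minimum leaf size expressed as a share of the sample; the data and variable names are hypothetical, and this is not the package's own implementation.

```r
library(rpart)

# Hypothetical data: the event rate jumps when x exceeds 0.5.
set.seed(1)
n <- 1000
x <- runif(n)
y <- rbinom(n, 1, ifelse(x > 0.5, 0.6, 0.2))
d <- data.frame(x = x, y = factor(y))

# Assumed mapping of tree_control onto rpart.control:
# p -> minbucket as a share of n; cp, xval, maxdepth passed through.
p <- 0.05
ctrl <- rpart.control(minbucket = round(p * n), cp = 1e-06,
                      xval = 5, maxdepth = 10)

fit <- rpart(y ~ x, data = d, method = "class", control = ctrl)

# With a single predictor, every row of fit$splits is a primary split on x,
# so the "index" column holds the candidate break points.
tree_breaks <- sort(unique(fit$splits[, "index"]))
```

A larger p or cp yields fewer, coarser breaks; a smaller value yields more candidate breaks for the later merging step to reduce.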


bins_control
The list of binning parameters.

  • bins_num The maximum number of bins. 5 to 10 usually works. Default: 10

  • bins_pct The minimum percentage of observations in any bin. 0 < bins_pct < 1; 0.01 to 0.1 usually works. Default: 0.05

  • b_chi The minimum threshold of the chi-square merge. 0 < b_chi < 1; 0.01 to 0.1 usually works. Default: 0.05

  • b_odds The minimum threshold of the odds merge. 0 < b_odds < 1; 0.05 to 0.2 usually works. Default: 0.1

  • b_psi The maximum threshold of PSI in any bin. 0 < b_psi < 1; 0 to 0.1 usually works. Default: 0.05

  • b_or The maximum threshold of the G/B index in any bin. 0 < b_or < 1; 0.05 to 0.3 usually works. Default: 0.15

  • odds_psi The maximum threshold of the PSI of the training and testing G/B index in any bin. 0 < odds_psi < 1; 0.01 to 0.3 usually works. Default: 0.2

  • mono Monotonicity of all bins. The larger it is, the more non-monotonic the bins may be. 0 < mono < 0.5; 0.2 to 0.4 usually works. Default: 0.3

  • kc Number of cross-validations. 1 to 5 usually works. Default: 1
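For readers unfamiliar with PSI, the quantity that b_psi bounds can be sketched with the standard population stability index formula. The function name psi_by_bin and the bin counts are hypothetical, and whether the package computes PSI in exactly this way is an assumption.

```r
# Per-bin PSI contributions between an expected (e.g. training) and an
# actual (e.g. out-of-time) distribution of observations over the same bins.
psi_by_bin <- function(expected, actual, eps = 1e-10) {
  e <- pmax(expected / sum(expected), eps)  # guard against empty bins
  a <- pmax(actual / sum(actual), eps)
  (a - e) * log(a / e)
}

train_counts <- c(200, 300, 500)  # hypothetical bin counts
oot_counts   <- c(250, 300, 450)

contrib   <- psi_by_bin(train_counts, oot_counts)
total_psi <- sum(contrib)

# Bins whose contribution exceeds a threshold such as b_psi = 0.05 would be
# candidates for merging.
unstable <- which(contrib > 0.05)
```

Each contribution is non-negative, and identical distributions give a PSI of zero.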


parallel
Logical. Whether to use parallel computing. Default is FALSE.


note
Logical. Whether to output progress information. Default is FALSE.


save_data
Logical. Whether to save the results in the locally specified folder. Default is FALSE.


file_name
Name of the file in which the results are saved in the specified folder. Default is "breaks_list".


dir_path
Path to save the results. Default is tempdir().


...
Additional parameters.


x
The name of an independent variable.


Value

A table containing a list of splitting points for each independent variable.
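Splitting points of this kind are typically turned into bins with base R's cut(). The sketch below uses a hypothetical vector of break points rather than the actual return value, and the choice of left-closed intervals is an assumption.

```r
# Hypothetical splitting points for one numeric variable.
breaks <- c(0.5, 1.5)
x <- c(-1, 0, 1, 2, 3)

# Pad with -Inf and Inf so every value falls into some bin.
bins <- cut(x, breaks = c(-Inf, breaks, Inf), right = FALSE)
bin_counts <- table(bins)
```

With k splitting points this yields k + 1 bins, which can then be used to tabulate the target by bin.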

See Also

get_tree_breaks, cut_equal, select_best_class, select_best_breaks


Examples

library(creditmodel)

# control parameters for the tree and for bin merging
tree_control <- list(p = 0.02, cp = 0.000001, xval = 5, maxdepth = 10)
bins_control <- list(bins_num = 10, bins_pct = 0.02, b_chi = 0.02, b_odds = 0.1,
                     b_psi = 0.05, b_or = 0.15, mono = 0.2, odds_psi = 0.1, kc = 5)

# get breaks for a categorical variable
b <- get_breaks(dat = UCICreditCard[1:1000, ], x = "MARRIAGE",
                target = "default.payment.next.month",
                occur_time = "apply_date",
                sp_values = list(-1, "missing"),
                tree_control = tree_control, bins_control = bins_control)

# get breaks for a numeric variable
b2 <- get_breaks(dat = UCICreditCard[1:1000, ], x = "PAY_2",
                 target = "default.payment.next.month",
                 occur_time = "apply_date",
                 sp_values = list(-1, "missing"),
                 tree_control = tree_control, bins_control = bins_control)

# get breaks for all predictive variables in x_list
b3 <- get_breaks_all(dat = UCICreditCard[1:1000, ],
                     target = "default.payment.next.month",
                     x_list = c("MARRIAGE", "PAY_2"),
                     occur_time = "apply_date", ex_cols = "ID",
                     sp_values = list(-1, "missing"),
                     tree_control = tree_control, bins_control = bins_control,
                     save_data = FALSE)

[Package creditmodel version 1.3.1 Index]