select_best_class {creditmodel}    R Documentation

Generates Best Binning Breaks

Description

select_best_class and select_best_breaks merge the initial breaks of a variable using criteria such as chi-square, odds ratio, PSI, and the G/B index. get_breaks is a simpler wrapper for select_best_class and select_best_breaks.

Usage

select_best_class(
  dat,
  x,
  target,
  breaks = NULL,
  occur_time = NULL,
  oot_pct = 0.7,
  pos_flag = NULL,
  bins_control = NULL,
  sp_values = NULL,
  ...
)

select_best_breaks(
  dat,
  x,
  target,
  breaks = NULL,
  pos_flag = NULL,
  sp_values = NULL,
  occur_time = NULL,
  oot_pct = 0.7,
  bins_control = NULL,
  ...
)

Arguments

dat

A data frame with x and target.

x

The name of the variable to process.

target

The name of the target variable.

breaks

Splitting points for an independent variable. Default is NULL.

occur_time

The name of the variable that represents the time at which each observation takes place.

oot_pct

The percentage used to split observations into the Actual and Expected sets when calculating PSI. Default: 0.7.

pos_flag

The value of the positive class of the target variable. Default: "1".

bins_control

A list of binning parameters:

  • bins_num The maximum number of bins. 5 to 10 usually work. Default: 10.

  • bins_pct The minimum percentage of observations in any bin. 0 < bins_pct < 1; 0.01 to 0.1 usually work. Default: 0.02.

  • b_chi The minimum threshold for the chi-square merge. 0 < b_chi < 1; 0.01 to 0.1 usually work. Default: 0.02.

  • b_odds The minimum threshold for the odds merge. 0 < b_odds < 1; 0.05 to 0.2 usually work. Default: 0.1.

  • b_psi The maximum threshold of PSI in any bin. 0 < b_psi < 1; 0 to 0.1 usually work. Default: 0.05.

  • b_or The maximum threshold of the G/B index in any bin. 0 < b_or < 1; 0.05 to 0.3 usually work. Default: 0.15.

  • odds_psi The maximum threshold of the PSI of the G/B index between training and testing sets in any bin. 0 < odds_psi < 1; 0.01 to 0.3 usually work. Default: 0.1.

  • mono Monotonicity of all bins; the larger the value, the more non-monotonic the bins may be. 0 < mono < 0.5; 0.2 to 0.4 usually work. Default: 0.2.

  • kc The number of cross-validations. 1 to 5 usually work. Default: 1.
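
The bins_num and bins_pct constraints can be illustrated with base R. This is only a sketch, not the package's internal logic: the helper check_bins and the toy data are hypothetical, and it simply assumes a set of breaks is acceptable when it yields no more than bins_num bins and every bin holds at least a bins_pct share of the observations.

```r
# Hypothetical helper (not part of creditmodel): checks a set of breaks
# against the bins_num and bins_pct constraints using base R only.
check_bins <- function(x, breaks, bins_num = 10, bins_pct = 0.02) {
  # Open-ended outer edges so that every value falls into some bin.
  edges <- c(-Inf, sort(unique(breaks)), Inf)
  bins <- cut(x, breaks = edges, right = TRUE)
  # Share of observations in each bin.
  share <- as.numeric(table(bins)) / length(x)
  list(
    n_bins = length(share),
    min_share = min(share),
    ok = length(share) <= bins_num && min(share) >= bins_pct
  )
}

# Toy data, made up for illustration.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
res <- check_bins(x, breaks = c(3, 7), bins_num = 10, bins_pct = 0.02)
```

Raising bins_pct above the smallest bin share makes the same breaks fail the check, which is the situation in which a merging procedure would need to combine neighbouring bins.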

sp_values

A list of special values.

...

Other parameters.
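
To make the roles of occur_time, oot_pct and b_psi more concrete, here is a minimal sketch of a per-bin PSI computation in base R. It uses the standard PSI term, (actual - expected) * log(actual / expected), and assumes, purely for illustration, that the earliest oot_pct share of observations (ordered by time) forms the Expected set and the remainder forms the Actual set; the package's own split and any smoothing it applies may differ. The helper psi_by_bin and the toy data are hypothetical.

```r
# Hypothetical helper (not part of creditmodel): per-bin PSI contributions.
psi_by_bin <- function(x, time, breaks, oot_pct = 0.7) {
  # Order observations by time.
  x <- x[order(time)]
  n_exp <- floor(length(x) * oot_pct)
  # Assumption: the earliest oot_pct share is Expected, the rest is Actual.
  expected <- x[seq_along(x) <= n_exp]
  actual <- x[seq_along(x) > n_exp]
  edges <- c(-Inf, sort(unique(breaks)), Inf)
  e <- as.numeric(table(cut(expected, edges))) / length(expected)
  a <- as.numeric(table(cut(actual, edges))) / length(actual)
  # Standard PSI term per bin. Empty bins would produce Inf or NaN, so
  # practical implementations usually add a small smoothing constant.
  (a - e) * log(a / e)
}

# Toy data, made up for illustration.
time <- 1:10
x <- c(1, 2, 3, 4, 5, 6, 7, 2, 5, 8)
psi <- psi_by_bin(x, time, breaks = 3, oot_pct = 0.7)

# Compare every bin against a b_psi-style limit.
within_limit <- all(psi <= 0.05)
```

The sum of the per-bin terms gives the overall PSI of the variable, while the per-bin values are what a threshold such as b_psi would be compared against under the assumption stated above.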

Details

The following is the list of Reference Principles

Value

A list of breaks for x.
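
Once breaks are available, they can be applied to the variable with base R's cut(). The break values below are made up for illustration and are not output from select_best_breaks; the sketch also assumes the returned list can be flattened into a numeric vector with unlist().

```r
# Made-up breaks standing in for the returned list (illustrative values only).
breaks <- list(1000, 5000, 20000)
x <- c(0, 800, 1000, 2500, 5000, 7500, 30000)

# Flatten the list and add open-ended outer edges so no value is left unbinned.
edges <- c(-Inf, sort(unique(unlist(breaks))), Inf)
bins <- cut(x, breaks = edges, right = TRUE)

# Observation counts per bin.
bin_counts <- table(bins)
```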

See Also

get_tree_breaks, cut_equal, get_breaks

Examples

# equal sample size breaks
equ_breaks = cut_equal(dat = UCICreditCard[, "PAY_AMT2"], g = 10)

# select best bins
bins_control = list(bins_num = 10, bins_pct = 0.02, b_chi = 0.02,
                    b_odds = 0.1, b_psi = 0.05, b_or = 0.15,
                    mono = 0.3, odds_psi = 0.1, kc = 1)
select_best_breaks(dat = UCICreditCard, x = "PAY_AMT2", breaks = equ_breaks,
                   target = "default.payment.next.month",
                   occur_time = "apply_date",
                   sp_values = NULL, bins_control = bins_control)

[Package creditmodel version 1.3.0 Index]