psi_iv_filter {creditmodel} R Documentation

Variable reduction based on Information Value & Population Stability Index filter

Description

psi_iv_filter selects important and stable features using Information Value (IV) and the Population Stability Index (PSI).

Usage

psi_iv_filter(
  dat,
  dat_test = NULL,
  target,
  x_list = NULL,
  breaks_list = NULL,
  pos_flag = NULL,
  ex_cols = NULL,
  occur_time = NULL,
  best = FALSE,
  equal_bins = TRUE,
  g = 10,
  sp_values = NULL,
  tree_control = list(p = 0.05, cp = 0.000001, xval = 5, maxdepth = 10),
  bins_control = list(bins_num = 10, bins_pct = 0.05, b_chi = 0.05, b_odds = 0.1, b_psi
    = 0.05, b_or = 0.15, mono = 0.3, odds_psi = 0.2, kc = 1),
  oot_pct = 0.7,
  psi_i = 0.1,
  iv_i = 0.01,
  cos_i = 0.7,
  vars_name = FALSE,
  note = TRUE,
  parallel = FALSE,
  save_data = FALSE,
  file_name = NULL,
  dir_path = tempdir(),
  ...
)

Arguments

dat

A data.frame with independent variables and target variable.

dat_test

A data.frame of test data. Default is NULL.

target

The name of target variable.

x_list

Names of independent variables.

breaks_list

A table containing a list of splitting points for each independent variable. Default is NULL.

pos_flag

The value of positive class of target variable, default: "1".

ex_cols

A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.

occur_time

The name of the variable that represents the time at which each observation takes place.

best

Logical, if TRUE, merge initial breaks to get optimal breaks for binning.

equal_bins

Logical. If TRUE, initial breaks are generated with equal sample sizes. If FALSE, initial breaks are generated using a decision tree.

g

Integer, number of initial bins for equal_bins.

sp_values

A list of missing values.

tree_control

The list of tree parameters.

bins_control

The list of binning parameters.

oot_pct

Percentage of observations retained for overtime test (especially to calculate PSI). Default is 0.7.

psi_i

The maximum threshold of PSI, 0 <= psi_i <= 1; values from 0.05 to 0.2 usually work. Default: 0.1.

iv_i

The minimum threshold of IV, 0 < iv_i; values from 0.01 to 0.1 usually work. Default: 0.01.

cos_i

Threshold for the cosine similarity between the positive rates of train and test. Values from 0.7 to 0.9 usually work. Default: 0.7.

vars_name

Logical. If TRUE, output a list of the filtered variables; if FALSE, output a table with the detailed IV and PSI values of each variable. Default is FALSE.

note

Logical. If TRUE, output progress information. Default is TRUE.

parallel

Logical, parallel computing. Default is FALSE.

save_data

Logical, save results in locally specified folder. Default is FALSE.

file_name

The name for periodically saved results files. Default is "Feature_importance_IV_PSI".

dir_path

The path for periodically saved results files. Default is tempdir().

...

Other parameters.
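
The thresholds psi_i, iv_i and cos_i are easier to read next to the quantities they bound. The following is a minimal base-R sketch using the textbook definitions of IV, PSI and cosine similarity. The helper names (iv_sketch, psi_sketch, cos_sketch), the smoothing constant eps, the bin counts, and the direction of each comparison are assumptions made for illustration; the package's internal computation may differ.

```r
# Hypothetical helpers, NOT part of creditmodel: they only illustrate the
# quantities that iv_i, psi_i and cos_i are compared against.

# Information Value from per-bin counts of positive and negative cases.
iv_sketch <- function(pos, neg, eps = 1e-10) {
  p <- pos / sum(pos) + eps   # share of all positives falling in each bin
  q <- neg / sum(neg) + eps   # share of all negatives falling in each bin
  sum((p - q) * log(p / q))
}

# Population Stability Index from per-bin counts in two samples.
psi_sketch <- function(n_expected, n_actual, eps = 1e-10) {
  e <- n_expected / sum(n_expected) + eps   # bin shares in the reference sample
  a <- n_actual / sum(n_actual) + eps       # bin shares in the comparison sample
  sum((a - e) * log(a / e))
}

# Cosine similarity between two vectors of per-bin positive rates.
cos_sketch <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Made-up counts for one variable split into three bins.
train_pos <- c(10, 30, 60)
train_neg <- c(300, 350, 250)
test_pos  <- c(12, 28, 55)
test_neg  <- c(290, 340, 275)

iv_value  <- iv_sketch(train_pos, train_neg)
psi_value <- psi_sketch(train_pos + train_neg, test_pos + test_neg)
cos_value <- cos_sketch(train_pos / (train_pos + train_neg),
                        test_pos / (test_pos + test_neg))

# Assumed filter rule: keep the variable only if it clears all three
# default thresholds (iv_i = 0.01, psi_i = 0.1, cos_i = 0.7).
keep <- iv_value >= 0.01 && psi_value <= 0.1 && cos_value >= 0.7
```

Under this reading, raising iv_i or cos_i, or lowering psi_i, makes the filter stricter and leaves fewer variables.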

Value

A list with the following elements:

See Also

xgb_filter, gbm_filter, feature_selector

Examples

library(creditmodel)

# Filter a small subset of the UCICreditCard data by IV and PSI
psi_iv_filter(dat = UCICreditCard[1:1000, c(2, 4, 8:9, 26)],
              target = "default.payment.next.month",
              occur_time = "apply_date",
              parallel = FALSE)

[Package creditmodel version 1.3.0 Index]