psi_iv_filter {creditmodel} | R Documentation |
Variable reduction based on Information Value & Population Stability Index filter
Description
psi_iv_filter selects important and stable features using Information Value (IV) and Population Stability Index (PSI).
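As a rough illustration of the two statistics, the following base R sketch computes the textbook forms of IV and PSI on data that has already been binned. The helper names iv_sketch and psi_sketch, the eps smoothing constant, and the toy vectors are assumptions made for this example; they are not part of creditmodel and may differ from how the package computes these values internally.

```r
# Hypothetical helpers (not creditmodel functions): textbook IV and PSI.

# Information Value of a binned variable against a binary target (1 = positive).
iv_sketch <- function(bins, y, eps = 1e-6) {
  tab <- table(bins, factor(y, levels = c(0, 1)))
  pos <- tab[, "1"] / sum(tab[, "1"])  # share of all positives falling in each bin
  neg <- tab[, "0"] / sum(tab[, "0"])  # share of all negatives falling in each bin
  pos <- pmax(pos, eps)                # guard against log(0) for empty cells
  neg <- pmax(neg, eps)
  sum((pos - neg) * log(pos / neg))
}

# Population Stability Index between expected (train) and actual (test) bins.
psi_sketch <- function(bins_expected, bins_actual, eps = 1e-6) {
  lv <- union(unique(bins_expected), unique(bins_actual))
  e <- as.numeric(table(factor(bins_expected, levels = lv))) / length(bins_expected)
  a <- as.numeric(table(factor(bins_actual, levels = lv))) / length(bins_actual)
  e <- pmax(e, eps)
  a <- pmax(a, eps)
  sum((a - e) * log(a / e))
}

# Toy data: two bins, eight observations.
bins <- c("a", "a", "a", "a", "b", "b", "b", "b")
y    <- c(1, 0, 0, 0, 1, 1, 1, 0)

iv_value  <- iv_sketch(bins, y)
psi_same  <- psi_sketch(bins, bins)
psi_shift <- psi_sketch(c("a", "a", "b", "b"), c("a", "b", "b", "b"))
```

In this sketch a higher IV indicates a more predictive variable and a higher PSI indicates a larger shift between the two samples, which matches the roles of iv_i (minimum) and psi_i (maximum) in the arguments.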
Usage
psi_iv_filter(
dat,
dat_test = NULL,
target,
x_list = NULL,
breaks_list = NULL,
pos_flag = NULL,
ex_cols = NULL,
occur_time = NULL,
best = FALSE,
equal_bins = TRUE,
g = 10,
sp_values = NULL,
tree_control = list(p = 0.05, cp = 1e-06, xval = 5, maxdepth = 10),
bins_control = list(bins_num = 10, bins_pct = 0.05, b_chi = 0.05, b_odds = 0.1,
b_psi = 0.05, b_or = 0.15, mono = 0.3, odds_psi = 0.2, kc = 1),
oot_pct = 0.7,
psi_i = 0.1,
iv_i = 0.01,
cos_i = 0.7,
vars_name = FALSE,
note = TRUE,
parallel = FALSE,
save_data = FALSE,
file_name = NULL,
dir_path = tempdir(),
...
)
Arguments
dat |
A data.frame with independent variables and target variable. |
dat_test |
A data.frame of test data. Default is NULL. |
target |
The name of target variable. |
x_list |
Names of independent variables. |
breaks_list |
A table containing a list of splitting points for each independent variable. Default is NULL. |
pos_flag |
The value of positive class of target variable, default: "1". |
ex_cols |
A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL. |
occur_time |
The name of the variable that represents the time at which each observation takes place. |
best |
Logical, if TRUE, merge initial breaks to get optimal breaks for binning. |
equal_bins |
Logical. If TRUE, initial breaks are generated with equal sample sizes. If FALSE, breaks are generated using a decision tree. |
g |
Integer, number of initial bins for equal_bins. |
sp_values |
A list of missing values. |
tree_control |
The list of tree parameters. |
bins_control |
The list of binning parameters. |
oot_pct |
Percentage of observations retained for the out-of-time test (especially to calculate PSI). Default is 0.7. |
psi_i |
The maximum threshold of PSI. 0 <= psi_i <= 1; values from 0.05 to 0.2 usually work. Default: 0.1. |
iv_i |
The minimum threshold of IV. 0 < iv_i; values from 0.01 to 0.1 usually work. Default: 0.01. |
cos_i |
Threshold for the cosine similarity of the positive rates of train and test. Values from 0.7 to 0.9 usually work. Default: 0.7. |
vars_name |
Logical. If TRUE, output a list of filtered variables; if FALSE, output a table with the detailed IV and PSI values of each variable. Default is FALSE. |
note |
Logical, outputs info. Default is TRUE. |
parallel |
Logical, parallel computing. Default is FALSE. |
save_data |
Logical, save results in locally specified folder. Default is FALSE. |
file_name |
The name for periodically saved results files. Default is "Feature_importance_IV_PSI". |
dir_path |
The path for periodically saved results files. Default is tempdir(). |
... |
Other parameters. |
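The cos_i argument refers to the cosine similarity between the positive rates observed in the train and test samples. A minimal sketch of that measure, assuming the rates are compared bin by bin, is shown below. The helper name cos_sim_sketch and the example rates are hypothetical and are not taken from creditmodel.

```r
# Hypothetical helper (not a creditmodel function): cosine similarity
# between two equal-length vectors of per-bin positive rates.
cos_sim_sketch <- function(rate_train, rate_test) {
  stopifnot(length(rate_train) == length(rate_test))
  sum(rate_train * rate_test) /
    (sqrt(sum(rate_train^2)) * sqrt(sum(rate_test^2)))
}

# Made-up positive rates for four bins of one variable.
rate_train <- c(0.02, 0.05, 0.10, 0.20)
rate_test  <- c(0.03, 0.05, 0.09, 0.22)

cos_value <- cos_sim_sketch(rate_train, rate_test)
```

A value close to 1 means the positive rate follows a similar pattern across bins in both samples; orthogonal patterns give 0.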
Value
A list with the following elements:
- Feature: Selected variables.
- IV: IV of variables.
- PSI: PSI of variables.
- COS: Cosine similarity of the positive rates of train and test.
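To make the role of the three thresholds concrete, the sketch below applies them to a small table shaped like the elements listed above. The values are made up, and the comparison operators (strict for IV, inclusive for PSI and COS) are assumptions for illustration; the package may apply the cut-offs differently.

```r
# Made-up values arranged like the documented elements; not real output.
res <- data.frame(
  Feature = c("x1", "x2", "x3", "x4"),
  IV      = c(0.15, 0.005, 0.30, 0.08),
  PSI     = c(0.02, 0.01, 0.25, 0.05),
  COS     = c(0.95, 0.99, 0.90, 0.60),
  stringsAsFactors = FALSE
)

# Defaults taken from the Usage section.
psi_i <- 0.1
iv_i  <- 0.01
cos_i <- 0.7

# Keep variables that are predictive (IV), stable (PSI) and consistent (COS).
kept <- res[res$IV > iv_i & res$PSI <= psi_i & res$COS >= cos_i, ]
kept$Feature
```

Here x2 fails the IV threshold, x3 fails the PSI threshold and x4 fails the COS threshold, so only x1 remains.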
See Also
xgb_filter, gbm_filter, feature_selector
Examples
library(creditmodel)
# Filter a small subset of UCICreditCard by IV and PSI
psi_iv_filter(dat = UCICreditCard[1:1000, c(2, 4, 8:9, 26)],
target = "default.payment.next.month",
occur_time = "apply_date",
parallel = FALSE)