R: high_cor

fast_high_cor_filter {creditmodel}

R Documentation

high_cor_filter

Description

fast_high_cor_filter and high_cor_filter: within each group of highly correlated variables, keep only the variable with the highest IV (information value).

Usage

fast_high_cor_filter(
  dat,
  p = 0.95,
  x_list = NULL,
  com_list = NULL,
  ex_cols = NULL,
  save_data = FALSE,
  cor_class = TRUE,
  vars_name = TRUE,
  parallel = FALSE,
  note = FALSE,
  file_name = NULL,
  dir_path = tempdir(),
  ...
)

high_cor_filter(
  dat,
  com_list = NULL,
  x_list = NULL,
  ex_cols = NULL,
  onehot = TRUE,
  parallel = FALSE,
  p = 0.7,
  file_name = NULL,
  dir_path = tempdir(),
  save_data = FALSE,
  note = FALSE,
  ...
)

Arguments

`dat`	A data.frame with independent variables.
`p`	Threshold of correlation between features. Default is 0.95 for fast_high_cor_filter and 0.7 for high_cor_filter.
`x_list`	Names of independent variables.
`com_list`	A data.frame with an importance value for each variable, e.g. an IV list.
`ex_cols`	A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.
`save_data`	Logical, save results in locally specified folder. Default is FALSE.
`cor_class`	Logical. Calculate the correlation matrix of categorical variables. Default is TRUE.
`vars_name`	Logical. If TRUE, output a list of the filtered variables; if FALSE, output a table with the detailed comparison values of each variable. Default is TRUE.
`parallel`	Logical, parallel computing. Default is FALSE.
`note`	Logical. Outputs info. Default is FALSE.
`file_name`	The name for periodically saved results files. Default is "Feature_selected_COR".
`dir_path`	The path for periodically saved results files. Default is tempdir().
`...`	Additional parameters.
`onehot`	Logical. One-hot encode the independent variables. Default is TRUE.
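
As an illustration of how a regular expression passed to `ex_cols` could select columns to exclude, here is a minimal base R sketch. It assumes the pattern is matched against column names in the way `grepl` does; that is an assumption about the package internals, and the column names below are only a hypothetical subset.

```r
# Hypothetical column names; not read from any real data set.
cols <- c("ID", "apply_date", "LIMIT_BAL", "AGE", "default.payment.next.month")

# Same pattern as in the Examples section. "$" anchors each alternative
# to the end of the name; an unescaped "." matches any single character.
pattern <- "ID$|date$|default.payment.next.month$"

excluded <- grepl(pattern, cols)   # TRUE for names matching the pattern
kept <- cols[!excluded]            # names that would remain candidates
print(kept)
```

Anchoring with `$` keeps the pattern from excluding names that merely contain "ID" or "date" somewhere in the middle.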

Value

A list of selected variables.

Examples

# calculate iv for each variable.
iv_list = feature_selector(dat_train = UCICreditCard[1:1000, ], dat_test = NULL,
                           target = "default.payment.next.month",
                           occur_time = "apply_date",
                           filter = c("IV"), cv_folds = 1, iv_cp = 0.01,
                           ex_cols = "ID$|date$|default.payment.next.month$",
                           save_data = FALSE, vars_name = FALSE)
# among highly correlated variables (|cor| above p), keep the one with the highest IV.
fast_high_cor_filter(dat = UCICreditCard[1:1000, ],
                     com_list = iv_list, save_data = FALSE,
                     ex_cols = "ID$|date$|default.payment.next.month$",
                     p = 0.9, cor_class = FALSE, vars_name = FALSE)
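
To make the selection rule in the Description concrete, the following base R sketch implements a simple greedy variant of it. This is not the creditmodel implementation: `greedy_cor_filter`, the toy data, and the IV values are all hypothetical and made up for illustration, and the package may group variables differently.

```r
# Hypothetical helper, not part of creditmodel: greedy correlation filter.
# dat: data.frame of numeric variables
# iv:  named numeric vector of importance values (names = columns of dat)
# p:   absolute-correlation threshold
greedy_cor_filter <- function(dat, iv, p = 0.95) {
  # Visit variables from highest to lowest IV.
  vars <- names(sort(iv, decreasing = TRUE))
  cor_mat <- abs(cor(dat[, vars, drop = FALSE], use = "pairwise.complete.obs"))
  keep <- character(0)
  for (v in vars) {
    # Keep v only if it is not highly correlated with a variable already kept.
    if (all(cor_mat[v, keep] <= p)) {
      keep <- c(keep, v)
    }
  }
  keep
}

set.seed(1)
x1 <- rnorm(200)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(200, sd = 0.05),  # nearly a copy of x1
  x3 = rnorm(200)                   # generated independently of x1
)
toy_iv <- c(x1 = 0.10, x2 = 0.30, x3 = 0.05)  # made-up IV values

selected <- greedy_cor_filter(toy, toy_iv, p = 0.9)
print(selected)
```

In this toy data x1 and x2 are near-duplicates, so only the one with the higher IV survives, while x3 is kept because it is not strongly correlated with either.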