fast_high_cor_filter {creditmodel}    R Documentation

high_cor_filter

Description

fast_high_cor_filter and high_cor_filter filter out highly correlated variables: within each group of variables whose correlation exceeds the threshold p, only the variable with the highest IV (or other importance value supplied in com_list) is selected.

Usage

fast_high_cor_filter(
  dat,
  p = 0.95,
  x_list = NULL,
  com_list = NULL,
  ex_cols = NULL,
  save_data = FALSE,
  cor_class = TRUE,
  vars_name = TRUE,
  parallel = FALSE,
  note = FALSE,
  file_name = NULL,
  dir_path = tempdir(),
  ...
)

high_cor_filter(
  dat,
  com_list = NULL,
  x_list = NULL,
  ex_cols = NULL,
  onehot = TRUE,
  parallel = FALSE,
  p = 0.7,
  file_name = NULL,
  dir_path = tempdir(),
  save_data = FALSE,
  note = FALSE,
  ...
)

Arguments

dat

A data.frame with independent variables.

p

Threshold of correlation between features. Default is 0.95 for fast_high_cor_filter and 0.7 for high_cor_filter.

x_list

Names of independent variables.

com_list

A data.frame with the importance value of each variable, e.g. an IV list.

ex_cols

A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.

save_data

Logical, whether to save the results to the folder given by dir_path. Default is FALSE.

cor_class

Logical, whether to calculate the correlation matrix of categorical variables. Default is TRUE.

vars_name

Logical, output a list of filtered variable names (TRUE) or a table with the detailed comparison values of each variable (FALSE). Default is TRUE.

parallel

Logical, parallel computing. Default is FALSE.

note

Logical, whether to output progress information. Default is FALSE.

file_name

The name for the saved results files. Default is NULL in the usage above, in which case the name "Feature_selected_COR" is presumably used.

dir_path

The path for the saved results files. Default is tempdir().

...

Additional parameters.

onehot

Logical, whether to one-hot encode the independent variables. Default is TRUE.
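
As an illustration of how a regular expression passed to ex_cols can match variable names, the following base R sketch applies the pattern used in the Examples section to a few column names. The column names and the use of grepl here are assumptions made for illustration; the package may match names differently internally.

```r
# Hypothetical column names, loosely modelled on the example data set.
col_names <- c("ID", "apply_date", "LIMIT_BAL", "AGE",
               "default.payment.next.month")

# Pattern from the Examples section: names ending in "ID", "date",
# or "default.payment.next.month" are excluded.
ex_cols <- "ID$|date$|default.payment.next.month$"

# Assumption: exclusion behaves like a grepl() match on the names.
excluded <- col_names[grepl(ex_cols, col_names)]
candidates <- setdiff(col_names, excluded)

print(excluded)
print(candidates)
```

Because the pattern is anchored only at the end of the name, any other column ending in "ID" or "date" would also be excluded; a stricter pattern such as "^ID$" avoids that.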

Value

A list of selected variables.

See Also

get_correlation_group, high_cor_selector, char_cor_vars

Examples

library(creditmodel)

# Calculate the IV of each variable; vars_name = FALSE returns the detailed table.
iv_list = feature_selector(dat_train = UCICreditCard[1:1000,], dat_test = NULL,
                           target = "default.payment.next.month",
                           occur_time = "apply_date",
                           filter = c("IV"), cv_folds = 1, iv_cp = 0.01,
                           ex_cols = "ID$|date$|default.payment.next.month$",
                           save_data = FALSE, vars_name = FALSE)

# Within each group of variables correlated above p, keep the one with the highest IV.
fast_high_cor_filter(dat = UCICreditCard[1:1000,],
                     com_list = iv_list, save_data = FALSE,
                     ex_cols = "ID$|date$|default.payment.next.month$",
                     p = 0.9, cor_class = FALSE, vars_name = FALSE)
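
The selection idea described above (keep the highest-IV variable among highly correlated ones) can be sketched in base R without the package. This is a minimal greedy variant written for illustration only, not the creditmodel implementation: the helper select_by_cor, the simulated data, the IV values, and the use of absolute Pearson correlation are all assumptions.

```r
# Illustrative sketch only; not the creditmodel implementation.
set.seed(1)
n <- 200
x1 <- rnorm(n)
dat <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.05),   # nearly a copy of x1
  x3 = rnorm(n),                   # independent of the others
  x4 = -x1 + rnorm(n, sd = 0.05)   # strongly negatively correlated with x1
)

# Hypothetical importance values (e.g. IV) for each variable.
iv <- c(x1 = 0.10, x2 = 0.30, x3 = 0.05, x4 = 0.20)

# Hypothetical helper: greedy filter on absolute correlation.
select_by_cor <- function(dat, iv, p = 0.95) {
  cor_mat <- abs(cor(dat))
  # Visit variables from highest to lowest importance.
  ordered <- names(sort(iv[colnames(dat)], decreasing = TRUE))
  kept <- character(0)
  for (v in ordered) {
    # Keep v only if it is not correlated above p with a variable already kept.
    if (length(kept) == 0 || all(cor_mat[v, kept] <= p)) {
      kept <- c(kept, v)
    }
  }
  kept
}

selected <- select_by_cor(dat, iv, p = 0.9)
print(selected)
```

Here x1, x2 and x4 form one highly correlated group, so only x2 (the highest hypothetical IV) survives, together with the uncorrelated x3.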

[Package creditmodel version 1.3.1 Index]