fast_high_cor_filter {creditmodel} | R Documentation |
high_cor_filter
Description
fast_high_cor_filter
In a highly correlated variable group, select the variable with the highest IV.
high_cor_filter
In a highly correlated variable group, select the variable with the highest IV.
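The selection rule described above can be sketched in base R. This is an illustration only, not the creditmodel implementation: the helper `toy_cor_filter`, the toy data, and the IV values are hypothetical, and the use of absolute Pearson correlation with a greedy pass in descending IV order is an assumption.

```r
# Illustrative sketch only; NOT the creditmodel implementation.
# `toy_cor_filter`, the data, and the IV values are hypothetical.
toy_cor_filter <- function(dat, iv, p = 0.95) {
  # Absolute pairwise correlation between candidate variables (assumption).
  cor_mat <- abs(cor(dat[, names(iv), drop = FALSE]))
  # Visit variables from highest to lowest IV.
  ranked <- names(sort(iv, decreasing = TRUE))
  kept <- character(0)
  for (v in ranked) {
    # Keep v only if it is not highly correlated with a variable already kept.
    if (length(kept) == 0 || all(cor_mat[v, kept] < p)) {
      kept <- c(kept, v)
    }
  }
  kept
}

# Deterministic toy data: x2 is almost identical to x1, x3 is uncorrelated with both.
x1 <- 1:20
x2 <- x1 + rep(c(0.1, -0.1), 10)
x3 <- rep(c(1, -1, -1, 1), 5)
dat <- data.frame(x1 = x1, x2 = x2, x3 = x3)
iv <- c(x1 = 0.10, x2 = 0.25, x3 = 0.05)

selected <- toy_cor_filter(dat, iv, p = 0.95)
print(selected)  # x2 (higher IV than x1) and x3 are kept
```

In this sketch x1 is dropped because it is correlated with x2 above the threshold and has the lower IV.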
Usage
fast_high_cor_filter(
dat,
p = 0.95,
x_list = NULL,
com_list = NULL,
ex_cols = NULL,
save_data = FALSE,
cor_class = TRUE,
vars_name = TRUE,
parallel = FALSE,
note = FALSE,
file_name = NULL,
dir_path = tempdir(),
...
)
high_cor_filter(
dat,
com_list = NULL,
x_list = NULL,
ex_cols = NULL,
onehot = TRUE,
parallel = FALSE,
p = 0.7,
file_name = NULL,
dir_path = tempdir(),
save_data = FALSE,
note = FALSE,
...
)
Arguments
dat |
A data.frame with independent variables. |
p |
Threshold of correlation between features. Default is 0.95 for fast_high_cor_filter and 0.7 for high_cor_filter. |
x_list |
Names of independent variables. |
com_list |
A data.frame with the importance value of each variable, e.g. an IV list. |
ex_cols |
A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL. |
save_data |
Logical, save results in locally specified folder. Default is FALSE. |
cor_class |
Logical, calculate the correlation matrix of categorical variables. Default is TRUE. |
vars_name |
Logical, output either a list of the filtered variables or a table with the detailed comparison values of each variable. Default is TRUE. |
parallel |
Logical, parallel computing. Default is FALSE. |
note |
Logical, output progress information. Default is FALSE. |
file_name |
The name for periodically saved results files. Default is "Feature_selected_COR". |
dir_path |
The path for periodically saved results files. Default is tempdir(). |
... |
Additional parameters. |
onehot |
Logical, one-hot encode the independent variables. Default is TRUE. |
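As a rough illustration of how a regular expression passed to ex_cols can select the columns to exclude, the base R sketch below applies the pattern used in the Examples to a few column names. It assumes the matching behaves like `grepl()`; the package's internal matching may differ, and the column names listed here are a hypothetical subset.

```r
# Illustrative sketch; assumes ex_cols is matched in the style of grepl().
# The column names are a hypothetical subset chosen for the example.
col_names <- c("ID", "apply_date", "LIMIT_BAL", "AGE",
               "default.payment.next.month")
pattern <- "ID$|date$|default.payment.next.month$"

# Columns whose names match the pattern would be excluded.
excluded <- col_names[grepl(pattern, col_names)]
remaining <- setdiff(col_names, excluded)

print(excluded)
print(remaining)
```

Here `excluded` holds "ID", "apply_date" and "default.payment.next.month", while "LIMIT_BAL" and "AGE" remain as candidate variables.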
Value
A list of selected variables.
See Also
get_correlation_group, high_cor_selector, char_cor_vars
Examples
# calculate iv for each variable.
iv_list = feature_selector(dat_train = UCICreditCard[1:1000,], dat_test = NULL,
target = "default.payment.next.month",
occur_time = "apply_date",
filter = c("IV"), cv_folds = 1, iv_cp = 0.01,
ex_cols = "ID$|date$|default.payment.next.month$",
save_data = FALSE, vars_name = FALSE)
fast_high_cor_filter(dat = UCICreditCard[1:1000,],
com_list = iv_list, save_data = FALSE,
ex_cols = "ID$|date$|default.payment.next.month$",
p = 0.9, cor_class = FALSE, vars_name = FALSE)