lasso_filter {creditmodel}    R Documentation

Variable selection by LASSO

Description

lasso_filter filters variables using LASSO.

Usage

lasso_filter(
  dat_train,
  dat_test = NULL,
  target = NULL,
  x_list = NULL,
  pos_flag = NULL,
  ex_cols = NULL,
  sim_sign = "negtive",
  best_lambda = "lambda.auc",
  save_data = FALSE,
  plot.it = TRUE,
  seed = 46,
  file_name = NULL,
  dir_path = tempdir(),
  note = FALSE
)

Arguments

dat_train

A data.frame with independent variables and target variable.

dat_test

A data.frame of test data. Default is NULL.

target

The name of the target variable.

x_list

Names of independent variables.

pos_flag

The value of the positive class of the target variable. Default is "1".

ex_cols

A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.

sim_sign

Required sign of the coefficients: after WOE transformation, the coefficients of all variables should be either all negative or all positive. Default is "negtive" (spelled as in Usage), for the case where pos_flag is "1".

best_lambda

Criterion used to choose the best lambda when filtering variables by LASSO. There are three options: "lambda.auc", "lambda.ks", and "lambda.sim_sign". Default is "lambda.auc".

save_data

Logical, whether to save the results in the locally specified folder. Default is FALSE.

plot.it

Logical, whether to draw the shrinkage plot. Default is TRUE.

seed

Random number seed. Default is 46.

file_name

The file name used for the saved results. Default is "Feature_selected_LASSO".

dir_path

The directory in which the saved results files are written. Default is tempdir().

note

Logical, whether to print progress information. Default is FALSE.
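
The three best_lambda options are named after AUC, the KS statistic, and coefficient sign consistency. As a rough illustration of what those three quantities measure, the base-R sketch below computes each on toy data. The helper names (auc_score, ks_score, same_sign) and the toy values are hypothetical; this is not the package's internal implementation, only one common way to compute these quantities.

```r
# Illustrative helpers only: these names are hypothetical and are not
# part of creditmodel.

# AUC via the rank-sum (Mann-Whitney) identity; y is 0/1, score is numeric.
auc_score <- function(y, score) {
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  r <- rank(score)  # average ranks are used for ties
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# KS statistic: largest gap between the empirical score distributions
# of the two classes, evaluated at the observed scores.
ks_score <- function(y, score) {
  cdf_pos <- ecdf(score[y == 1])
  cdf_neg <- ecdf(score[y == 0])
  max(abs(cdf_pos(score) - cdf_neg(score)))
}

# Sign consistency: every non-zero coefficient has the requested sign.
# Coefficients shrunk to exactly zero are ignored.
same_sign <- function(coefs, sign = c("negative", "positive")) {
  sign <- match.arg(sign)
  nz <- coefs[coefs != 0]
  if (sign == "negative") all(nz < 0) else all(nz > 0)
}

# Toy data (made up for illustration).
y <- c(0, 0, 1, 1)
score <- c(0.1, 0.4, 0.35, 0.8)

auc_val <- auc_score(y, score)
ks_val <- ks_score(y, score)
sign_ok <- same_sign(c(PAY_0 = -0.8, LIMIT_BAL = -0.2, EDUCATION = 0))
```

A larger AUC or KS indicates better separation of the two classes, so a lambda chosen by either criterion would presumably be the one that maximizes it on held-out data; the sign check instead rejects any lambda at which a retained variable has a coefficient of the wrong sign.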

Value

A list of the x variables retained after filtering by LASSO.
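
A typical next step is to keep only the retained variables for model fitting. The sketch below shows this with base R on a toy data.frame. The object selected is a hypothetical stand-in for the return value of lasso_filter, and the column values are made up; unlist() is used so the code works whether the result is a list or a character vector.

```r
# Toy WOE-like data; values are made up for illustration.
train_woe <- data.frame(
  target    = c(0, 1, 0, 1),
  PAY_0     = c(-0.5, 1.2, -0.5, 0.3),
  LIMIT_BAL = c(0.2, -0.1, 0.4, -0.3),
  EDUCATION = c(0.0, 0.1, -0.1, 0.0)
)

# Hypothetical stand-in for the value returned by lasso_filter().
selected <- list("PAY_0", "LIMIT_BAL")

# Keep the target plus the retained variables.
keep <- c("target", unlist(selected))
train_sel <- train_woe[, keep, drop = FALSE]
```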

Examples

 sub = cv_split(UCICreditCard, k = 40)[[1]]
 dat = UCICreditCard[sub, ]
 dat = re_name(dat, "default.payment.next.month", "target")
 dat_train = data_cleansing(dat, target = "target", obs_id = "ID",
                            occur_time = "apply_date",
                            miss_values = list("", -1))
 dat_train = process_nas(dat_train)
 # get breaks of all predictive variables
 x_list = c("PAY_0", "LIMIT_BAL", "PAY_AMT5", "EDUCATION", "PAY_3", "PAY_2")
 breaks_list = get_breaks_all(dat = dat_train, target = "target",
                              x_list = x_list, occur_time = "apply_date",
                              ex_cols = "ID", save_data = FALSE, note = FALSE)
 # WOE transform
 train_woe = woe_trans_all(dat = dat_train, x_list = x_list,
                           target = "target",
                           breaks_list = breaks_list,
                           woe_name = FALSE)
 # filter the WOE-transformed variables by LASSO
 lasso_filter(dat_train = train_woe,
              target = "target", x_list = x_list,
              save_data = FALSE, plot.it = FALSE)

[Package creditmodel version 1.3.1 Index]