xgb_filter {creditmodel} | R Documentation |
Select Features using XGB
Description
xgb_filter selects important features using xgboost.
Usage
xgb_filter(
  dat_train,
  dat_test = NULL,
  target = NULL,
  pos_flag = NULL,
  x_list = NULL,
  occur_time = NULL,
  ex_cols = NULL,
  xgb_params = list(nrounds = 100, max_depth = 6, eta = 0.1, min_child_weight = 1,
    subsample = 1, colsample_bytree = 1, gamma = 0, scale_pos_weight = 1,
    early_stopping_rounds = 10, objective = "binary:logistic"),
  f_eval = "auc",
  cv_folds = 1,
  cp = NULL,
  seed = 46,
  vars_name = TRUE,
  note = TRUE,
  save_data = FALSE,
  file_name = NULL,
  dir_path = tempdir(),
  ...
)
Arguments
dat_train |
A data.frame with independent variables and target variable. |
dat_test |
A data.frame of test data. Default is NULL. |
target |
The name of target variable. |
pos_flag |
The value of the positive class of the target variable. Default: "1". |
x_list |
Names of independent variables. |
occur_time |
The name of the variable that represents the time at which each observation takes place. |
ex_cols |
A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL. |
xgb_params |
Parameters of xgboost. The complete list of parameters is available at: http://xgboost.readthedocs.io/en/latest/parameter.html. |
f_eval |
Customized evaluation function; "ks" and "auc" are available. |
cv_folds |
Number of cross-validation folds. Default is 1. |
cp |
Threshold of XGB feature's Gain. Default is 1/number of independent variables. |
seed |
Random number seed. Default is 46. |
vars_name |
Logical, output a list of filtered variables or a table with detailed IV and PSI values of each variable. Default is TRUE. |
note |
Logical, outputs info. Default is TRUE. |
save_data |
Logical, save results in a locally specified folder. Default is FALSE. |
file_name |
The name for periodically saved results files. Default is "Feature_importance_XGB". |
dir_path |
The path for periodically saved results files. Default is tempdir(). |
... |
Other parameters to pass to xgb_params. |
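The default for cp described above (1/number of independent variables) can be illustrated with a minimal sketch in base R. The feature names and Gain values below are made up, and the ">=" comparison is an assumption about how the threshold is applied. This is not the package's internal code.

```r
# Hypothetical candidate variables and their xgboost Gain (values are made up).
x_list <- c("PAY_0", "LIMIT_BAL", "AGE", "SEX")
importance <- data.frame(
  Feature = x_list,
  Gain    = c(0.55, 0.30, 0.10, 0.05),
  stringsAsFactors = FALSE
)

# Default threshold as documented: 1 / number of independent variables.
cp <- 1 / length(x_list)

# Assumption: a feature is kept when its Gain is at least cp.
selected <- importance$Feature[importance$Gain >= cp]
print(selected)
```

With four candidate variables the threshold is 0.25, so only features whose Gain is above the share an "average" feature would carry are retained.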
Value
Selected variables.
See Also
psi_iv_filter, gbm_filter, feature_selector
Examples
dat = UCICreditCard[1:1000, c(2, 4, 8:9, 26)]
xgb_params = list(nrounds = 100, max_depth = 6, eta = 0.1,
                  min_child_weight = 1, subsample = 1,
                  colsample_bytree = 1, gamma = 0, scale_pos_weight = 1,
                  early_stopping_rounds = 10,
                  objective = "binary:logistic")
## Not run: 
xgb_features = xgb_filter(dat_train = dat, dat_test = NULL,
                          target = "default.payment.next.month",
                          occur_time = "apply_date", f_eval = "ks",
                          xgb_params = xgb_params,
                          cv_folds = 1,
                          ex_cols = "ID$|date$|default.payment.next.month$",
                          vars_name = FALSE)
## End(Not run)
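The example above evaluates with f_eval = "ks". For readers unfamiliar with the two metrics named in the f_eval argument, the following base R sketch shows one common way to compute them. The helper names ks_stat and auc_stat and the toy data are hypothetical, and this is not necessarily how xgb_filter computes them internally.

```r
# Hypothetical labels (1 = positive class) and predicted probabilities.
y    <- c(0, 0, 1, 0, 1, 1, 0, 1)
prob <- c(0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.30, 0.90)

# KS: largest gap between the cumulative capture rates of positives and
# negatives when observations are sorted by descending score.
# (Sketch only: tied scores are not grouped here.)
ks_stat <- function(y, prob) {
  y_sorted <- y[order(prob, decreasing = TRUE)]
  tpr <- cumsum(y_sorted == 1) / sum(y_sorted == 1)
  fpr <- cumsum(y_sorted == 0) / sum(y_sorted == 0)
  max(abs(tpr - fpr))
}

# AUC via the rank-sum (Mann-Whitney) identity; rank() averages ties.
auc_stat <- function(y, prob) {
  r     <- rank(prob)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

print(ks_stat(y, prob))
print(auc_stat(y, prob))
```

KS reports the single best separation point between the two classes, while AUC summarizes ranking quality over all thresholds, so the two can favor different feature sets.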