training_model {creditmodel} | R Documentation |
Training model
Description
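training_model is a model builder. It covers data splitting, preprocessing, feature filtering and model training with the algorithms selected in algorithm.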
Usage
training_model(
model_name = "mymodel",
dat,
dat_test = NULL,
target = NULL,
occur_time = NULL,
obs_id = NULL,
x_list = NULL,
ex_cols = NULL,
pos_flag = NULL,
prop = 0.7,
split_type = if (!is.null(occur_time)) "OOT" else "Random",
preproc = TRUE,
low_var = 0.99,
missing_rate = 0.98,
merge_cat = 30,
remove_dup = TRUE,
outlier_proc = TRUE,
missing_proc = "median",
default_miss = list(-1, "missing"),
miss_values = NULL,
one_hot = FALSE,
trans_log = FALSE,
feature_filter = list(filter = c("IV", "PSI", "COR", "XGB"), iv_cp = 0.02, psi_cp =
0.1, xgb_cp = 0, cv_folds = 1, hopper = FALSE),
algorithm = list("LR", "XGB", "GBM", "RF"),
LR.params = lr_params(),
XGB.params = xgb_params(),
GBM.params = gbm_params(),
RF.params = rf_params(),
breaks_list = NULL,
parallel = FALSE,
cores_num = NULL,
save_pmml = FALSE,
plot_show = FALSE,
vars_plot = TRUE,
model_path = tempdir(),
seed = 46,
...
)
Arguments
model_name |
A string, name of the project. Default is "mymodel" |
dat |
A data.frame with independent variables and target variable. |
dat_test |
A data.frame of test data. Default is NULL. |
target |
The name of target variable. |
occur_time |
The name of the variable that records the time at which each observation takes place. Default is NULL. |
obs_id |
The name of the observation ID or key variable of the data. Default is NULL. |
x_list |
Names of independent variables. Default is NULL. |
ex_cols |
Names of excluded variables. Regular expressions can also be used to match variable names. Default is NULL. |
pos_flag |
The value of the positive class of the target variable. Default is "1". |
prop |
Proportion of observations assigned to the training set after partitioning. Default is 0.7. |
split_type |
Methods for partition. See details at : |
preproc |
Logical. Preprocess data. Default is TRUE. |
low_var |
Controls the deletion of low-variance variables. The default in Usage is 0.99. |
missing_rate |
The maximum percent of missing values for recoding values to missing and non_missing. |
merge_cat |
Merge the categories of character variables that have more than merge_cat categories. Default is 30. |
remove_dup |
Logical, if TRUE, remove the duplicated observations. |
outlier_proc |
Logical, process outliers or not. Default is TRUE. |
missing_proc |
If logical, whether to process missing values or not. If "median", NAs are imputed with the median of the k nearest neighbors. If "avg_dist", NAs are imputed with the distance-weighted average of the k nearest neighbors. If "default", missing values are set to -1 or "missing". Otherwise, missing values are processed according to the results of the missing analysis. |
default_miss |
Default values used for missing data imputation. Default is list(-1, "missing"). |
miss_values |
Other extreme values that may be used to represent missing values, e.g. -9999, -9998. These miss_values will be encoded to -1 or "missing". |
one_hot |
Logical. If TRUE, one-hot encode the category variables. Default is FALSE. |
trans_log |
Logical, apply a logarithmic transformation. Default is FALSE. |
feature_filter |
Parameters for selecting important and stable features. See details at: |
algorithm |
Algorithms for training a model. list("LR", "XGB", "GBM", "RF") are available. |
LR.params |
Parameters of logistic regression & scorecard. See details at : |
XGB.params |
Parameters of xgboost. See details at : |
GBM.params |
Parameters of GBM. See details at : |
RF.params |
Parameters of Random Forest. See details at : |
breaks_list |
A table containing a list of splitting points for each independent variable. Default is NULL. |
parallel |
Logical, use parallel computing. Default is FALSE. |
cores_num |
The number of CPU cores to use. |
save_pmml |
Logical, save the model in PMML format. Default is FALSE. |
plot_show |
Logical, show model performance in current graphic device. Default is FALSE. |
vars_plot |
Logical, if TRUE, plot the distribution, correlation or partial dependence of model input variables. Default is TRUE. |
model_path |
The path for periodically saved data files. Default is tempdir(). |
seed |
Random number seed. Default is 46. |
... |
Other parameters. |
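The interplay of prop, occur_time and split_type can be pictured with a small base R sketch. It is an illustration only, not the creditmodel implementation: the toy data frame, its column names, and the assumption that an "OOT" (out-of-time) split trains on the earliest observations and tests on the latest are all hypothetical.

```r
# Illustrative base R sketch; NOT the creditmodel implementation.
set.seed(46)                       # same value as the documented default seed
n <- 100
toy <- data.frame(                 # hypothetical toy data
  id = seq_len(n),
  apply_date = as.Date("2020-01-01") + sample(0:364, n, replace = TRUE),
  y = rbinom(n, 1, 0.2)
)

prop <- 0.7
occur_time <- "apply_date"         # set to NULL to fall back to "Random"

# Same default expression as shown in Usage.
split_type <- if (!is.null(occur_time)) "OOT" else "Random"

n_train <- round(prop * n)
if (split_type == "OOT") {
  # Assumed meaning of out-of-time: earliest rows train, latest rows test.
  ord <- order(toy[[occur_time]])
  train_idx <- ord[seq_len(n_train)]
} else {
  # Random split: sample rows without regard to time.
  train_idx <- sample(seq_len(n), n_train)
}

train <- toy[train_idx, ]
test <- toy[-train_idx, ]
```

With occur_time supplied, every training row is dated no later than any test row; with occur_time = NULL the same proportion is drawn at random.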
Value
A list containing Model Objects.
See Also
train_test_split, data_cleansing, feature_selector, lr_params, xgb_params, gbm_params, rf_params, fast_high_cor_filter, get_breaks_all, lasso_filter, woe_trans_all, get_logistic_coef, score_transfer, get_score_card, model_key_index, ks_psi_plot, ks_table_plot
Examples
library(creditmodel)

# Use a small subsample of UCICreditCard to keep the example fast.
sub <- cv_split(UCICreditCard, k = 30)[[1]]
dat <- UCICreditCard[sub, ]
x_list <- c("LIMIT_BAL")

# Train a logistic regression only, with preprocessing, feature
# filtering, scorecard output and plotting switched off.
B_model <- training_model(
  dat = dat,
  model_name = "UCICreditCard",
  target = "default.payment.next.month",
  x_list = x_list,
  occur_time = NULL,
  obs_id = NULL,
  dat_test = NULL,
  preproc = FALSE,
  outlier_proc = FALSE,
  missing_proc = FALSE,
  feature_filter = NULL,
  algorithm = list("LR"),
  LR.params = lr_params(
    lasso = FALSE,
    step_wise = FALSE,
    score_card = FALSE
  ),
  breaks_list = NULL,
  parallel = FALSE,
  cores_num = NULL,
  save_pmml = FALSE,
  plot_show = FALSE,
  vars_plot = FALSE,
  model_path = tempdir(),
  seed = 46
)
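The effect of default_miss and miss_values described under Arguments can be sketched in base R. This is a rough illustration under stated assumptions, not the package's own code: recode_missing is a hypothetical helper, the toy data is made up, and the sketch assumes that NAs and the listed sentinel values are both replaced by -1 in numeric columns and "missing" in character columns.

```r
# Illustrative base R sketch; NOT the creditmodel implementation.
default_miss <- list(-1, "missing")    # documented default
miss_values <- c(-9999, -9998)         # sentinel examples from the argument text

toy <- data.frame(                     # hypothetical toy data
  income = c(5000, -9999, NA, 7200, -9998),
  city = c("A", NA, "B", "-9999", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode NAs and sentinel values in one column.
recode_missing <- function(x, miss_values, default_miss) {
  is_miss <- is.na(x) | x %in% miss_values
  if (is.numeric(x)) {
    x[is_miss] <- default_miss[[1]]    # numeric columns get -1
  } else {
    x[is_miss] <- default_miss[[2]]    # character columns get "missing"
  }
  x
}

# Apply column by column while keeping the data.frame structure.
toy[] <- lapply(toy, recode_missing,
                miss_values = miss_values,
                default_miss = default_miss)
```

Keeping the two replacement values in one list mirrors the shape of the default_miss argument, so a different pair such as list(-999, "unknown") could be swapped in without touching the helper.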