training_model {creditmodel} | R Documentation |

`training_model`

Model builder

```
training_model(
  model_name = "mymodel",
  dat,
  dat_test = NULL,
  target = NULL,
  occur_time = NULL,
  obs_id = NULL,
  x_list = NULL,
  ex_cols = NULL,
  pos_flag = NULL,
  prop = 0.7,
  split_type = if (!is.null(occur_time)) "OOT" else "Random",
  preproc = TRUE,
  low_var = 0.99,
  missing_rate = 0.98,
  merge_cat = 30,
  remove_dup = TRUE,
  outlier_proc = TRUE,
  missing_proc = "median",
  default_miss = list(-1, "missing"),
  miss_values = NULL,
  one_hot = FALSE,
  trans_log = FALSE,
  feature_filter = list(filter = c("IV", "PSI", "COR", "XGB"), iv_cp = 0.02,
    psi_cp = 0.1, xgb_cp = 0, cv_folds = 1, hopper = FALSE),
  algorithm = list("LR", "XGB", "GBM", "RF"),
  LR.params = lr_params(),
  XGB.params = xgb_params(),
  GBM.params = gbm_params(),
  RF.params = rf_params(),
  breaks_list = NULL,
  parallel = FALSE,
  cores_num = NULL,
  save_pmml = FALSE,
  plot_show = FALSE,
  vars_plot = TRUE,
  model_path = tempdir(),
  seed = 46,
  ...
)
```

`model_name` |
A string, name of the project. Default is "mymodel" |

`dat` |
A data.frame with independent variables and target variable. |

`dat_test` |
A data.frame of test data. Default is NULL. |

`target` |
The name of target variable. |

`occur_time` |
The name of the variable that represents the time at which each observation takes place. Default is NULL. |

`obs_id` |
The name of ID of observations or key variable of data. Default is NULL. |

`x_list` |
Names of independent variables. Default is NULL. |

`ex_cols` |
Names of excluded variables. Regular expressions can also be used to match variable names. Default is NULL. |

`pos_flag` |
The value of positive class of target variable, default: "1". |

`prop` |
Proportion of the data assigned to the training set after partitioning. Default: 0.7. |

`split_type` |
Methods for partition. See details at : |

`preproc` |
Logical. Preprocess data. Default is TRUE. |

`low_var` |
Numeric threshold used to delete low-variance variables. Default is 0.99. |

`missing_rate` |
The maximum percent of missing values for recoding values to missing and non_missing. |

`merge_cat` |
Merge categories of character variables when the number of categories exceeds m. Default is 30. |

`remove_dup` |
Logical, if TRUE, remove the duplicated observations. |

`outlier_proc` |
Logical, process outliers or not. Default is TRUE. |

`missing_proc` |
If logical, whether to process missing values. If "median", NAs are imputed with the median of the k nearest neighbors. If "avg_dist", NAs are imputed with the distance-weighted average of the k nearest neighbors. If "default", missing values are set to -1 or "missing". Otherwise, missing values are processed according to the results of the missing-value analysis. |

`default_miss` |
Default values used for missing data imputation. Default is list(-1, "missing"). |

`miss_values` |
Other extreme values that might be used to represent missing values, e.g. -9999, -9998. These miss_values will be encoded as -1 or "missing". |

`one_hot` |
Logical. If TRUE, one-hot encode categorical variables. Default is FALSE. |

`trans_log` |
Logical, Logarithmic transformation. Default is FALSE. |

`feature_filter` |
Parameters for selecting important and stable features. See details at: |

`algorithm` |
Algorithms for training a model. list("LR", "XGB", "GBM", "RF") are available. |

`LR.params` |
Parameters of logistic regression & scorecard. See details at : |

`XGB.params` |
Parameters of xgboost. See details at : |

`GBM.params` |
Parameters of GBM. See details at : |

`RF.params` |
Parameters of Random Forest. See details at : |

`breaks_list` |
A table containing a list of splitting points for each independent variable. Default is NULL. |

`parallel` |
Logical, whether to use parallel computing. Default is FALSE. |

`cores_num` |
The number of CPU cores to use. |

`save_pmml` |
Logical, save model in PMML format. Default is FALSE. |

`plot_show` |
Logical, show model performance in current graphic device. Default is FALSE. |

`vars_plot` |
Logical, if TRUE, plot the distribution, correlation, or partial dependence of model input variables. Default is TRUE. |

`model_path` |
The path for periodically saved data files. Default is tempdir(). |

`seed` |
Random number seed. Default is 46. |

`...` |
Other parameters. |
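
The interaction between `prop`, `occur_time`, `split_type` and `seed` may be easier to picture with a small base-R sketch. This is only an illustration of the two partition styles under simple assumptions (an out-of-time split keeps the earliest rows for training, a random split samples rows reproducibly); it is not the package's implementation, and `split_sketch` and `toy_split` are hypothetical names introduced here.

```r
# Illustrative sketch only, not the creditmodel implementation.
# `split_sketch` and `toy_split` are hypothetical names.
split_sketch <- function(dat, prop = 0.7, occur_time = NULL,
                         split_type = if (!is.null(occur_time)) "OOT" else "Random",
                         seed = 46) {
  n <- nrow(dat)
  n_train <- round(n * prop)
  if (split_type == "OOT") {
    # Out-of-time: the earliest observations train, the latest ones test.
    ord <- order(dat[[occur_time]])
    train_idx <- ord[seq_len(n_train)]
  } else {
    # Random: sample row indices reproducibly from the seed.
    set.seed(seed)
    train_idx <- sample.int(n, n_train)
  }
  list(train = dat[train_idx, , drop = FALSE],
       test = dat[-train_idx, , drop = FALSE])
}

toy_split <- data.frame(id = 1:10,
                        apply_date = as.Date("2024-01-01") + 9:0)

# Supplying occur_time switches the default split_type to "OOT".
oot <- split_sketch(toy_split, occur_time = "apply_date")
# Without occur_time the default is a random split.
rnd <- split_sketch(toy_split)
```

As in the usage above, the default for `split_type` is derived from whether `occur_time` is supplied, so passing a time variable is enough to get an out-of-time partition in this sketch.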

A list containing Model Objects.

`train_test_split`, `data_cleansing`, `feature_selector`, `lr_params`, `xgb_params`, `gbm_params`, `rf_params`, `fast_high_cor_filter`, `get_breaks_all`, `lasso_filter`, `woe_trans_all`, `get_logistic_coef`, `score_transfer`, `get_score_card`, `model_key_index`, `ks_psi_plot`, `ks_table_plot`

```
library(creditmodel)

# Use the first of 30 cv_split() partitions so the example runs quickly.
sub = cv_split(UCICreditCard, k = 30)[[1]]
dat = UCICreditCard[sub, ]
x_list = c("LIMIT_BAL")

# Fit a plain logistic regression on one predictor, with preprocessing,
# feature filtering, plotting and PMML export switched off.
B_model = training_model(
  dat = dat,
  model_name = "UCICreditCard",
  target = "default.payment.next.month",
  x_list = x_list,
  occur_time = NULL,
  obs_id = NULL,
  dat_test = NULL,
  preproc = FALSE,
  outlier_proc = FALSE,
  missing_proc = FALSE,
  feature_filter = NULL,
  algorithm = list("LR"),
  LR.params = lr_params(lasso = FALSE,
                        step_wise = FALSE,
                        score_card = FALSE),
  breaks_list = NULL,
  parallel = FALSE,
  cores_num = NULL,
  save_pmml = FALSE,
  plot_show = FALSE,
  vars_plot = FALSE,
  model_path = tempdir(),
  seed = 46
)
```
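
The `miss_values` and `default_miss` arguments can also be pictured with a short base-R sketch. It assumes the simplest reading of the argument descriptions: NAs and any listed sentinel values are replaced by the first element of `default_miss` in numeric columns and by the second element in character columns. It is not the package's code, and `recode_missing_sketch` and `toy_miss` are hypothetical names.

```r
# Illustrative sketch only, not the creditmodel implementation.
# `recode_missing_sketch` and `toy_miss` are hypothetical names.
recode_missing_sketch <- function(dat, miss_values = NULL,
                                  default_miss = list(-1, "missing")) {
  for (col in names(dat)) {
    x <- dat[[col]]
    if (is.factor(x)) x <- as.character(x)
    # Treat NA and any user-listed sentinel value as missing.
    is_miss <- is.na(x) | x %in% unlist(miss_values)
    if (is.numeric(x)) {
      x[is_miss] <- default_miss[[1]]
    } else {
      x[is_miss] <- default_miss[[2]]
    }
    dat[[col]] <- x
  }
  dat
}

toy_miss <- data.frame(
  income = c(5000, -9999, NA, 7200),
  city = c("A", NA, "B", "-9998"),
  stringsAsFactors = FALSE
)

clean <- recode_missing_sketch(toy_miss, miss_values = list(-9999, -9998))
```

In this sketch the numeric column ends up with -1 in place of both the sentinel and the NA, and the character column gets "missing" in the same positions.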

[Package *creditmodel* version 1.3.1 Index]