training_model {creditmodel} | R Documentation |

`training_model`

Model builder

training_model( model_name = "mymodel", dat, dat_test = NULL, target = NULL, occur_time = NULL, obs_id = NULL, x_list = NULL, ex_cols = NULL, pos_flag = NULL, prop = 0.7, split_type = if (!is.null(occur_time)) "OOT" else "Random", preproc = TRUE, low_var = 0.99, missing_rate = 0.98, merge_cat = 30, remove_dup = TRUE, outlier_proc = TRUE, missing_proc = "median", default_miss = list(-1, "missing"), miss_values = NULL, one_hot = FALSE, trans_log = FALSE, feature_filter = list(filter = c("IV", "PSI", "COR", "XGB"), iv_cp = 0.02, psi_cp = 0.1, xgb_cp = 0, cv_folds = 1, hopper = FALSE), algorithm = list("LR", "XGB", "GBM", "RF"), LR.params = lr_params(), XGB.params = xgb_params(), GBM.params = gbm_params(), RF.params = rf_params(), breaks_list = NULL, parallel = FALSE, cores_num = NULL, save_pmml = FALSE, plot_show = FALSE, vars_plot = TRUE, model_path = tempdir(), seed = 46, ... )

`model_name` |
A string, the name of the project. Default is "mymodel". |

`dat` |
A data.frame with independent variables and target variable. |

`dat_test` |
A data.frame of test data. Default is NULL. |

`target` |
The name of the target variable. |

`occur_time` |
The name of the variable that represents the time at which each observation takes place. Default is NULL. |

`obs_id` |
The name of the observation ID (key) variable of the data. Default is NULL. |

`x_list` |
Names of independent variables. Default is NULL. |

`ex_cols` |
Names of excluded variables. Regular expressions can also be used to match variable names. Default is NULL. |

`pos_flag` |
The value of the positive class of the target variable. Default: "1". |

`prop` |
Proportion of the data assigned to the training set after partitioning. Default is 0.7. |

`split_type` |
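Method of partition, e.g. "OOT" (out-of-time) or "Random". Default is "OOT" if `occur_time` is specified, otherwise "Random". See details at : |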

`preproc` |
Logical. Preprocess data. Default is TRUE. |

`low_var` |
Numeric threshold for deleting low-variance variables. Default is 0.99. |

`missing_rate` |
The maximum percentage of missing values for recoding values to missing and non_missing. Default is 0.98. |

`merge_cat` |
Merge the categories of character variables that have more than `merge_cat` categories. Default is 30. |

`remove_dup` |
Logical, if TRUE, remove the duplicated observations. |

`outlier_proc` |
Logical, process outliers or not. Default is TRUE. |

`missing_proc` |
If logical, process missing values or not. If "median", NAs are imputed with the median of the k nearest neighbors. If "avg_dist", NAs are imputed with the distance-weighted average of the k nearest neighbors. If "default", missing values are assigned -1 or "missing". Otherwise, missing values are processed according to the results of missing-value analysis. Default is "median". |

`default_miss` |
Default value for missing-data imputation. Default is list(-1, 'missing'). |

`miss_values` |
Other extreme values that may be used to represent missing values, e.g. -9999 or -9998. These values are encoded to -1 or "missing". Default is NULL. |

`one_hot` |
Logical. If TRUE, apply one-hot encoding to categorical variables. Default is FALSE. |

`trans_log` |
Logical, apply a logarithmic transformation or not. Default is FALSE. |

`feature_filter` |
Parameters for selecting important and stable features. See details at: |

`algorithm` |
Algorithms for training a model. list("LR", "XGB", "GBM", "RF") are available. |

`LR.params` |
Parameters of logistic regression & scorecard. See details at : |

`XGB.params` |
Parameters of xgboost. See details at : |

`GBM.params` |
Parameters of GBM. See details at : |

`RF.params` |
Parameters of Random Forest. See details at : |

`breaks_list` |
A table containing a list of splitting points for each independent variable. Default is NULL. |

`parallel` |
Logical, use parallel computing or not. Default is FALSE. |

`cores_num` |
The number of CPU cores to use. |

`save_pmml` |
Logical, save the model in PMML format. Default is FALSE. |

`plot_show` |
Logical, show model performance in current graphic device. Default is FALSE. |

`vars_plot` |
Logical. If TRUE, plot the distribution, correlation or partial dependence of the model input variables. Default is TRUE. |

`model_path` |
The path for periodically saved data files. Default is `tempdir()`. |

`seed` |
Random number seed. Default is 46. |

`...` |
Other parameters. |

A list containing the model objects.

`train_test_split`, `data_cleansing`, `feature_selector`, `lr_params`, `xgb_params`, `gbm_params`, `rf_params`, `fast_high_cor_filter`, `get_breaks_all`, `lasso_filter`, `woe_trans_all`, `get_logistic_coef`, `score_transfer`, `get_score_card`, `model_key_index`, `ks_psi_plot`, `get_plots`, `ks_table_plot`

library(creditmodel)
# Keep one of 30 cross-validation folds (roughly 1/30 of the rows) so the example runs quickly
sub = cv_split(UCICreditCard, k = 30)[[1]]
dat = UCICreditCard[sub, ]
x_list = c("LIMIT_BAL")
# Fit a plain logistic regression on a single variable, with preprocessing,
# feature filtering, lasso, stepwise selection and scorecard output turned off
B_model = training_model(dat = dat, model_name = "UCICreditCard",
  target = "default.payment.next.month", x_list = x_list,
  occur_time = NULL, obs_id = NULL, dat_test = NULL,
  preproc = FALSE, outlier_proc = FALSE, missing_proc = FALSE,
  feature_filter = NULL, algorithm = list("LR"),
  LR.params = lr_params(lasso = FALSE, step_wise = FALSE, score_card = FALSE),
  breaks_list = NULL, parallel = FALSE, cores_num = NULL,
  save_pmml = FALSE, plot_show = FALSE, vars_plot = FALSE,
  model_path = tempdir(), seed = 46)
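The `split_type` and `prop` arguments control how `dat` is partitioned into training and test sets. The base-R sketch below illustrates the two strategies named in the usage line, under our own assumptions: an out-of-time ("OOT") split that keeps the earliest `prop` share of rows (ordered by `occur_time`) for training, and a random split that samples `prop` of the rows. `split_sketch()` is a hypothetical helper written for illustration only; it is not creditmodel's partitioning code and may differ from it in details such as tie handling.

```r
# Hypothetical helper (not part of creditmodel) illustrating OOT vs. random splits.
# Assumption: "OOT" trains on the earliest `prop` share of rows by occur_time.
split_sketch <- function(dat, prop = 0.7, occur_time = NULL, seed = 46) {
  n_train <- round(nrow(dat) * prop)
  if (!is.null(occur_time)) {
    # Out-of-time: order rows chronologically and train on the oldest ones
    ord <- order(dat[[occur_time]])
    train_idx <- ord[seq_len(n_train)]
  } else {
    # Random: sample row indices reproducibly (note: resets the global RNG seed)
    set.seed(seed)
    train_idx <- sample(seq_len(nrow(dat)), n_train)
  }
  test_idx <- setdiff(seq_len(nrow(dat)), train_idx)
  list(train = dat[train_idx, , drop = FALSE],
       test  = dat[test_idx, , drop = FALSE])
}

# Toy data with one observation per month, stored newest first
toy <- data.frame(
  id   = 1:10,
  date = as.Date("2020-01-01") + 30 * (10:1),
  y    = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0)
)
oot <- split_sketch(toy, prop = 0.7, occur_time = "date")
rnd <- split_sketch(toy, prop = 0.7)
```

Cutting after sorting by time means the OOT test set holds only the most recent observations (here ids 1 to 3), which mimics scoring applicants who arrive after the model was built; the random split mixes periods in both sets.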
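The default `feature_filter` screens variables by information value (IV) and population stability index (PSI), with cut-offs `iv_cp = 0.02` and `psi_cp = 0.1`. The base-R sketch below computes both statistics for an already-binned variable using the common definitions: IV is the sum over bins of (p_i - n_i) * log(p_i / n_i), where p_i and n_i are the shares of positives and negatives falling in bin i, and PSI is the sum over bins of (a_i - e_i) * log(a_i / e_i), where a_i and e_i are the actual and expected shares. `iv_sketch()` and `psi_sketch()` are hypothetical helpers, not the package's own functions; they assume no empty bins, and the COR and XGB filters are not covered.

```r
# Hypothetical helper: information value of a binned variable vs. a 0/1 target.
# Assumes every bin contains at least one positive and one negative.
iv_sketch <- function(bin, y) {
  tab <- table(bin, y)
  pos <- tab[, "1"] / sum(tab[, "1"])  # share of positives in each bin
  neg <- tab[, "0"] / sum(tab[, "0"])  # share of negatives in each bin
  sum((pos - neg) * log(pos / neg))
}

# Hypothetical helper: PSI between an expected (e.g. train) and an actual
# (e.g. test) sample of the same binned variable. Assumes no empty bins.
psi_sketch <- function(bin_expected, bin_actual) {
  lv <- union(unique(bin_expected), unique(bin_actual))
  e  <- table(factor(bin_expected, levels = lv)) / length(bin_expected)
  a  <- table(factor(bin_actual, levels = lv)) / length(bin_actual)
  sum((a - e) * log(a / e))
}

# Toy variable: "high" holds 6 of 8 positives, "low" holds 8 of 12 negatives
bin <- rep(c("low", "high"), each = 10)
y   <- c(rep(1, 2), rep(0, 8), rep(1, 6), rep(0, 4))
iv  <- iv_sketch(bin, y)

# Bin shares drift from 50/50 in the expected sample to 60/40 in the actual one
psi <- psi_sketch(rep(c("A", "B"), times = c(50, 50)),
                  rep(c("A", "B"), times = c(60, 40)))
```

Assuming a variable is kept when its IV exceeds `iv_cp` and its PSI stays below `psi_cp`, this toy variable (IV about 0.75, PSI about 0.04) would pass both checks.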
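The `miss_values` and `default_miss` descriptions, together with `missing_proc = "default"`, say that sentinel values such as -9999 and genuine NAs end up as -1 for numeric variables or "missing" for character variables. The base-R sketch below shows that recoding. `recode_missing_sketch()` is a hypothetical helper, and splitting `default_miss` into a numeric value and a character value is our reading of the argument descriptions, not the package's code.

```r
# Hypothetical helper (not part of creditmodel): recode sentinel values and NAs.
# Assumption: numeric columns receive default_miss[[1]], character columns
# receive default_miss[[2]]; factor and other columns are left untouched.
recode_missing_sketch <- function(dat, miss_values = c(-9999, -9998),
                                  default_miss = list(-1, "missing")) {
  for (col in names(dat)) {
    x <- dat[[col]]
    if (is.numeric(x)) {
      x[is.na(x) | x %in% miss_values] <- default_miss[[1]]
    } else if (is.character(x)) {
      # Sentinels may also appear as strings, e.g. "-9998"
      x[is.na(x) | x %in% as.character(miss_values)] <- default_miss[[2]]
    }
    dat[[col]] <- x
  }
  dat
}

df <- data.frame(age  = c(25, -9999, NA, 40),
                 city = c("A", NA, "B", "-9998"),
                 stringsAsFactors = FALSE)
out <- recode_missing_sketch(df)
```

Encoding missing values to a single out-of-range code keeps them as their own group during binning instead of silently dropping those rows.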

[Package *creditmodel* version 1.3.0 Index]