process_nas {creditmodel} | R Documentation |
Missing Treatment
Description
process_nas_var performs missing value analysis and treatment using kNN imputation, central imputation, and random imputation.
process_nas is a simpler wrapper for process_nas_var.
Usage
process_nas(
dat,
x_list = NULL,
class_var = FALSE,
miss_values = list(-1, "missing"),
default_miss = list(-1, "missing"),
parallel = FALSE,
ex_cols = NULL,
method = "median",
note = FALSE,
save_data = FALSE,
file_name = NULL,
dir_path = tempdir(),
...
)
process_nas_var(
dat = dat,
x,
missing_type = NULL,
method = "median",
nas_rate = NULL,
default_miss = list("missing", -1),
mat_nas_shadow = NULL,
dt_nas_random = NULL,
note = FALSE,
save_data = FALSE,
file_name = NULL,
dir_path = tempdir(),
...
)
Arguments
dat |
A data.frame with independent variables. |
x_list |
Names of independent variables. |
class_var |
Logical. Whether to run the NA analysis on nominal variables. Default is FALSE. |
miss_values |
Other extreme values that may be used to represent missing values, e.g. -1, -9999, -9998. These miss_values are recoded to NA. |
default_miss |
Default values used to impute missing data. Default is list(-1, 'missing'). |
parallel |
Logical, parallel computing. Default is FALSE. |
ex_cols |
A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL. |
method |
The kNN imputation method. If "median", NAs are imputed with the median of the k nearest neighbors. If "avg_dist", NAs are imputed with the distance-weighted average of the k nearest neighbors. If "default", missing values are set to -1 or "missing"; otherwise, missing values are processed according to the results of the missing value analysis. |
note |
Logical. Whether to output information. Default is FALSE. |
save_data |
Logical. If TRUE, save missing analysis to |
file_name |
The name of the periodically saved missing analysis file. Default is NULL. |
dir_path |
The path for the periodically saved missing analysis file. Default is tempdir(). |
... |
Other parameters. |
x |
The name of the variable to process. |
missing_type |
Type of missing data, generated by analysis_nas. |
nas_rate |
A list containing the NA rate of each variable. |
mat_nas_shadow |
A shadow matrix of the variables that contain NAs. |
dt_nas_random |
A data.frame with randomly imputed NAs. |
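To make the miss_values, nas_rate and mat_nas_shadow arguments more concrete, the following base R sketch recodes sentinel values to NA and then derives a per-variable NA rate and a 0/1 shadow matrix. It is only an illustration under assumptions: the toy data, the helper recode_to_na, and the exact layout of the rate list and shadow matrix are hypothetical and are not taken from the package's source.

```r
# Toy data; -1 and "missing" are sentinel codes standing in for missing values.
dat <- data.frame(
  age    = c(25, -1, 40, 31, NA),
  income = c(3000, 5200, -1, 4100, 3900),
  city   = c("A", "missing", "B", NA, "A"),
  stringsAsFactors = FALSE
)
miss_values <- list(-1, "missing")

# Hypothetical helper: recode every sentinel value to NA, column by column.
recode_to_na <- function(df, miss_values) {
  df[] <- lapply(df, function(col) {
    for (v in miss_values) {
      col[!is.na(col) & col == v] <- NA
    }
    col
  })
  df
}

dat_na <- recode_to_na(dat, miss_values)

# Share of NAs per variable, kept as a list (assumed shape of nas_rate).
nas_rate <- lapply(dat_na, function(col) mean(is.na(col)))

# Shadow matrix: 1 where a value is missing, 0 where it is observed.
mat_nas_shadow <- sapply(dat_na, function(col) as.integer(is.na(col)))

print(nas_rate)
print(mat_nas_shadow)
```

The shadow matrix has one row per observation and one column per variable, so column sums give NA counts and correlations between columns hint at whether variables tend to be missing together.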
Value
A data frame with no NAs.
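The "no NAs" property can be checked on any result by counting the remaining missing values per column. The sketch below uses a deliberately simple stand-in imputation (median for numeric columns, "missing" for character columns) so that it runs in base R; the helper impute_simple is hypothetical and is not the algorithm used by process_nas.

```r
dat <- data.frame(
  score = c(10, NA, 30, 50, NA),
  grade = c("a", NA, "b", "a", "a"),
  stringsAsFactors = FALSE
)

# Stand-in imputation (an assumption, not the package's algorithm):
# numeric columns get the column median, other columns get a default label.
impute_simple <- function(df, default_chr = "missing") {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) {
      col[is.na(col)] <- median(col, na.rm = TRUE)
    } else {
      col[is.na(col)] <- default_chr
    }
    col
  })
  df
}

dat_filled <- impute_simple(dat)

# Number of NAs left in each column; all entries should be zero.
na_counts <- colSums(is.na(dat_filled))
stopifnot(all(na_counts == 0))
```

The same colSums(is.na(...)) check can be applied to the data frame returned by process_nas.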
Examples
# Impute NAs in the first 1000 rows, excluding columns whose names end in "ID"
dat_na = process_nas(dat = UCICreditCard[1:1000, ],
                     parallel = FALSE, ex_cols = "ID$", method = "median")
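The difference between method = "median" and method = "avg_dist" can be illustrated with a small base R sketch of k-nearest-neighbor imputation. This is a minimal version under assumptions: the helper knn_impute, its k argument, the use of Euclidean distance, and the inverse-distance weights are all hypothetical choices made for illustration and may differ from what the package does internally.

```r
# Impute NAs in `target` from the k nearest rows that have an observed target,
# using Euclidean distance on complete numeric `features`.
knn_impute <- function(target, features, k = 3,
                       method = c("median", "avg_dist")) {
  method <- match.arg(method)
  features <- as.matrix(features)
  known <- which(!is.na(target))
  for (i in which(is.na(target))) {
    # Distance from row i to every row with an observed target value.
    diffs <- sweep(features[known, , drop = FALSE], 2, features[i, ])
    d <- sqrt(rowSums(diffs^2))
    nn <- order(d)[seq_len(min(k, length(known)))]
    vals <- target[known[nn]]
    target[i] <- if (method == "median") {
      median(vals)
    } else {
      w <- 1 / (d[nn] + 1e-8)  # closer neighbors receive larger weights
      sum(w * vals) / sum(w)
    }
  }
  target
}

x <- c(1, 2, 3, 10)
y <- c(10, 20, NA, 100)

y_median   <- knn_impute(y, data.frame(x = x), k = 2, method = "median")
y_avg_dist <- knn_impute(y, data.frame(x = x), k = 2, method = "avg_dist")
print(y_median)
print(y_avg_dist)
```

With the median, each of the k neighbors counts equally, which is robust to an outlying neighbor; with the distance-weighted average, the closest neighbor pulls the imputed value toward its own.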