process_nas {creditmodel}    R Documentation

Missing Value Treatment

Description

process_nas_var analyzes and treats missing values using k-nearest-neighbor (KNN) imputation, central imputation, or random imputation. process_nas is a simpler wrapper for process_nas_var.
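Central imputation replaces each NA with a measure of central tendency for its column. The sketch below shows the common form of this technique, using the median for numeric columns and the most frequent value for character columns. The helper central_impute is hypothetical and is not creditmodel's implementation; the package's exact rule may differ.

```r
# Hypothetical helper (not part of creditmodel): central imputation.
# Numeric columns: NAs become the column median.
# Character columns: NAs become the most frequent observed value.
central_impute <- function(df) {
  for (col in names(df)) {
    v <- df[[col]]
    miss <- is.na(v)
    if (!any(miss) || all(miss)) next   # nothing to impute, or no donors
    if (is.numeric(v)) {
      v[miss] <- stats::median(v, na.rm = TRUE)
    } else {
      tab <- table(v[!miss])            # frequency of each observed value
      v[miss] <- names(tab)[which.max(tab)]
    }
    df[[col]] <- v
  }
  df
}

df <- data.frame(
  age  = c(25, NA, 40, 31),
  city = c("A", "B", NA, "B"),
  stringsAsFactors = FALSE
)
central_impute(df)
```

Here the missing age becomes 31 (the median of 25, 40 and 31) and the missing city becomes "B", its most frequent value.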

Usage

process_nas(
  dat,
  x_list = NULL,
  class_var = FALSE,
  miss_values = list(-1, "missing"),
  default_miss = list(-1, "missing"),
  parallel = FALSE,
  ex_cols = NULL,
  method = "median",
  note = FALSE,
  save_data = FALSE,
  file_name = NULL,
  dir_path = tempdir(),
  ...
)

process_nas_var(
  dat = dat,
  x,
  missing_type = NULL,
  method = "median",
  nas_rate = NULL,
  default_miss = list("missing", -1),
  mat_nas_shadow = NULL,
  dt_nas_random = NULL,
  note = FALSE,
  save_data = FALSE,
  file_name = NULL,
  dir_path = tempdir(),
  ...
)

Arguments

dat

A data.frame with independent variables.

x_list

Names of independent variables.

class_var

Logical. Whether to perform missing-value analysis on nominal variables. Default is FALSE.

miss_values

Other extreme values that may be used to represent missing values, e.g. -1, -9999, -9998. These miss_values are encoded as NA.
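The recoding described for miss_values can be sketched as follows. The helper recode_sentinels is hypothetical, not creditmodel's code, and it assumes that numeric sentinels apply only to numeric columns and character sentinels only to character columns.

```r
# Hypothetical helper (not part of creditmodel): turn sentinel values
# such as -1, -9999 or "missing" into NA before imputation.
recode_sentinels <- function(df, miss_values = list(-1, "missing")) {
  for (col in names(df)) {
    v <- df[[col]]
    for (mv in miss_values) {
      # Assumption: only compare a sentinel with a column of the same type.
      same_type <- (is.numeric(v) && is.numeric(mv)) ||
                   (is.character(v) && is.character(mv))
      if (same_type) v[which(v == mv)] <- NA   # which() skips existing NAs
    }
    df[[col]] <- v
  }
  df
}

df <- data.frame(
  income = c(3000, -9999, 4500, -1),
  status = c("ok", "missing", "ok", "late"),
  stringsAsFactors = FALSE
)
recode_sentinels(df, miss_values = list(-1, -9999, "missing"))
```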

default_miss

Default values for missing-data imputation. Default is list(-1, 'missing').
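A minimal sketch of filling NAs with fixed defaults. It assumes that the numeric entry of default_miss is used for numeric columns and the character entry for character columns; this mapping is an assumption, not documented creditmodel behaviour. The helper fill_defaults is hypothetical.

```r
# Hypothetical helper (not part of creditmodel): fill NAs with fixed defaults.
# Assumption: numeric columns get the numeric default and character columns
# get the character default, whatever order default_miss lists them in.
fill_defaults <- function(df, default_miss = list(-1, "missing")) {
  num_default <- Filter(is.numeric, default_miss)[[1]]
  chr_default <- Filter(is.character, default_miss)[[1]]
  for (col in names(df)) {
    v <- df[[col]]
    if (is.numeric(v)) {
      v[is.na(v)] <- num_default
    } else if (is.character(v)) {
      v[is.na(v)] <- chr_default
    }
    df[[col]] <- v
  }
  df
}

df <- data.frame(x = c(1, NA, 3), y = c(NA, "b", "c"),
                 stringsAsFactors = FALSE)
fill_defaults(df)
```

Selecting the defaults by type rather than by position means the sketch behaves the same for list(-1, "missing") and list("missing", -1), the two orders that appear in the Usage section.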

parallel

Logical, parallel computing. Default is FALSE.

ex_cols

A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.
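Pattern-based exclusion of the kind ex_cols describes can be sketched with grepl. The helper exclude_cols is hypothetical. Because every entry is treated as a regular expression, an unanchored name such as "ID" would also match "PAID", which is why anchors like "ID$" are useful.

```r
# Hypothetical helper (not part of creditmodel): drop column names that
# match any pattern in ex_cols. Each entry is treated as a regular expression.
exclude_cols <- function(col_names, ex_cols = NULL) {
  if (is.null(ex_cols)) return(col_names)
  # One logical vector per pattern, combined with OR
  hit <- Reduce(`|`, lapply(ex_cols, grepl, x = col_names))
  col_names[!hit]
}

exclude_cols(c("ID", "LIMIT_BAL", "PAY_0", "AGE"),
             ex_cols = c("ID$", "^PAY_"))
```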

method

The imputation method. If "median", each NA is imputed with the median of its k nearest neighbors. If "avg_dist", each NA is imputed with the distance-weighted average of its k nearest neighbors. If "default", missing values are assigned -1 or "missing"; otherwise, missing values are processed according to the results of the missing-value analysis.
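The two KNN options can be illustrated on a single numeric column. This is a sketch under stated assumptions, not creditmodel's implementation: the helper knn_impute_col is hypothetical, it uses Euclidean distance on numeric predictor columns that contain no NAs, and it reads "distance weighted average" as an inverse-distance-weighted mean.

```r
# Hypothetical sketch (not creditmodel's implementation): impute the NAs
# of one numeric column from its k nearest neighbours, using Euclidean
# distance on numeric predictor columns that are assumed to have no NAs.
knn_impute_col <- function(df, target, predictors, k = 3,
                           method = c("median", "avg_dist")) {
  method <- match.arg(method)
  y <- df[[target]]
  X <- as.matrix(df[, predictors, drop = FALSE])
  donors <- which(!is.na(y))          # rows with an observed target value
  for (i in which(is.na(y))) {
    # Euclidean distance from row i to every donor row
    diffs <- sweep(X[donors, , drop = FALSE], 2, X[i, ])
    d <- sqrt(rowSums(diffs^2))
    nn <- order(d)[seq_len(min(k, length(donors)))]
    vals <- y[donors[nn]]
    if (method == "median") {
      # "median": median of the k neighbours' values
      y[i] <- stats::median(vals)
    } else {
      # "avg_dist": inverse-distance-weighted mean of the neighbours' values;
      # the small offset avoids division by zero for exact matches
      w <- 1 / (d[nn] + 1e-8)
      y[i] <- sum(w * vals) / sum(w)
    }
  }
  df[[target]] <- y
  df
}

df <- data.frame(
  x1 = c(1, 2, 3, 10, 11),
  x2 = c(1, 2, 3, 10, 11),
  y  = c(5, 6, 7, 50, NA)
)
knn_impute_col(df, "y", c("x1", "x2"), k = 2, method = "median")$y[5]
knn_impute_col(df, "y", c("x1", "x2"), k = 2, method = "avg_dist")$y[5]
```

With k = 2 the neighbours of row 5 are rows 4 and 3 (y = 50 and 7). "median" gives 28.5. "avg_dist" weights row 4 eight times as heavily, because its distance is one eighth of row 3's, which gives 407/9 (about 45.22). In practice, predictors on different scales are usually standardized before distances are computed.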

note

Logical. Whether to output information. Default is FALSE.

save_data

Logical. If TRUE, save the missing-value analysis to dir_path. Default is FALSE.

file_name

The file name for the saved missing-value analysis. Default is NULL.

dir_path

The path for the saved missing-value analysis file. Default is tempdir().

...

Other parameters.

x

The name of the variable to process.

missing_type

Type of missingness, generated by analysis_nas.

nas_rate

A list containing the NA rate of each variable.

mat_nas_shadow

A shadow matrix of the variables that contain NAs.
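As an illustration of the two inputs above, per-variable NA rates and a shadow matrix can be computed in base R. In a shadow matrix, 1 marks a missing cell and 0 an observed one. The object names mirror the argument names, but this is a hypothetical sketch, not how creditmodel builds them internally.

```r
# Hypothetical illustration (not creditmodel's internal code).
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)
# NA rate of each variable, as a named list: a = 0.5, b = 0.25, c = 0
nas_rate <- as.list(colMeans(is.na(df)))
# Keep only the variables that contain at least one NA
has_na <- names(df)[vapply(df, anyNA, logical(1))]
# Shadow matrix: 1L where a value is missing, 0L where it is observed
mat_nas_shadow <- 1L * is.na(df[, has_na, drop = FALSE])
mat_nas_shadow
```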

dt_nas_random

A data.frame containing random imputations of the NAs.
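Random imputation is commonly done by replacing each NA with a value drawn at random from the observed values of the same variable. The sketch below shows that simple form. The helper random_impute is hypothetical, and creditmodel's exact procedure for building dt_nas_random may differ.

```r
# Hypothetical sketch (not creditmodel's implementation): simple random
# imputation, drawing replacements from the column's observed values.
random_impute <- function(v, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # optional, for reproducible draws
  miss <- is.na(v)
  observed <- v[!miss]
  if (any(miss) && length(observed) > 0) {
    # Sample indices, not values, so a single observed value is not
    # misread by sample() as the range 1:n
    idx <- sample.int(length(observed), sum(miss), replace = TRUE)
    v[miss] <- observed[idx]
  }
  v
}

x <- c(2, NA, 5, NA, 9)
random_impute(x, seed = 42)
```

Sampling with replacement from observed values roughly preserves the variable's marginal distribution, unlike central imputation, which piles every imputed value onto one point.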

Value

A data frame with no NAs.

Examples

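library(creditmodel)
# Impute NAs in the first 1000 rows, excluding columns whose names end in "ID"
dat_na = process_nas(dat = UCICreditCard[1:1000, ],
                     parallel = FALSE, ex_cols = "ID$", method = "median")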


[Package creditmodel version 1.3.0 Index]