R: Unsupervised imputation

impute_unsupervised {imputeGeneric}

R Documentation

Unsupervised imputation

Description

Impute a data set with an unsupervised inner method. This function is one main function which can be used inside of impute_iterative(). If you need pre-imputation or iterations, directly use impute_iterative().

Usage

impute_unsupervised(
  ds,
  model_fun,
  predict_fun,
  rows_used_for_imputation = "only_complete",
  rows_order = seq_len(nrow(ds)),
  update_model = "every_iteration",
  update_ds_model = "every_iteration",
  model_arg = NULL,
  M = is.na(ds),
  ...
)

Arguments

`ds`	The data set to be imputed. Must be a data frame with column names.
`model_fun`	An unsupervised model function which take as arguments `ds_used` (the data set used to build the model, specified via `rows_used_for_imputation`), `M` and `i` (the index of the row currently under imputation).
`predict_fun`	A predict function which uses the via `model_fun` generated model (`model_imp`) to predict the missing values of a row. It should take the arguments `model_imp`, `ds_used`, `M` and `i`.
`rows_used_for_imputation`	Which rows should be used to impute other rows? Possible choices: "only_complete", "already_imputed", "all_except_i", "all"
`rows_order`	Ordering of the rows for imputation. This can be a vector with indices or an `order_option` from `order_rows()`.
`update_model`	How often should the model for imputation be updated? Possible choices are: "everytime" (after every imputed value) and "every_iteration" (only one model is created and used for all missing values).
`update_ds_model`	How often should the data set for the inner model be updated? Possible choices are: "everytime" (after every imputed value), and "every_iteration".
`model_arg`	Further arguments for `model_fun`. This can be a list, if it is more than one argument.
`M`	Missing data indicator matrix
`...`	Further arguments given to `predict_fun`.

Details

This function imputes the rows of the data set ds row by row. The imputation order of the rows can be specified by rows_order. Furthermore, rows_used_for_imputation controls which rows are used for the imputation. If ds is pre-imputed, the missing data indicator matrix can be supplied via M.

The inner method used to impute the data set can be defined with model_fun. This model_fun must take a data set, the missing data indicator matrix M, the index i of the row which should be imputed right now (which is NULL, if the model is updated only once per iteration or only uses complete rows) and model_arg in this order. It must return a model model_imp which is given to predict_fun to generate imputation values for the missing values in a row i. The model_fun and predict_fun can be self-written or a predefined one (see below) can be used.

If update_model = "every_iteration" only one model is fitted and the argument update_ds_model is ignored. This option can be considerably faster than update_model = "everytime", especially, for data sets with many rows with missing values. However, some methods (like nearest neighbors) need update_model = "everytime".

Value

The imputed data set.

Examples

ds_mis <- missMethods::delete_MCAR(
  data.frame(X = rnorm(20), Y = rnorm(20)), 0.2, 1
)
impute_unsupervised(ds_mis, model_donor, predict_donor)
# knn imputation with k = 2
impute_unsupervised(ds_mis, model_donor, predict_donor,
  update_model = "everytime", model_arg = list(k = 2)
)

[Package imputeGeneric version 0.1.0 Index]