impute_supervised {imputeGeneric}    R Documentation

Supervised imputation

Description

Impute a data set with a supervised inner method. This is one of the main functions that can be used inside impute_iterative(). If you need pre-imputation or iterations, use impute_iterative() directly.

Usage

impute_supervised(
  ds,
  model_spec_parsnip = linear_reg(),
  cols_used_for_imputation = "only_complete",
  cols_order = seq_len(ncol(ds)),
  rows_used_for_imputation = "only_complete",
  rows_order = seq_len(nrow(ds)),
  update_model = "each_column",
  update_ds_model = "each_column",
  M = is.na(ds),
  warn_incomplete_imputation = TRUE,
  ...
)

Arguments

ds

The data set to be imputed. Must be a data frame with column names.

model_spec_parsnip

The model type used for imputation, given as a parsnip model specification such as linear_reg().

cols_used_for_imputation

Which columns should be used to impute other columns? Possible choices: "only_complete", "already_imputed", "all"

cols_order

Ordering of the columns for imputation. This can be a vector of column indices or an order_option from order_cols().

rows_used_for_imputation

Which rows should be used to impute other rows? Possible choices: "only_complete", "partly_complete", "complete_in_k", "already_imputed", "all_except_i", "all"

rows_order

Ordering of the rows for imputation. This can be a vector of row indices or an order_option from order_rows().

update_model

How often should the model for imputation be updated? Possible choices are: "everytime" (after every imputed value), "each_column" (only one update per column) and "every_iteration" (an alias for "each_column").

update_ds_model

How often should the data set for the inner model be updated? Possible choices are: "everytime" (after every imputed value), "each_column" (only one update per column) and "every_iteration".

M

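Missing data indicator matrix: TRUE marks a value that is missing in the original data. Defaults to is.na(ds); supply it explicitly if ds has been pre-imputed.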

warn_incomplete_imputation

Should a warning be issued if the returned data set still contains NA values?

...

Arguments passed on to stats::predict().
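To make the "only_complete" options more concrete, the following base-R snippet shows how completely observed columns and rows can be read off the missing data indicator matrix M. It assumes that "only_complete" means "no missing value according to M"; the small data set is made up for illustration and the snippet is not taken from the package.

```r
# Illustration only: which columns and rows count as completely observed,
# assuming "only_complete" means "no TRUE entry in M".
ds <- data.frame(X = c(NA, 2, 3), Y = c(1, 2, 3), Z = c(1, NA, 3))
M <- is.na(ds)                 # TRUE marks a missing value

names(ds)[colSums(M) == 0]     # "Y": the only complete column
which(rowSums(M) == 0)         # row 3: the only complete row
```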

Details

This function imputes the data set ds column by column. The imputation order of the columns can be specified by cols_order, and cols_used_for_imputation controls which columns are used as predictors. The same options are available for the rows of ds via rows_order and rows_used_for_imputation. If ds is pre-imputed, the missing data indicator matrix can be supplied via M.
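The column-by-column scheme can be illustrated with a simplified base-R sketch. It assumes cols_used_for_imputation = "only_complete" (predictors are the columns without missing values in M), uses stats::lm() in place of a parsnip model, fits one model per column, and walks through the columns in cols_order. The function name sketch_impute_columns() is hypothetical; this is not the package's implementation.

```r
# Hypothetical, simplified sketch (not the package's implementation):
# impute ds column by column with lm() as the inner model, using only the
# completely observed columns (according to M) as predictors.
sketch_impute_columns <- function(ds, M = is.na(ds),
                                  cols_order = seq_len(ncol(ds))) {
  # Columns without any missing value ("only_complete" predictors)
  complete_cols <- names(ds)[colSums(M) == 0]
  for (k in cols_order) {
    miss_k <- M[, k]
    if (!any(miss_k)) next  # column k has nothing to impute
    target <- names(ds)[k]
    predictors <- setdiff(complete_cols, target)
    # Fall back to an intercept-only model if no complete predictor exists
    rhs <- if (length(predictors) > 0) {
      paste0("`", predictors, "`", collapse = " + ")
    } else {
      "1"
    }
    form <- stats::as.formula(paste0("`", target, "` ~ ", rhs))
    # One model fit per column, trained on rows where the target is observed
    fit <- stats::lm(form, data = ds[!miss_k, , drop = FALSE])
    ds[miss_k, k] <- stats::predict(fit, newdata = ds[miss_k, , drop = FALSE])
  }
  ds
}

# Usage: X has four missing values, Y is complete and serves as predictor
set.seed(1)
ds <- data.frame(X = rnorm(20), Y = rnorm(20))
ds$X[1:4] <- NA
imp <- sketch_impute_columns(ds)
anyNA(imp)  # FALSE
```

Passing a separate M lets the sketch, like impute_supervised(), tell originally missing values apart from values that were filled in by a pre-imputation.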

The inner method is specified via model_spec_parsnip, which should be a parsnip model type such as parsnip::linear_reg() or parsnip::rand_forest(). For a complete list, see https://www.tidymodels.org/find/parsnip. You can also build a new parsnip model and use it inside impute_supervised(); see https://www.tidymodels.org/learn/develop/models for more information on building a parsnip model.
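Because the inner method is supplied as a model specification, the imputation step itself does not change when another model is chosen. The base-R analogue below mimics this by passing the model as a function. The names lm_model(), mean_model() and impute_one_column() are hypothetical and are not part of parsnip or imputeGeneric.

```r
# Hypothetical analogue of a pluggable inner model (not the parsnip API):
# each "model" is trained on the observed rows and returns predictions
# for the rows where the target is missing.
lm_model <- function(train, newdata, target) {
  form <- stats::reformulate(setdiff(names(train), target), response = target)
  fit <- stats::lm(form, data = train)
  stats::predict(fit, newdata = newdata)
}
mean_model <- function(train, newdata, target) {
  rep(mean(train[[target]]), nrow(newdata))
}

# The imputation step does not depend on which model is plugged in
impute_one_column <- function(ds, target, model) {
  miss <- is.na(ds[[target]])
  ds[miss, target] <- model(ds[!miss, , drop = FALSE],
                            ds[miss, , drop = FALSE], target)
  ds
}

ds <- data.frame(X = c(NA, 2, 3, 4), Y = c(1, 2, 3, 4))
impute_one_column(ds, "X", mean_model)$X[1]  # 3
impute_one_column(ds, "X", lm_model)$X[1]    # approximately 1
```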

The option "all" for cols_used_for_imputation, and the options "all_except_i" and "all" for rows_used_for_imputation, should only be used if ds is complete or the model (model_spec_parsnip) can handle missing data.
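The restriction matters because a standard inner model cannot use missing predictor values. The snippet below uses stats::lm() purely for illustration (it is not the package's code): with R's default na.action, rows with a missing predictor are dropped when fitting, and predictions for rows with a missing predictor are NA.

```r
# Illustration with stats::lm(): using an incomplete column (Y) as a
# predictor loses training rows and yields NA predictions.
ds <- data.frame(X = c(NA, 2, 4, 6, 8), Y = c(NA, 1, 2, 3, NA))
train <- ds[!is.na(ds$X), ]             # rows where the target X is observed
fit <- stats::lm(X ~ Y, data = train)   # default na.action drops row 5 (Y missing)
stats::nobs(fit)                        # 3 rows used instead of 4
stats::predict(fit, newdata = ds[1, , drop = FALSE])  # NA: Y is missing in row 1
```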

The choice update_model = "each_column" can be much faster than update_model = "everytime", especially if the data set has many missing values in some columns, because the model is then updated only once per column instead of after every imputed value.
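A rough count of model updates shows where the speed difference comes from. The helper count_model_fits() below is hypothetical and only approximates the two strategies: one fit per column with missing values for "each_column" versus about one fit per imputed value for "everytime".

```r
# Hypothetical helper: approximate number of inner-model fits under the
# two update strategies, derived from the missing data indicator matrix M.
count_model_fits <- function(M) {
  miss_per_col <- colSums(M)
  c(each_column = sum(miss_per_col > 0),  # one fit per incomplete column
    everytime   = sum(miss_per_col))      # about one fit per imputed value
}

M <- matrix(FALSE, nrow = 1000, ncol = 3)
M[1:300, 1] <- TRUE   # 300 missing values in column 1
M[1:50, 2] <- TRUE    # 50 missing values in column 2
count_model_fits(M)
# each_column   everytime
#           2         350
```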

Value

The imputed data set.

Examples

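# Data set with 20 rows; 20% of the values in column 1 (X) are set to NA (MCAR)
ds_mis <- missMethods::delete_MCAR(
  data.frame(X = rnorm(20), Y = rnorm(20)), 0.2, 1
)
# Impute with the default inner model, linear_reg()
impute_supervised(ds_mis)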

[Package imputeGeneric version 0.1.0 Index]