impute_supervised {imputeGeneric} | R Documentation |
Supervised imputation
Description
Impute a data set with a supervised inner method. This is one of the main
functions that can be used inside impute_iterative(). If you need
pre-imputation or iterations, use impute_iterative() directly.
Usage
impute_supervised(
ds,
model_spec_parsnip = linear_reg(),
cols_used_for_imputation = "only_complete",
cols_order = seq_len(ncol(ds)),
rows_used_for_imputation = "only_complete",
rows_order = seq_len(nrow(ds)),
update_model = "each_column",
update_ds_model = "each_column",
M = is.na(ds),
warn_incomplete_imputation = TRUE,
...
)
Arguments
ds |
The data set to be imputed. Must be a data frame with column names. |
model_spec_parsnip |
The model type used for imputation. It is defined
via the |
cols_used_for_imputation |
Which columns should be used to impute other columns? Possible choices: "only_complete", "already_imputed", "all" |
cols_order |
Ordering of the columns for imputation. This can be a
vector with indices or an |
rows_used_for_imputation |
Which rows should be used to impute other rows? Possible choices: "only_complete", "partly_complete", "complete_in_k", "already_imputed", "all_except_i", "all" |
rows_order |
Ordering of the rows for imputation. This can be a vector
with indices or an |
update_model |
How often should the model for imputation be updated? Possible choices are: "everytime" (after every imputed value), "each_column" (only one update per column) and "every_iteration" (an alias for "each_column"). |
update_ds_model |
How often should the data set for the inner model be updated? Possible choices are: "everytime" (after every imputed value), "each_column" (only one update per column) and "every_iteration". |
M |
Missing data indicator matrix |
warn_incomplete_imputation |
Should a warning be given, if the
returned data set still contains |
... |
Arguments passed on to |
Details
This function imputes the columns of the data set ds column by column. The
imputation order of the columns can be specified by cols_order.
Furthermore, cols_used_for_imputation controls which columns are used for
the imputation. The same options are available for the rows of ds via
rows_order and rows_used_for_imputation. If ds is pre-imputed, the
missing data indicator matrix can be supplied via M.
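The role of M for a pre-imputed data set can be sketched as follows. This
is a minimal illustration, not an example shipped with the package: the
mean pre-imputation step and the object names ds_pre and ds_imp are
assumptions made for the sketch, and the call is only run if imputeGeneric
is installed.

```r
set.seed(42)
ds <- data.frame(X = rnorm(20), Y = rnorm(20))
ds$X[c(4, 8, 15)] <- NA

# Record the missingness pattern before any pre-imputation
M <- is.na(ds)

# Hypothetical pre-imputation step: fill X with the mean of its observed values
ds_pre <- ds
ds_pre$X[M[, "X"]] <- mean(ds$X, na.rm = TRUE)

# ds_pre no longer contains NA, so M tells impute_supervised() which
# values were originally missing and should be re-imputed
if (requireNamespace("imputeGeneric", quietly = TRUE)) {
  ds_imp <- imputeGeneric::impute_supervised(ds_pre, M = M)
}
```

Without M, the default M = is.na(ds) would be computed on the pre-imputed
data and would mark no value as missing.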
The inner method can be specified via model_spec_parsnip, which should be a
parsnip model type such as parsnip::linear_reg() or parsnip::rand_forest().
For a complete list, see https://www.tidymodels.org/find/parsnip. You can
also build a new parsnip model and use it inside impute_supervised(); see
https://www.tidymodels.org/learn/develop/models for more information on
building a parsnip model.
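As a hedged sketch of swapping the inner method, the following passes a
different parsnip model specification. The choice of
parsnip::decision_tree() with its default "rpart" engine is an assumption
made for illustration (it is not taken from this help page), as are the
object names; the mode is set explicitly because tree models in parsnip
have no default mode. The call is only run if the required packages are
installed.

```r
set.seed(123)
ds <- data.frame(X = rnorm(50), Y = rnorm(50), Z = rnorm(50))
ds$X[c(3, 11, 27)] <- NA  # only X has missing values; Y and Z stay complete

pkgs <- c("imputeGeneric", "parsnip", "rpart")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  # Assumed alternative inner model: a regression tree instead of linear_reg()
  spec <- parsnip::decision_tree(mode = "regression")
  ds_imp <- imputeGeneric::impute_supervised(ds, model_spec_parsnip = spec)
}
```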
The options "all" for cols_used_for_imputation and "all_except_i", "all"
for rows_used_for_imputation should only be used if ds is complete or the
model (model_spec_parsnip) can handle missing data.
The choice update_model = "each_column" can be much faster than
update_model = "everytime", especially if the data set has many missing
values in some columns.
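The cost difference can be illustrated with a conceptual base-R sketch.
This is not the package's implementation: it uses lm() directly, and the
counters fits_a and fits_b are hypothetical names introduced only to count
model fits under the two update strategies.

```r
set.seed(1)
ds <- data.frame(X = rnorm(30), Y = rnorm(30))
ds$X[c(2, 9, 14, 21, 28)] <- NA
mis_rows <- which(is.na(ds$X))

# Strategy A (analogous to "each_column"): fit once, predict all missing rows
ds_a <- ds
mod <- lm(X ~ Y, data = ds_a[!is.na(ds_a$X), ])
fits_a <- 1L
ds_a$X[mis_rows] <- predict(mod, newdata = ds_a[mis_rows, , drop = FALSE])

# Strategy B (analogous to "everytime"): refit before every imputed value,
# using the rows that are complete at that point (including imputed ones)
ds_b <- ds
fits_b <- 0L
for (i in mis_rows) {
  mod <- lm(X ~ Y, data = ds_b[!is.na(ds_b$X), ])
  fits_b <- fits_b + 1L
  ds_b$X[i] <- predict(mod, newdata = ds_b[i, , drop = FALSE])
}

# Number of model fits per strategy
c(each_column = fits_a, everytime = fits_b)
```

In this sketch, strategy A needs one fit for the column, while strategy B
needs one fit per missing value, so its cost grows with the number of
missing values in the column.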
Value
The imputed data set.
Examples
ds_mis <- missMethods::delete_MCAR(
data.frame(X = rnorm(20), Y = rnorm(20)), 0.2, 1
)
impute_supervised(ds_mis)