impute_iterative {imputeGeneric} | R Documentation |
Iterative imputation
Description
Iterative imputation of a data set
Usage
impute_iterative(
ds,
model_spec_parsnip = linear_reg(),
model_fun_unsupervised = NULL,
predict_fun_unsupervised = NULL,
max_iter = 10,
stop_fun = NULL,
initial_imputation_fun = NULL,
cols_used_for_imputation = "only_complete",
cols_order = seq_len(ncol(ds)),
rows_used_for_imputation = "only_complete",
rows_order = seq_len(nrow(ds)),
update_model = "every_iteration",
update_ds_model = "every_iteration",
stop_fun_args = NULL,
M = is.na(ds),
model_arg = NULL,
warn_incomplete_imputation = TRUE,
...
)
Arguments
ds |
The data set to be imputed. Must be a data frame with column names. |
model_spec_parsnip |
The model type used for supervised imputation (see
( |
model_fun_unsupervised |
An unsupervised model function (see
|
predict_fun_unsupervised |
A predict function for unsupervised
imputation (see |
max_iter |
Maximum number of iterations |
stop_fun |
A stopping function (see details below) or |
initial_imputation_fun |
This function will do the initial imputation of
the missing values. If |
cols_used_for_imputation |
Which columns should be used to impute other columns? Possible choices: "only_complete", "already_imputed", "all" |
cols_order |
Ordering of the columns for imputation. This can be a
vector with indices or an |
rows_used_for_imputation |
Which rows should be used to impute other rows? Possible choices: "only_complete", "partly_complete", "complete_in_k", "already_imputed", "all_except_i", "all" |
rows_order |
Ordering of the rows for imputation. This can be a vector
with indices or an |
update_model |
How often should the model for imputation be updated? |
update_ds_model |
How often should the data set for the inner model be updated? |
stop_fun_args |
Further arguments passed on to |
M |
Missing data indicator matrix |
model_arg |
Further arguments for |
warn_incomplete_imputation |
Should a warning be given, if the
returned data set still contains |
... |
Further arguments passed on to |
Details
This function imputes a data set iteratively. Internally, either
impute_supervised() or impute_unsupervised() is used, depending on the values
of model_spec_parsnip, model_fun_unsupervised and predict_fun_unsupervised.
To use a supervised inner method, model_spec_parsnip must be specified and
model_fun_unsupervised and predict_fun_unsupervised must both be NULL. For an
unsupervised inner method, model_fun_unsupervised and predict_fun_unsupervised
must be specified and model_spec_parsnip must be NULL. Some arguments of this
function are only meaningful for impute_supervised() or impute_unsupervised().
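The selection rule described above can be sketched in base R. The helper inner_method() is hypothetical and is not part of imputeGeneric; it only restates the rule from the text, and raising an error for any other combination of arguments is a choice made for this sketch, not documented package behaviour.

```r
# Hypothetical helper (not part of imputeGeneric): report which inner method
# the three arguments select according to the rule in the Details section.
inner_method <- function(model_spec_parsnip,
                         model_fun_unsupervised,
                         predict_fun_unsupervised) {
  supervised <- !is.null(model_spec_parsnip) &&
    is.null(model_fun_unsupervised) &&
    is.null(predict_fun_unsupervised)
  unsupervised <- is.null(model_spec_parsnip) &&
    !is.null(model_fun_unsupervised) &&
    !is.null(predict_fun_unsupervised)
  if (supervised) {
    "supervised"
  } else if (unsupervised) {
    "unsupervised"
  } else {
    # Assumption of this sketch: any other combination is treated as an error.
    stop("Specify either model_spec_parsnip or both unsupervised functions.")
  }
}

# Stand-ins for real objects; only NULL versus non-NULL matters here.
mode_sup <- inner_method("some_parsnip_spec", NULL, NULL)
mode_unsup <- inner_method(NULL, function(...) NULL, function(...) NULL)
```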
Value
An imputed data set (or the return value of stop_fun)
stop_fun
The stop_fun should take the following arguments, in this order:
- ds (the data set imputed in the current iteration)
- ds_old (the data set imputed in the last iteration)
- a list (with named elements M, nr_iterations, max_iter)
- stop_fun_args
- res_stop_fun (the return value of stop_fun from the last iteration; initial
  value for the first iteration: list(stop_iter = FALSE))
To allow for another iteration, stop_fun must return a list that contains the
named element stop_iter = FALSE. The simple return value list(stop_iter = FALSE)
will allow the iteration to continue. However, the list can include more
information, which is handed over to stop_fun in the next iteration. For
example, the return value list(stop_iter = FALSE, last_eps = 0.3) would also
lead to another iteration. If stop_fun does not return a list, or the list does
not contain stop_iter = FALSE, the iteration is stopped and the return value of
stop_fun is returned as the result of impute_iterative(). Therefore, this
return value should normally include the imputed data set ds or ds_old.
An example of a stop_fun is stop_ds_difference().
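A custom stop function following the contract described in this section might look like the sketch below. The function name stop_max_change, the argument name info_list for the third argument, and the eps element of stop_fun_args are hypothetical choices made for illustration. The sketch assumes that all columns are numeric and that at least one cell is marked as missing in M. Attaching nr_iterations as an attribute of the returned data set is likewise an assumption of this sketch.

```r
# Hypothetical stop function: stop once the largest absolute change in the
# originally missing cells falls below stop_fun_args$eps.
stop_max_change <- function(ds, ds_old, info_list, stop_fun_args, res_stop_fun) {
  M <- info_list$M # missing data indicator matrix
  # Compare only the cells that were missing in the original data set.
  change <- max(abs(as.matrix(ds)[M] - as.matrix(ds_old)[M]), na.rm = TRUE)
  if (change < stop_fun_args$eps) {
    # The returned data frame has no element stop_iter = FALSE, so the
    # iteration stops and this value becomes the final result.
    return(structure(ds, nr_iterations = info_list$nr_iterations))
  }
  # Continue iterating; last_change is handed over to the next call
  # through res_stop_fun.
  list(stop_iter = FALSE, last_change = change)
}

# Direct call with toy data (base R only) to show both outcomes.
ds_old <- data.frame(X = c(1, 2, 3), Y = c(4, 5, 6))
ds_new <- data.frame(X = c(1, 2.2, 3), Y = c(4, 5, 6.05))
M <- matrix(c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE), ncol = 2)
info <- list(M = M, nr_iterations = 1, max_iter = 10)
first_res <- list(stop_iter = FALSE)

res_continue <- stop_max_change(ds_new, ds_old, info, list(eps = 0.1), first_res)
res_stop <- stop_max_change(ds_new, ds_old, info, list(eps = 0.5), first_res)
```

With imputeGeneric loaded, such a function would presumably be supplied via stop_fun = stop_max_change together with stop_fun_args = list(eps = 0.1).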
See Also
- impute_supervised() and impute_unsupervised() as the workhorses for the
  imputation.
- stop_ds_difference() as an example of a stop function.
Examples
set.seed(123)
# simple example
ds_mis <- missMethods::delete_MCAR(
data.frame(X = rnorm(20), Y = rnorm(20)), 0.2, 1
)
impute_iterative(ds_mis, max_iter = 2)
# using pre-imputation
ds_mis <- missMethods::delete_MCAR(
data.frame(X = rnorm(20), Y = rnorm(20)), 0.2
)
impute_iterative(
ds_mis,
max_iter = 2, initial_imputation_fun = missMethods::impute_mean
)
# example using stop_ds_difference() as stop_fun
ds_mis <- missMethods::delete_MCAR(
data.frame(X = rnorm(20), Y = rnorm(20)), 0.2
)
ds_imp <- impute_iterative(
ds_mis,
initial_imputation_fun = missMethods::impute_mean,
stop_fun = stop_ds_difference, stop_fun_args = list(eps = 0.5)
)
attr(ds_imp, "nr_iterations")