complete {rMIDAS}    R Documentation

Impute missing values using imputation model

Description

Having trained an imputation model with train(), complete() produces m completed datasets and returns them as a list.

Usage

complete(
  mid_obj,
  m = 10L,
  unscale = TRUE,
  bin_label = TRUE,
  cat_coalesce = TRUE,
  fast = FALSE,
  file = NULL,
  file_root = NULL
)

Arguments

mid_obj

Object of class midas, the result of running rMIDAS::train().

m

An integer: the number of completed datasets to generate.

unscale

Boolean, indicating whether to unscale any columns that were previously min-max scaled to the range [0, 1] (e.g. by convert() with minmax_scale = TRUE).

bin_label

Boolean, indicating whether to restore the original labels for binary columns.

cat_coalesce

Boolean, indicating whether to decode one-hot encoded categorical variables back into single columns of category labels.

fast

Boolean, indicating whether to impute each categorical value with the category that has the highest predicted probability (TRUE), or to draw a category at random, weighted by the predicted probabilities (FALSE).

file

Path at which to save the completed datasets. If NULL, the completed datasets are only held in memory.

file_root

A character string used as the root for all filenames when saving completed datasets (only relevant if file is supplied). If file_root is NULL, completed datasets are saved as "file/midas_impute_yymmdd_hhmmss_m.csv".
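The unscale and bin_label options reverse transformations applied during pre-processing. The sketch below illustrates what those reversals amount to in base R. The helpers unscale_minmax() and relabel_binary() are hypothetical, not rMIDAS functions, and the sketch assumes the original column minimum and maximum are known and that the first binary label was encoded as 0.

```r
# Hypothetical helpers illustrating post-processing; NOT rMIDAS functions.

# unscale: invert min-max scaling, x_scaled = (x - x_min) / (x_max - x_min)
unscale_minmax <- function(x_scaled, x_min, x_max) {
  x_scaled * (x_max - x_min) + x_min
}

# bin_label: map 0/1 values back to the two original labels
# (assumption: labels[1] was encoded as 0 and labels[2] as 1)
relabel_binary <- function(x01, labels) {
  labels[x01 + 1]
}

d_orig <- c(1, 4, 10)
d_scaled <- (d_orig - 1) / (10 - 1)   # scaled to [0, 1] using min = 1, max = 10
unscale_minmax(d_scaled, 1, 10)       # gives 1, 4, 10

relabel_binary(c(1, 0, 1), labels = c("NO", "YES"))  # "YES" "NO" "YES"
```

Because min-max scaling is a linear map, unscaling is exact up to floating-point rounding, provided the same minimum and maximum are used in both directions.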

Details

For more information, see Lall and Robinson (2023): doi:10.18637/jss.v107.i09.
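The difference controlled by the fast argument can be illustrated with a matrix of predicted category probabilities, one row per observation. The helper draw_categories() below is hypothetical and is not the rMIDAS implementation; it only contrasts taking the most probable category with sampling a category in proportion to its probability.

```r
# Hypothetical helper, NOT part of rMIDAS: turn predicted category
# probabilities (one row per observation) into category labels.
draw_categories <- function(probs, levels, fast = FALSE) {
  if (fast) {
    # fast = TRUE: take the most probable category in each row
    levels[max.col(probs, ties.method = "first")]
  } else {
    # fast = FALSE: sample one category per row, weighted by its probabilities
    apply(probs, 1, function(p) sample(levels, size = 1, prob = p))
  }
}

probs <- matrix(c(0.1, 0.7, 0.2,
                  0.6, 0.3, 0.1),
                nrow = 2, byrow = TRUE)
lv <- c("red", "yellow", "blue")

draw_categories(probs, lv, fast = TRUE)   # always "yellow" "red"

set.seed(1)
draw_categories(probs, lv, fast = FALSE)  # random; can differ between draws
```

Taking the most probable category is deterministic given the probabilities and discards the uncertainty they express, which can understate between-imputation variability; weighted sampling retains it at some extra cost.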

Value

A list of length m, each element of which is a completed data.frame (i.e. one with no missing values).
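Because the return value is an ordinary list of data.frames, it can be handled with standard list tools. As an illustration of the file and file_root behaviour, the sketch below writes such a list to CSV, one file per completed dataset. The helper save_completed() is hypothetical, not the rMIDAS implementation, and it assumes that the trailing "_m" in the documented filename pattern is the index of each completed dataset.

```r
# Hypothetical helper, NOT the rMIDAS implementation: write a list of
# completed data.frames to CSV as "<root>_<i>.csv" (assumed naming scheme).
save_completed <- function(datasets, dir, file_root = NULL) {
  if (is.null(file_root)) {
    # Default root follows the documented midas_impute_yymmdd_hhmmss pattern
    file_root <- paste0("midas_impute_", format(Sys.time(), "%y%m%d_%H%M%S"))
  }
  paths <- file.path(dir, paste0(file_root, "_", seq_along(datasets), ".csv"))
  for (i in seq_along(datasets)) {
    write.csv(datasets[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)
}

# Stand-in for the output of complete(): a list of m = 3 data.frames
completed <- lapply(1:3, function(i) data.frame(x = 1:4, y = rnorm(4)))
paths <- save_completed(completed, tempdir(), file_root = "example")
basename(paths)  # "example_1.csv" "example_2.csv" "example_3.csv"
```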

References

Lall R, Robinson T (2023). “Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS.” Journal of Statistical Software, 107(9), 1–38. doi:10.18637/jss.v107.i09.

Examples

## Not run: 
library(rMIDAS)
library(data.table)

# Run only where Python is available and configured correctly
if (python_configured()) {
  set.seed(89)
  n_obs <- 10000

  # Generate raw data with numeric, binary, and categorical variables
  raw_data <- data.table(a = sample(c("red", "yellow", "blue", NA), n_obs, replace = TRUE),
                         b = 1:n_obs,
                         c = sample(c("YES", "NO", NA), n_obs, replace = TRUE),
                         d = runif(n_obs, 1, 10),
                         e = sample(c("YES", "NO"), n_obs, replace = TRUE),
                         f = sample(c("male", "female", "trans", "other", NA), n_obs, replace = TRUE))

  # Names of binary and categorical variables
  test_bin <- c("c", "e")
  test_cat <- c("a", "f")

  # Pre-process data, min-max scaling continuous columns
  test_data <- convert(raw_data,
                       bin_cols = test_bin,
                       cat_cols = test_cat,
                       minmax_scale = TRUE)

  # Train the imputation model
  test_imp <- train(test_data)

  # Generate m = 5 completed datasets, sampling categories by predicted probability
  complete_datasets <- complete(test_imp, m = 5, fast = FALSE)

  # Use Rubin's rules to combine m regression models
  midas_pool <- combine(formula = d ~ a + c + e + f,
                        complete_datasets)
}

## End(Not run)
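The final step of the example pools regression results across the m completed datasets with Rubin's rules. The base-R sketch below shows the basic pooling for a single coefficient: average the m estimates, then add the mean within-imputation variance to (1 + 1/m) times the between-imputation variance. It is an illustration only, not rMIDAS::combine(); pool_rubin() is hypothetical, the degrees-of-freedom adjustment is omitted, and simulated data stand in for the output of complete().

```r
# Hypothetical helper, NOT rMIDAS::combine(): Rubin's rules for one coefficient.
pool_rubin <- function(estimates, variances) {
  m <- length(estimates)
  q_bar <- mean(estimates)        # pooled point estimate
  w <- mean(variances)            # within-imputation variance
  b <- var(estimates)             # between-imputation variance
  t_var <- w + (1 + 1 / m) * b    # total variance
  c(estimate = q_bar, se = sqrt(t_var))
}

# Stand-in for complete() output: m = 5 simulated data.frames
set.seed(89)
datasets <- lapply(1:5, function(i) {
  x <- rnorm(100)
  data.frame(x = x, y = 2 * x + rnorm(100))
})

# Fit the same model to each dataset and extract the slope and its variance
fits <- lapply(datasets, function(df) lm(y ~ x, data = df))
est <- sapply(fits, function(f) coef(f)[["x"]])
vars <- sapply(fits, function(f) vcov(f)["x", "x"])

pool_rubin(est, vars)
```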


[Package rMIDAS version 1.0.0 Index]