complete {rMIDAS} | R Documentation
Impute missing values using imputation model
Description
Having trained an imputation model, complete() uses it to generate m
completed datasets and returns them as a list.
Usage
complete(
mid_obj,
m = 10L,
unscale = TRUE,
bin_label = TRUE,
cat_coalesce = TRUE,
fast = FALSE,
file = NULL,
file_root = NULL
)
Arguments
mid_obj | The trained imputation model (an object of class …), e.g. the result of train() in the Examples
m | An integer, the number of completed datasets to generate
unscale | Boolean, indicating whether to unscale any columns that were previously min-max scaled to between 0 and 1
bin_label | Boolean, indicating whether to restore the original labels of binary columns
cat_coalesce | Boolean, indicating whether to decode the one-hot encoded categorical variables back into their original categorical columns
fast | Boolean: if TRUE, impute each categorical value as the category with the highest predicted probability; if FALSE, draw a weighted sample of the category levels, using the predicted probabilities as weights
file | Path to save the completed datasets to. If …
file_root | A character string used as the root for all filenames when saving completed datasets, if a …
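The unscale and fast arguments describe two post-processing steps. The base-R sketch below illustrates both under stated assumptions. It uses the standard min-max formula x' = (x - min) / (max - min) and its inverse, and it turns a matrix of per-level predicted probabilities into category labels, either by taking the most probable level (as fast = TRUE is described) or by a probability-weighted draw (as fast = FALSE is described). The helpers minmax_scale(), minmax_unscale(), and decode_categories() are hypothetical and are not rMIDAS functions; rMIDAS's internals may differ.

```r
# Hypothetical helpers, NOT part of rMIDAS: a sketch of min-max scaling
# and of the inverse step that `unscale = TRUE` is described as performing.
minmax_scale <- function(x) {
  rng <- range(x, na.rm = TRUE)
  list(scaled = (x - rng[1]) / (rng[2] - rng[1]), min = rng[1], max = rng[2])
}
minmax_unscale <- function(x_scaled, min, max) {
  # Map values in [0, 1] back onto the original [min, max] range
  x_scaled * (max - min) + min
}

# Hypothetical helper, NOT part of rMIDAS: converts a matrix of predicted
# probabilities (one row per observation, one named column per category
# level) into a character vector of category labels.
decode_categories <- function(probs, fast = FALSE) {
  lvls <- colnames(probs)
  if (fast) {
    # Analogue of fast = TRUE: choose the most probable level in each row
    idx <- max.col(probs, ties.method = "first")
  } else {
    # Analogue of fast = FALSE: draw one level per row, weighted by probability
    idx <- apply(probs, 1, function(p) sample.int(length(p), size = 1, prob = p))
  }
  lvls[idx]
}

s <- minmax_scale(c(1, 5.5, 10))
s$scaled                                # 0.0 0.5 1.0
minmax_unscale(s$scaled, s$min, s$max)  # 1.0 5.5 10.0

probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.3, 0.6),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("red", "yellow", "blue")))
decode_categories(probs, fast = TRUE)   # "red" "blue"
decode_categories(probs, fast = FALSE)  # random; varies between calls
```

The weighted draw keeps the uncertainty in the predicted probabilities across the m completed datasets, whereas the argmax rule returns the same label every time for a given probability row.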
Details
For more information, see Lall and Robinson (2023): doi:10.18637/jss.v107.i09.
Value
A list of length m, each element of which is a completed data.frame (i.e. one with no missing values).
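A short base-R check of that return-value contract can be handy before analysing the results. The helper is_completed_list() and the toy list below are hypothetical stand-ins for the output of complete(); they are not part of rMIDAS.

```r
# Hypothetical helper, NOT part of rMIDAS: TRUE if `datasets` is a list of
# exactly m data.frames, none of which contains a missing value.
is_completed_list <- function(datasets, m) {
  is.list(datasets) &&
    length(datasets) == m &&
    all(vapply(datasets,
               function(d) is.data.frame(d) && !any(is.na(d)),
               logical(1)))
}

# Toy stand-in for the result of complete(test_imp, m = 2)
toy <- list(data.frame(a = c("red", "blue"), d = c(1.5, 9.2)),
            data.frame(a = c("red", "red"),  d = c(1.5, 9.2)))
is_completed_list(toy, m = 2)   # TRUE
```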
References
Lall R, Robinson T (2023). “Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS.” Journal of Statistical Software, 107(9), 1–38. doi:10.18637/jss.v107.i09.
Examples
## Not run:
# Run only where Python is available and configured correctly
library(rMIDAS)
library(data.table)
if (python_configured()) {
  set.seed(89)
  # Generate raw data, with numeric, binary, and categorical variables
  n_obs <- 10000
  raw_data <- data.table(a = sample(c("red", "yellow", "blue", NA), n_obs, replace = TRUE),
                         b = 1:n_obs,
                         c = sample(c("YES", "NO", NA), n_obs, replace = TRUE),
                         d = runif(n_obs, 1, 10),
                         e = sample(c("YES", "NO"), n_obs, replace = TRUE),
                         f = sample(c("male", "female", "trans", "other", NA), n_obs, replace = TRUE))

  # Names of binary and categorical variables
  test_bin <- c("c", "e")
  test_cat <- c("a", "f")

  # Pre-process data
  test_data <- convert(raw_data,
                       bin_cols = test_bin,
                       cat_cols = test_cat,
                       minmax_scale = TRUE)

  # Train the imputation model
  test_imp <- train(test_data)

  # Generate 5 completed datasets, sampling categories from predicted probabilities
  complete_datasets <- complete(test_imp, m = 5, fast = FALSE)

  # Use Rubin's rules to combine the m regression models
  midas_pool <- combine(formula = d ~ a + c + e + f,
                        complete_datasets)
}
## End(Not run)
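The example above pools the m regressions with combine(), which it describes as applying Rubin's rules. As a hedged illustration of what those rules compute, here is a minimal base-R sketch. The function pool_rubin() and the simulated toy_list are hypothetical; this is not the rMIDAS implementation, and it omits any degrees-of-freedom adjustment. The pooled estimate is the mean of the m estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```r
# Hypothetical sketch of Rubin's rules, NOT the rMIDAS combine() source.
# Fits lm() on each completed data.frame and pools the coefficients.
pool_rubin <- function(formula, df_list) {
  m <- length(df_list)
  stopifnot(m >= 2)  # the between-imputation variance needs at least 2 fits
  fits <- lapply(df_list, function(d) lm(formula, data = d))
  est <- sapply(fits, coef)                               # p x m estimates
  var_within <- sapply(fits, function(f) diag(vcov(f)))   # p x m sampling variances
  q_bar <- rowMeans(est)               # pooled point estimates
  u_bar <- rowMeans(var_within)        # mean within-imputation variance
  b <- apply(est, 1, var)              # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b     # total variance
  data.frame(term = names(q_bar),
             estimate = unname(q_bar),
             std.error = unname(sqrt(t_var)),
             stringsAsFactors = FALSE)
}

# Simulated stand-in for a list of m = 5 completed datasets
set.seed(42)
toy_list <- lapply(1:5, function(i) {
  x <- rnorm(100)
  data.frame(x = x, y = 2 * x + rnorm(100))
})
pool_rubin(y ~ x, toy_list)
```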