convert {rMIDAS}    R Documentation
Pre-process data for Midas imputation
Description
convert() pre-processes datasets to provide a user-friendly interface to the main train() function.
Usage
convert(data, bin_cols = NULL, cat_cols = NULL, minmax_scale = FALSE)
Arguments
data | Either an object of class
bin_cols, cat_cols | Character vectors of the column names corresponding to binary and categorical variables, respectively
minmax_scale | Logical (default FALSE); whether to scale all numeric columns to lie between 0 and 1, to improve model convergence
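To illustrate what minmax_scale = TRUE does, and why the scaling parameters must be kept for post-imputation back-transformation, the sketch below rescales numeric columns with x' = (x - min) / (max - min) and then inverts the transformation. The helper names minmax_fit(), minmax_apply() and minmax_undo() are hypothetical and are not rMIDAS functions; this is a sketch of the general technique, not the package's implementation.

```r
# Hypothetical helpers (not part of rMIDAS) illustrating min-max scaling.
minmax_fit <- function(df, cols) {
  # Record the min and max of each numeric column, ignoring NAs
  lapply(setNames(cols, cols), function(col) {
    c(min = min(df[[col]], na.rm = TRUE), max = max(df[[col]], na.rm = TRUE))
  })
}

minmax_apply <- function(df, params) {
  for (col in names(params)) {
    rng <- params[[col]]
    # Map [min, max] onto [0, 1]; NAs stay NA.
    # Note: a constant column (max == min) would divide by zero here.
    df[[col]] <- (df[[col]] - rng[["min"]]) / (rng[["max"]] - rng[["min"]])
  }
  df
}

minmax_undo <- function(df, params) {
  for (col in names(params)) {
    rng <- params[[col]]
    # Invert the scaling, e.g. after imputation
    df[[col]] <- df[[col]] * (rng[["max"]] - rng[["min"]]) + rng[["min"]]
  }
  df
}

set.seed(1)
df <- data.frame(b = seq(2, 20, by = 2), d = c(runif(9), NA))
params <- minmax_fit(df, c("b", "d"))
scaled <- minmax_apply(df, params)
range(scaled$b)  # 0 1
restored <- minmax_undo(scaled, params)
all.equal(restored, df)  # TRUE
```

Storing the per-column minimum and maximum (analogous in spirit to the minmax_params component described under Value) is what makes the inverse transformation possible once imputed values are available.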
Details
The function has two advantages over manual pre-processing:
- Utilises data.table for fast read-in and processing of large datasets
- Outputs an object that can be passed directly to train() without re-specifying column names etc.
For more information, see Lall and Robinson (2023): doi:10.18637/jss.v107.i09.
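The second advantage listed above rests on a standard S3 pattern: a classed object carries its own column metadata, so a generic can dispatch on the class and read the column roles from the object instead of requiring them as arguments. The toy sketch below shows that pattern only; new_toy_preproc(), toy_train() and the class "toy_preproc" are hypothetical names and do not reflect how rMIDAS or train() are actually implemented.

```r
# Hypothetical constructor: bundle data and column roles into one S3 object
new_toy_preproc <- function(data, bin_list = NULL, cat_lists = NULL) {
  structure(list(data = data, bin_list = bin_list, cat_lists = cat_lists),
            class = "toy_preproc")
}

# Hypothetical generic standing in for a training function
toy_train <- function(x, ...) UseMethod("toy_train")

# Default method: a plain data frame needs its column roles spelled out
toy_train.default <- function(x, bin_cols = NULL, cat_cols = NULL, ...) {
  list(n_rows = nrow(x), binary = bin_cols, categorical = cat_cols)
}

# Method for the classed object: column roles are read from the object itself
toy_train.toy_preproc <- function(x, ...) {
  toy_train.default(x$data, bin_cols = x$bin_list, cat_cols = x$cat_lists, ...)
}

df <- data.frame(c = c(1, 0, NA), e = c(0, 1, 1))
pre <- new_toy_preproc(df, bin_list = c("c", "e"))

fit_a <- toy_train(df, bin_cols = c("c", "e"))  # roles repeated by hand
fit_b <- toy_train(pre)                          # roles travel with the object
identical(fit_a, fit_b)  # TRUE
```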
Value
Returns a custom S3 object of class ‘midas_preproc’ containing:
- data – processed version of the input data
- bin_list – vector of binary variable names
- cat_lists – embedded list of one-hot encoded categorical variable names
- minmax_params – list of the minimum and maximum values of each scaled numeric column
In other words, a list containing the converted data, the binary and categorical labels to be imported into the imputation model, and the scaling parameters needed for post-imputation transformations.
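The cat_lists component refers to one-hot encoded categorical variables. As a rough illustration of that encoding, the sketch below expands one categorical column into 0/1 indicator columns, assuming that a missing category should remain missing in every indicator column so that the imputation model can fill it in. The helper one_hot_na() and its prefix_level column naming are hypothetical; rMIDAS may name and order its columns differently.

```r
# Hypothetical helper (not rMIDAS's implementation): one-hot encode a
# categorical vector, keeping missing values missing in every indicator.
one_hot_na <- function(x, prefix) {
  # Observed levels in sorted order; NA is not treated as a level
  lv <- sort(unique(x[!is.na(x)]))
  # outer() compares every value with every level; NA == level gives NA,
  # so a missing value stays NA across all indicator columns
  out <- outer(x, lv, "==") * 1
  colnames(out) <- paste(prefix, lv, sep = "_")
  as.data.frame(out)
}

f <- c("male", "female", NA, "other")
enc <- one_hot_na(f, "f")
names(enc)  # "f_female" "f_male" "f_other"
```

Each non-missing row has exactly one indicator set to 1, while the row with NA is NA in all three columns.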
References
Lall R, Robinson T (2023). “Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS.” Journal of Statistical Software, 107(9), 1–38. doi:10.18637/jss.v107.i09.
Examples
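library(rMIDAS)

# Simulate data with categorical (a, f), binary (c, e) and numeric (b, d) columns
set.seed(89)
data <- data.frame(a = sample(c("red", "yellow", "blue", NA), 100, replace = TRUE),
                   b = 1:100,
                   c = sample(c("YES", "NO", NA), 100, replace = TRUE),
                   d = runif(100),
                   e = sample(c("YES", "NO"), 100, replace = TRUE),
                   f = sample(c("male", "female", "trans", "other", NA), 100, replace = TRUE),
                   stringsAsFactors = FALSE)

# Declare which columns are binary and which are categorical
# (named to avoid masking base::cat)
bin_vars <- c("c", "e")
cat_vars <- c("a", "f")

# Pre-process; the result can be passed directly to train()
conv <- convert(data, bin_cols = bin_vars, cat_cols = cat_vars)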