convert {rMIDAS}                R Documentation

Pre-process data for MIDAS imputation

Description

convert() pre-processes a dataset so that it can be passed directly to the main train() function.

Usage

convert(data, bin_cols = NULL, cat_cols = NULL, minmax_scale = FALSE)

Arguments

data

An object of class data.frame or data.table, or a path to a regular, delimited file

bin_cols, cat_cols

Character vectors of column names identifying the binary and categorical variables, respectively

minmax_scale

Logical; if TRUE, all numeric columns are scaled to lie between 0 and 1, which can improve model convergence

Details

The function has two advantages over manual pre-processing:

  1. Utilises data.table for fast read-in and processing of large datasets

  2. Outputs an object that can be passed directly to train() without re-specifying column names etc.

For more information, see Lall and Robinson (2023): doi:10.18637/jss.v107.i09.

Value

Returns a custom S3 object of class ‘midas_preproc’: a list containing the converted data, the categorical and binary labels to be imported into the imputation model, and the scaling parameters for post-imputation transformations.
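
The role of the scaling parameters can be illustrated with a minimal base-R sketch of min-max scaling and its reversal. The helpers minmax_fit(), minmax_apply() and minmax_invert() are hypothetical names introduced here for illustration; they are not exported by rMIDAS and may not match how the package stores or applies its parameters.

```r
# Illustrative sketch only: these helpers are assumptions, not rMIDAS functions.

# Record the observed range of a numeric column, ignoring missing values.
minmax_fit <- function(x) {
  list(min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE))
}

# Map values onto [0, 1] using the stored range; NAs stay NA.
minmax_apply <- function(x, p) (x - p$min) / (p$max - p$min)

# Reverse the scaling, e.g. after imputation, to recover the original units.
minmax_invert <- function(z, p) z * (p$max - p$min) + p$min

x <- c(10, 20, NA, 40)
params <- minmax_fit(x)
scaled <- minmax_apply(x, params)
restored <- minmax_invert(scaled, params)
```

Keeping the range alongside the scaled data is what allows values to be returned to their original units after imputation.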

References

Lall R, Robinson T (2023). “Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS.” Journal of Statistical Software, 107(9), 1–38. doi:10.18637/jss.v107.i09.

Examples

# Simulated data: a and f are categorical, c and e are binary, b and d are numeric
data <- data.frame(a = sample(c("red", "yellow", "blue", NA), 100, replace = TRUE),
                   b = 1:100,
                   c = sample(c("YES", "NO", NA), 100, replace = TRUE),
                   d = runif(100),
                   e = sample(c("YES", "NO"), 100, replace = TRUE),
                   f = sample(c("male", "female", "trans", "other", NA), 100, replace = TRUE),
                   stringsAsFactors = FALSE)

# Names of the binary and categorical columns
bin <- c("c", "e")
cat <- c("a", "f")

# Pre-process the data for use with train()
convert(data, bin_cols = bin, cat_cols = cat)
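
Before calling convert(), it can be useful to confirm that the columns named as binary really hold at most two distinct non-missing values. The sketch below uses base R only; n_levels() is a hypothetical helper, not part of rMIDAS, and the two-value rule is an assumption about what "binary" should mean here rather than a check the package is documented to perform.

```r
# Hypothetical helper (not an rMIDAS function): count the distinct
# non-missing values in each named column of a data frame.
n_levels <- function(df, cols) {
  vapply(cols,
         function(col) length(unique(stats::na.omit(df[[col]]))),
         integer(1))
}

# Small stand-in dataset with the same kinds of columns as the example.
df <- data.frame(a = c("red", "yellow", "blue", NA),
                 c = c("YES", "NO", NA, "YES"),
                 e = c("YES", "NO", "NO", "YES"),
                 stringsAsFactors = FALSE)

bin_levels <- n_levels(df, c("c", "e"))  # candidate binary columns
cat_levels <- n_levels(df, "a")          # candidate categorical column

# Stop early if a supposed binary column has more than two values.
stopifnot(all(bin_levels <= 2))
```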

[Package rMIDAS version 1.0.0 Index]