mixgb {mixgb}    R Documentation

Multiple imputation through XGBoost

Description

This function is used to generate multiply imputed datasets using XGBoost, subsampling and predictive mean matching (PMM).

Usage

mixgb(
  data,
  m = 5,
  maxit = 1,
  ordinalAsInteger = FALSE,
  bootstrap = FALSE,
  pmm.type = "auto",
  pmm.k = 5,
  pmm.link = "prob",
  initial.num = "normal",
  initial.int = "mode",
  initial.fac = "mode",
  save.models = FALSE,
  save.vars = NULL,
  verbose = FALSE,
  xgb.params = list(max_depth = 3, gamma = 0, eta = 0.3, min_child_weight = 1,
    subsample = 0.7, colsample_bytree = 1, colsample_bylevel = 1, colsample_bynode = 1,
    tree_method = "auto", gpu_id = 0, predictor = "auto"),
  nrounds = 100,
  early_stopping_rounds = 10,
  print_every_n = 10L,
  xgboost_verbose = 0,
  ...
)

Arguments

data

A data.frame or data.table containing missing values.

m

The number of imputed datasets. Default: 5

maxit

The number of imputation iterations. Default: 1

ordinalAsInteger

Whether to convert ordinal factors to integers. By default, ordinalAsInteger = FALSE. Setting ordinalAsInteger = TRUE may speed up the imputation process for large datasets.

bootstrap

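Whether to use bootstrapping for multiple imputation. By default, bootstrap = FALSE. Setting bootstrap = TRUE can improve the variability between imputations when the sampling-related hyperparameters of XGBoost (such as subsample and the colsample_* parameters) are all set to 1, that is, when XGBoost itself does no subsampling.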
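As an illustration of the case described above, the following sketch turns off XGBoost's own sampling and relies on bootstrap = TRUE instead. It uses only arguments listed under Usage and the nhanes3 data from the Examples; the object name boot.data is made up for this sketch, and the call is skipped when the mixgb package is not installed.

```r
# Sketch only: requires the mixgb package; skipped when it is not installed.
boot.data <- NULL
if (requireNamespace("mixgb", quietly = TRUE)) {
  library(mixgb)
  # No row or column subsampling inside XGBoost ...
  params <- list(max_depth = 3, subsample = 1, colsample_bytree = 1, nthread = 2)
  # ... so bootstrapping is switched on as the source of between-imputation variability.
  boot.data <- mixgb(data = nhanes3, m = 2, bootstrap = TRUE,
                     xgb.params = params, nrounds = 10)
}
```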

pmm.type

The type of predictive mean matching (PMM). Possible values:

  • NULL: Imputations without PMM;

  • 0: Imputations with PMM type 0;

  • 1: Imputations with PMM type 1;

  • 2: Imputations with PMM type 2;

  • "auto" (Default): Imputations with PMM type 2 for numeric/integer variables; imputations without PMM for categorical variables.
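The matching step of predictive mean matching can be sketched in base R as follows. This is a generic illustration, not the code mixgb runs internally: the predictions, the observed values and the helper pmm_draw are all hypothetical, and the sketch does not show how the PMM types differ in the way the predictions are produced.

```r
set.seed(42)

# Hypothetical model predictions for the observed cases (donors)
# and the values actually observed for those cases.
donor.pred <- c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
donor.obs  <- c(1.1, 1.9, 3.2, 3.8, 5.3, 6.1)

# Hypothetical model predictions for two cases with a missing value.
target.pred <- c(2.4, 5.6)

# For each target prediction: find the k donors with the closest
# predictions, pick one of them at random, and use its observed value.
pmm_draw <- function(pred, donor.pred, donor.obs, k) {
  vapply(pred, function(p) {
    nearest <- order(abs(donor.pred - p))[seq_len(k)]
    donor.obs[nearest[sample.int(k, 1)]]
  }, numeric(1))
}

imputed <- pmm_draw(target.pred, donor.pred, donor.obs, k = 3)
```

Every imputed value is an observed value borrowed from a donor, so imputations always stay within the range of the observed data.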

pmm.k

The number of donors for predictive mean matching. Default: 5

pmm.link

The link used for predictive mean matching of binary variables:

  • "prob" (Default): use probabilities;

  • "logit": use logit values.
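The two scales can be compared with a short base R sketch. The probabilities below are made up for illustration; qlogis() and plogis() are the standard logit and inverse-logit functions from the stats package. The point is only that distances between predictions, and therefore the choice of nearest donors, can differ between the two scales.

```r
# Hypothetical predicted probabilities for a binary variable.
p <- c(0.01, 0.05, 0.50, 0.95)

# The same predictions on the logit scale: log(p / (1 - p)).
logit <- qlogis(p)

# Distance between the first two predictions on each scale.
gap.prob  <- abs(p[1] - p[2])          # small on the probability scale
gap.logit <- abs(logit[1] - logit[2])  # much larger on the logit scale

# plogis() maps logit values back to probabilities.
back <- plogis(logit)
```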

initial.num

Initial imputation method for numeric type data:

  • "normal" (Default);

  • "mean";

  • "median";

  • "mode";

  • "sample".

initial.int

Initial imputation method for integer type data:

  • "mode" (Default);

  • "sample".

initial.fac

Initial imputation method for factor type data:

  • "mode" (Default);

  • "sample".
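The simpler initial imputation methods listed for initial.num, initial.int and initial.fac can be sketched in base R as below. The helper fill_initial is hypothetical and is not part of mixgb; it only illustrates what filling by "mean", "median", "mode" and "sample" means. The "normal" method is left out because its exact definition is not given here.

```r
# Hypothetical helper: fill the missing entries of one variable
# so that a complete dataset is available before the first iteration.
fill_initial <- function(x, method = c("mean", "median", "mode", "sample")) {
  method <- match.arg(method)
  miss <- is.na(x)
  obs <- x[!miss]
  fill <- switch(method,
    mean   = mean(obs),
    median = median(obs),
    mode   = {
      # Most frequent observed value (first one in case of ties).
      tab <- table(obs)
      obs[match(names(tab)[which.max(tab)], as.character(obs))]
    },
    # Random draws, with replacement, from the observed values.
    sample = obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  )
  x[miss] <- fill
  x
}

num <- c(1, NA, 3, 3)
fac <- factor(c("a", "b", NA, "b"))

num.mean   <- fill_initial(num, "mean")
num.median <- fill_initial(num, "median")
fac.mode   <- fill_initial(fac, "mode")
fac.sample <- fill_initial(fac, "sample")
```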

save.models

Whether to save imputation models for imputing new data later on. Default: FALSE

save.vars

For the purpose of imputing new data, the imputation models for response variables specified in save.vars will be saved. The values in save.vars can be a vector of names or indices. By default, only the imputation models for variables with missing values in the original data will be saved (save.vars = NULL). To save imputation models for all variables, users can specify it with save.vars = colnames(data).
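A possible call that saves models for every variable, following the save.vars = colnames(data) suggestion above, is sketched here. It uses the nhanes3 data from the Examples; the object name all.models is made up for this sketch, and the call is skipped when the mixgb package is not installed. The structure of the returned object is inspected rather than assumed.

```r
# Sketch only: requires the mixgb package; skipped when it is not installed.
all.models <- NULL
if (requireNamespace("mixgb", quietly = TRUE)) {
  library(mixgb)
  params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
  # Save an imputation model for every column, not only for the
  # columns that contain missing values in nhanes3.
  all.models <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10,
                      save.models = TRUE, save.vars = colnames(nhanes3))
  # Look at the top-level components of the returned object.
  str(all.models, max.level = 1)
}
```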

verbose

Verbose setting for mixgb. If TRUE, will print out the progress of imputation. Default: FALSE.

xgb.params

A list of XGBoost parameters. For more details, please check XGBoost documentation on parameters.

nrounds

The maximum number of boosting iterations for XGBoost. Default: 100

early_stopping_rounds

An integer value k. XGBoost training will stop if the validation performance has not improved for k rounds. Default: 10.
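The stopping rule can be illustrated with a small base R sketch. The helper stop_round and the validation errors are hypothetical and only mimic the rule described above (stop once k rounds have passed without an improvement); this is not XGBoost's own code.

```r
# Hypothetical helper: return the round at which training would stop,
# given a sequence of validation errors (lower is better) and patience k.
stop_round <- function(val.err, k) {
  best <- Inf
  best.iter <- 0L
  for (i in seq_along(val.err)) {
    if (val.err[i] < best) {
      # New best score: remember it and reset the patience counter.
      best <- val.err[i]
      best.iter <- i
    } else if (i - best.iter >= k) {
      # No improvement for k rounds in a row: stop here.
      return(i)
    }
  }
  # Never triggered: all rounds were used.
  length(val.err)
}

val.err <- c(0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63)
stopped.at <- stop_round(val.err, k = 3)
```

In this made-up sequence the best error is reached at round 3, and rounds 4 to 6 bring no strict improvement, so with k = 3 the rule triggers at round 6.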

print_every_n

Print XGBoost evaluation information at every nth iteration if xgboost_verbose > 0. Default: 10

xgboost_verbose

Verbose setting for XGBoost training: 0 (silent), 1 (print information) and 2 (print additional information). Default: 0

...

Extra arguments to be passed to XGBoost.

Value

If save.models = FALSE, this function will return a list of m imputed datasets. If save.models = TRUE, it will return an object with imputed datasets, saved models and parameters.
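When save.models = FALSE, the result can be handled like any list of datasets. The sketch below repeats the call from the Examples and then looks at the first imputed dataset; it is skipped when the mixgb package is not installed, and no particular output is assumed.

```r
# Sketch only: requires the mixgb package; skipped when it is not installed.
mixgb.data <- NULL
if (requireNamespace("mixgb", quietly = TRUE)) {
  library(mixgb)
  params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
  mixgb.data <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10)

  # One list element per imputed dataset.
  print(length(mixgb.data))

  # Inspect the first imputed dataset and check for remaining missing values.
  first <- mixgb.data[[1]]
  print(dim(first))
  print(anyNA(first))
}
```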

Examples

# obtain m multiply imputed datasets without saving models
params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
mixgb.data <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10)

# obtain m multiply imputed datasets and save models for imputing new data later on
mixgb.obj <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10, save.models = TRUE)

[Package mixgb version 1.0.2 Index]