mixgb {mixgb} | R Documentation |
Multiple imputation through XGBoost
Description
Generates multiply imputed datasets using XGBoost, subsampling, and predictive mean matching (PMM).
Usage
mixgb(
data,
m = 5,
maxit = 1,
ordinalAsInteger = FALSE,
bootstrap = FALSE,
pmm.type = "auto",
pmm.k = 5,
pmm.link = "prob",
initial.num = "normal",
initial.int = "mode",
initial.fac = "mode",
save.models = FALSE,
save.vars = NULL,
verbose = FALSE,
xgb.params = list(max_depth = 3, gamma = 0, eta = 0.3, min_child_weight = 1,
subsample = 0.7, colsample_bytree = 1, colsample_bylevel = 1, colsample_bynode = 1,
tree_method = "auto", gpu_id = 0, predictor = "auto"),
nrounds = 100,
early_stopping_rounds = 10,
print_every_n = 10L,
xgboost_verbose = 0,
...
)
Arguments
data |
A data.frame or data.table with missing values |
m |
The number of imputed datasets. Default: 5 |
maxit |
The number of imputation iterations. Default: 1 |
ordinalAsInteger |
Whether to convert ordinal factors to integers. Default: FALSE |
bootstrap |
Whether to use bootstrapping for multiple imputation. Default: FALSE |
pmm.type |
The type of predictive mean matching (PMM). Default: "auto" |
pmm.k |
The number of donors for predictive mean matching. Default: 5 |
pmm.link |
The link used for predictive mean matching with binary variables. Default: "prob" |
initial.num |
Initial imputation method for numeric data. Default: "normal" |
initial.int |
Initial imputation method for integer data. Default: "mode" |
initial.fac |
Initial imputation method for factor data. Default: "mode" |
save.models |
Whether to save imputation models for imputing new data later on. Default: FALSE |
save.vars |
For the purpose of imputing new data: the response variables whose imputation models should be saved. Default: NULL |
verbose |
Verbose setting for mixgb. Default: FALSE |
xgb.params |
A list of XGBoost parameters. For more details, please check XGBoost documentation on parameters. |
nrounds |
The maximum number of boosting iterations for XGBoost. Default: 100 |
early_stopping_rounds |
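An integer value. XGBoost training stops early if the evaluation metric has not improved for this many consecutive rounds. Default: 10 |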
print_every_n |
Print XGBoost evaluation information at every nth iteration if xgboost_verbose is greater than 0. Default: 10 |
xgboost_verbose |
Verbose setting for XGBoost training: 0 (silent), 1 (print information) and 2 (print additional information). Default: 0 |
... |
Extra arguments to be passed to XGBoost |
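The pmm.k argument sets how many donors are considered during predictive mean matching. The following base R sketch illustrates the general idea only; it is not mixgb's internal implementation. The helper pmm_impute() is hypothetical, and a linear model stands in for the XGBoost model so that the example has no package dependencies.

```r
# Minimal sketch of predictive mean matching (PMM) with k donors.
# pmm_impute() is a hypothetical helper, NOT part of the mixgb package.
pmm_impute <- function(pred_obs, y_obs, pred_mis, k = 5) {
  stopifnot(length(pred_obs) == length(y_obs), k >= 1)
  k <- min(k, length(y_obs))
  vapply(pred_mis, function(p) {
    # indices of the k observed cases whose predictions are closest to p
    donors <- order(abs(pred_obs - p))[seq_len(k)]
    # pick one donor at random and borrow its observed value
    y_obs[donors[sample.int(k, 1)]]
  }, numeric(1))
}

set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
y[sample(100, 20)] <- NA              # introduce missing values
obs <- !is.na(y)

fit  <- lm(y ~ x, subset = obs)       # assumption: stand-in for the XGBoost model
pred <- predict(fit, newdata = data.frame(x = x))

y_imp <- y
y_imp[!obs] <- pmm_impute(pred[obs], y[obs], pred[!obs], k = 5)
```

Because every imputed value is copied from an observed case, the imputations always stay within the range of the observed data.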
Value
If save.models = FALSE, this function returns a list of m imputed datasets. If save.models = TRUE, it returns an object containing the imputed datasets, the saved models and the parameters.
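As a rough illustration of working with a list of m imputed datasets, the sketch below computes an estimate on each dataset and combines the results with Rubin's rules. The list imputed is a simulated stand-in for the function's return value, and the column name bmi is hypothetical; neither comes from the package.

```r
# Assumption: `imputed` mimics the list of m data frames returned when
# save.models = FALSE. It is simulated here so the sketch runs on its own.
set.seed(42)
m <- 5
imputed <- lapply(seq_len(m), function(i) {
  data.frame(bmi = rnorm(50, mean = 26, sd = 4))  # hypothetical column
})

# Estimate of interest (the mean) and its sampling variance in each dataset
est        <- vapply(imputed, function(d) mean(d$bmi), numeric(1))
within_var <- vapply(imputed, function(d) var(d$bmi) / nrow(d), numeric(1))

# Rubin's rules: pooled estimate and total variance
q_bar     <- mean(est)                     # pooled point estimate
u_bar     <- mean(within_var)              # average within-imputation variance
b         <- var(est)                      # between-imputation variance
total_var <- u_bar + (1 + 1 / m) * b

pooled <- c(estimate = q_bar, se = sqrt(total_var))
```

The between-imputation term is what carries the uncertainty due to the missing data, which is why analysing a single imputed dataset tends to understate standard errors.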
Examples
# obtain m multiply imputed datasets without saving models
params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
mixgb.data <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10)
# obtain m multiply imputed datasets and save models for imputing new data later on
mixgb.obj <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10, save.models = TRUE)