| mlr_pipeops_imputeoor {mlr3pipelines} | R Documentation |
Out of Range Imputation
Description
Impute factor (categorical) features by adding a new level ".MISSING".
Impute numerical features by constant values shifted below the minimum or above the maximum by
using min(x) - offset - multiplier * diff(range(x)) or
max(x) + offset + multiplier * diff(range(x)).
This type of imputation is especially sensible in the context of tree-based methods, see also Ding & Simonoff (2010).
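The numeric rule above can be sketched in base R. This is only an illustration of the formula: `oor_constant` is a hypothetical helper, not a function of mlr3pipelines, and removing NAs with `na.rm = TRUE` before computing the range is an assumption.

```r
# Hypothetical helper (not part of mlr3pipelines): computes the
# out-of-range constant described by the formula above.
oor_constant = function(x, min = TRUE, offset = 1, multiplier = 1) {
  spread = diff(range(x, na.rm = TRUE))  # max(x) - min(x), ignoring NAs
  if (min) {
    min(x, na.rm = TRUE) - offset - multiplier * spread
  } else {
    max(x, na.rm = TRUE) + offset + multiplier * spread
  }
}

x = c(2, 5, NA, 10)
low = oor_constant(x)                  # 2 - 1 - 1 * 8 = -7
high = oor_constant(x, min = FALSE)    # 10 + 1 + 1 * 8 = 19
x_imputed = ifelse(is.na(x), low, x)   # NA replaced by the constant below the minimum
```

Because the constant lies strictly outside the observed range whenever `offset` is positive, a tree-based learner can isolate the formerly missing values with a single split.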
Format
R6Class object inheriting from PipeOpImpute/PipeOp.
Construction
PipeOpImputeOOR$new(id = "imputeoor", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "imputeoor".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
Input and Output Channels
Input and output channels are inherited from PipeOpImpute.
The output is the input Task with all affected features having missing values imputed as described above.
State
The $state is a named list with the $state elements inherited from PipeOpImpute.
The $state$model contains either ".MISSING" used for character and factor (also
ordered) features or numeric(1) indicating the constant value used for imputation of
integer and numeric features.
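As a rough sketch of how the trained state might be inspected, the snippet below trains the PipeOp on the "pima" task and reads the entry for one numeric feature. It assumes that mlr3 and mlr3pipelines are installed and that `$state$model` is a list named by feature; `glucose` is simply one numeric column of that task that contains missing values.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")                   # several numeric features contain NAs
po_oor = po("imputeoor")
invisible(po_oor$train(list(task)))  # training populates po_oor$state

# Assumed layout: one entry per imputed feature; for a numeric feature
# this should be the single constant used for imputation.
glucose_fill = po_oor$state$model$glucose
glucose_min = min(task$data()$glucose, na.rm = TRUE)
```

With the default `min = TRUE`, `glucose_fill` is expected to lie below `glucose_min`.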
Parameters
The parameters are the parameters inherited from PipeOpImpute, as well as:
- min :: logical(1)
  Should integer and numeric features be shifted below the minimum? Initialized to TRUE. If FALSE they are shifted above the maximum. See also the description above.
- offset :: numeric(1)
  Numerical non-negative offset as used in the description above for integer and numeric features. Initialized to 1.
- multiplier :: numeric(1)
  Numerical non-negative multiplier as used in the description above for integer and numeric features. Initialized to 1.
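The parameters above might be set at construction roughly as follows, using the constructor signature given under Construction. The chosen values are arbitrary and only illustrative; the snippet assumes mlr3pipelines is installed.

```r
library("mlr3pipelines")

# Shift above the maximum instead of below the minimum,
# with illustrative (arbitrary) offset and multiplier values.
po_oor = PipeOpImputeOOR$new(
  param_vals = list(min = FALSE, offset = 2, multiplier = 0.5)
)

# Inspect the hyperparameter values currently set on the PipeOp.
po_oor$param_set$values
```

Under these settings a numeric feature would be imputed with max(x) + 2 + 0.5 * diff(range(x)).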
Internals
Adds an explicit new level (see levels()) to factor and ordered features, but not to character features.
For integer and numeric features uses the min, max, diff and range functions.
integer and numeric features that are entirely NA are imputed as 0.
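The behaviour described above can be approximated in base R as follows. This is a sketch under stated assumptions, not the package's internal code: the position of the new level among the existing levels and the exact all-NA check are assumptions.

```r
# Factor features: add an explicit new level, then assign it to the NAs.
f = factor(c("a", NA, "b"))
levels(f) = c(levels(f), ".MISSING")  # assumed: new level appended last
f[is.na(f)] = ".MISSING"

# Numeric features that are entirely NA: fall back to 0, since
# min() and range() have no observed values to work with.
x = c(NA_real_, NA_real_)
fill = if (all(is.na(x))) {
  0
} else {
  min(x, na.rm = TRUE) - 1 - diff(range(x, na.rm = TRUE))
}
x[is.na(x)] = fill
```

The fallback matters because calling min() or range() with `na.rm = TRUE` on an all-NA vector yields infinite values together with a warning.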
Methods
Only methods inherited from PipeOpImpute/PipeOp.
References
Ding Y, Simonoff JS (2010). “An Investigation of Missing Data Methods for Classification Trees Applied to Binary Response Data.” Journal of Machine Learning Research, 11(6), 131-170. https://jmlr.org/papers/v11/ding10a.html.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_vtreat,
mlr_pipeops_yeojohnson
Other Imputation PipeOps:
PipeOpImpute,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputesample
Examples
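library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpImputeOOR
set.seed(2409)
# Add a factor and an ordered column with missing values to the pima data.
data = tsk("pima")$data()
data$y = factor(c(NA, sample(letters, size = 766, replace = TRUE), NA))
data$z = ordered(c(NA, sample(1:10, size = 767, replace = TRUE)))
task = TaskClassif$new("task", backend = data, target = "diabetes")
task$missings()  # missing values per column before imputation
# Named po_oor so the object does not shadow the po() function.
po_oor = po("imputeoor")
new_task = po_oor$train(list(task = task))[[1]]
new_task$missings()  # missing values per column after imputation
new_task$data()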