mlr_pipeops_missind {mlr3pipelines} | R Documentation |
Add Missing Indicator Columns
Description
Add missing indicator columns ("dummy columns") to the Task.
Drops the original features, so it should usually be used in combination with PipeOpFeatureUnion and imputation PipeOps (see examples).
Note that affect_columns is initialized with selector_invert(selector_type(c("factor", "ordered", "character"))), since missing values in factorial columns are often indicated by out-of-range imputation (PipeOpImputeOOR).
Format
R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.
Construction
PipeOpMissInd$new(id = "missind", param_vals = list())
- id :: character(1)
  Identifier of the resulting object, defaulting to "missind".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
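As a minimal sketch of the constructor, the following builds the same PipeOp twice: once through the R6 generator and once through the po() shorthand that the Examples section also uses. The variable names op_a and op_b are illustrative only.

```r
library("mlr3pipelines")

# Direct construction through the R6 generator
op_a = PipeOpMissInd$new(id = "missind", param_vals = list(type = "logical"))

# The same hyperparameter setting through the po() shorthand
op_b = po("missind", type = "logical")

op_a$id
op_a$param_set$values$type
op_b$param_set$values$type
```

Both objects should carry the same hyperparameter value; which form to use is mostly a matter of taste.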
State
$state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- indicand_cols :: character
  Names of columns for which indicator columns are added. If the which parameter is "all", this is just the names of all features, otherwise it is the names of all features that had missing values during training.
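The state can be inspected after training. The sketch below assumes the "pima" task shipped with mlr3, in which "age" is complete and "insulin" contains missing values; the variable name pom is illustrative.

```r
library("mlr3")
library("mlr3pipelines")

# one complete feature ("age") and one with missing values ("insulin")
task = tsk("pima")$select(c("age", "insulin"))

pom = po("missind")  # default: which = "missing_train"
trained = pom$train(list(task))[[1]]

# names of the features for which indicator columns were created
pom$state$indicand_cols
```

With the default setting, only features that had missing values during training should be listed here.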
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- which :: character(1)
  Determines for which features the indicator columns are added. Can either be "missing_train" (default), adding indicator columns for each feature that actually has missing values during training, or "all", adding indicator columns for all features.
- type :: character(1)
  Determines the type of the newly created columns. Can be one of "factor" (default), "integer", "logical", "numeric".
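To illustrate both parameters, here is a short sketch that trains the PipeOp once with the defaults and once with which = "all" and type = "logical". It again assumes the "pima" task from mlr3, where "age" has no missing values; out_default and out_all are illustrative names.

```r
library("mlr3")
library("mlr3pipelines")

# "age" is complete in the pima data, "insulin" has missing values
task = tsk("pima")$select(c("age", "insulin"))

# defaults: which = "missing_train", type = "factor"
out_default = po("missind")$train(list(task))[[1]]

# indicators for every feature, stored as logical columns
out_all = po("missind", which = "all", type = "logical")$train(list(task))[[1]]

out_default$feature_names
out_all$feature_types
```

The indicator columns carry the prefix "missing_", which is what the selector_grep("^missing_") call in the Internals section relies on.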
Internals
This PipeOp should cover most cases where "dummy columns" or "missing indicators" are desired. Some edge cases:
- If imputation for factorial features is performed and only numeric features should gain missing indicators, the affect_columns parameter can be set to selector_type("numeric").
- If missing indicators should only be added for features that have more than a fraction of x missing values, PipeOpRemoveConstants can be used with affect_columns = selector_grep("^missing_") and ratio = x.
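The two edge cases above might be set up roughly as follows. This is a sketch, not a definitive recipe: the threshold x = 0.4 and the choice of the "pima" features are assumptions made for illustration, and pom_num, graph and out are hypothetical names.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("pima")$select(c("insulin", "triceps"))

# Edge case 1: only numeric features gain missing indicators
pom_num = po("missind", affect_columns = selector_type("numeric"))

# Edge case 2: keep an indicator only if more than a fraction x
# of the feature's values are missing (x = 0.4 is an arbitrary choice)
x = 0.4
graph = po("missind") %>>%
  po("removeconstants", affect_columns = selector_grep("^missing_"), ratio = x)

out = graph$train(task)[[1]]
out$feature_names
```

In the second case PipeOpRemoveConstants treats an indicator column as "constant enough" when too few of its entries differ from the most frequent value, so indicators of rarely missing features are dropped again.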
Fields
Fields inherited from PipeOpTaskPreproc/PipeOp.
Methods
Methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Examples
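library("mlr3")
library("mlr3pipelines")  # provides po() and the %>>% operator

# two features of the "pima" task that both contain missing values
task = tsk("pima")$select(c("insulin", "triceps"))
sum(complete.cases(task$data()))
task$missings()
tail(task$data())

# missing indicators only: the original features are dropped
pom = po("missind")
new_task = pom$train(list(task))[[1]]
tail(new_task$data())

# proper imputation + missing indicators
impgraph = list(
  po("imputesample"),
  po("missind")
) %>>% po("featureunion")
tail(impgraph$train(task)[[1]]$data())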