mlr_pipeops_datefeatures {mlr3pipelines}    R Documentation

Preprocess Date Features

Description

Based on the POSIXct columns of the data, a set of date-related features is computed and added to the feature set of the output task. If no POSIXct column is found, the original task is returned unaltered. This functionality is based on the add_datepart() and add_cyclic_datepart() functions from the fastai library. To operate on only particular POSIXct columns, use the affect_columns parameter inherited from PipeOpTaskPreprocSimple.
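As a rough illustration of what "date-related features" means here, the following base R sketch derives a few of them from a POSIXct vector. It is not the package's implementation: the helper name extract_date_parts() is hypothetical, and fixing the time zone to UTC is an assumption made only to keep the output reproducible.

```r
# Base R sketch: derive date-related features from a POSIXct vector.
# extract_date_parts() is a hypothetical helper, not part of mlr3pipelines.
extract_date_parts = function(x) {
  stopifnot(inherits(x, "POSIXct"))
  lt = as.POSIXlt(x, tz = "UTC")  # assumption: interpret timestamps in UTC
  data.frame(
    year = lt$year + 1900L,       # POSIXlt stores years since 1900
    month = lt$mon + 1L,          # POSIXlt months are 0-based
    day_of_month = lt$mday,
    day_of_year = lt$yday + 1L,   # POSIXlt day of year is 0-based
    hour = lt$hour,
    minute = lt$min,
    second = lt$sec
  )
}

x = as.POSIXct(c("2020-02-01 13:45:10", "2020-02-29 00:00:00"), tz = "UTC")
parts = extract_date_parts(x)
print(parts)
```

The PipeOp performs this kind of extraction for every affected POSIXct column and appends the results to the task's features.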

If cyclic = TRUE, cyclic features are computed for the features "month", "week_of_year", "day_of_year", "day_of_month", "day_of_week", "hour", "minute" and "second". For each such feature x, two additional features are computed: the sine and the cosine of 2 * pi * x / max_x. Here max_x is the largest possible value of the feature plus 1, assuming the lowest possible value is 0; e.g., for hours from 0 to 23, max_x is 24. This respects the cyclical nature of features such as seconds: second 21 and second 22 are one second apart, but so are second 59 and second 0 of the next minute.
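The cyclic encoding described above can be sketched in a few lines of base R. This is a minimal illustration of the formula, not the package's own code; the function name cyclic_encode() is hypothetical.

```r
# Sketch of the sine/cosine encoding (hypothetical helper, not package code).
# `x` is assumed to start at 0; `max_x` is the largest possible value + 1.
cyclic_encode = function(x, max_x) {
  angle = 2 * pi * x / max_x
  cbind(sin = sin(angle), cos = cos(angle))
}

hours = c(0, 6, 12, 23)
enc = cyclic_encode(hours, max_x = 24)
print(round(enc, 3))

# Distance in the (sin, cos) plane between hour 23 and hour 0 ...
dist_23_0 = sqrt(sum((enc[4, ] - enc[1, ])^2))
# ... compared with the distance between hour 0 and hour 1.
enc_1 = cyclic_encode(1, max_x = 24)
dist_0_1 = sqrt(sum((enc_1[1, ] - enc[1, ])^2))
print(c(dist_23_0 = dist_23_0, dist_0_1 = dist_0_1))
```

Both distances come out equal, which is the point of the encoding: values at the two ends of the range are treated as neighbours.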

Format

R6Class object inheriting from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

Construction

PipeOpDateFeatures$new(id = "datefeatures", param_vals = list())

Input and Output Channels

Input and output channels are inherited from PipeOpTaskPreprocSimple.

The output is the input Task with the computed date-related features added to its feature set. Depending on the value of keep_date_var, the POSIXct columns of the data are removed from the feature set.

State

The $state is a named list with the $state elements inherited from PipeOpTaskPreprocSimple.

Parameters

The parameters are the parameters inherited from PipeOpTaskPreprocSimple, as well as:

Internals

The cyclic feature transformation always assumes that values range from 0, so some values (e.g. day of the month) are shifted before sine/cosine transform.
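A small base R sketch of such a shift, using months as the example. It rests on two assumptions not confirmed by this page: that months are numbered 1 to 12, and that the period used is 12. The function name cyclic_month() is hypothetical.

```r
# Sketch: shift a 1-based feature to start at 0 before the cyclic transform.
# Assumption: months are numbered 1..12, so the period is 12.
cyclic_month = function(month) {
  shifted = month - 1              # now ranges over 0..11
  angle = 2 * pi * shifted / 12
  cbind(month_sin = sin(angle), month_cos = cos(angle))
}

enc = cyclic_month(c(1, 4, 7, 12))
print(round(enc, 3))
```

Without the shift, January would not map to angle 0, although the wrap-around from December to January would still be preserved.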

Methods

Only methods inherited from PipeOpTaskPreprocSimple/PipeOpTaskPreproc/PipeOp.

Fields

Only fields inherited from PipeOpTaskPreproc/PipeOp.

See Also

https://mlr-org.com/pipeops.html

Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_subsample, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson

Examples

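library("mlr3")
library("mlr3pipelines")  # provides po() and PipeOpDateFeatures

# Add a POSIXct column of randomly sampled hourly timestamps to iris
dat = iris
set.seed(1)
dat$date = sample(seq(as.POSIXct("2020-02-01"), to = as.POSIXct("2020-02-29"), by = "hour"),
  size = 150L)
task = TaskClassif$new("iris_date", backend = dat, target = "Species")

# Compute non-cyclic date features, skipping minute and second
pop = po("datefeatures", param_vals = list(cyclic = FALSE, minute = FALSE, second = FALSE))
pop$train(list(task))
pop$state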

[Package mlr3pipelines version 0.6.0 Index]