mlr_pipeops_subsample {mlr3pipelines} | R Documentation
Subsampling
Description
Subsamples a Task to use a fraction of the rows.
Sampling happens only during the training phase. Subsampling a Task may reduce training time at a possibly negligible cost in predictive performance, depending on the original Task size.
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpSubsample$new(id = "subsample", param_vals = list())
- id :: character(1)
  Identifier of the resulting object, default "subsample".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
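As a minimal sketch of the constructor described above: the block below builds the PipeOp once through the R6 generator and once through po(), the mlr3pipelines shorthand for mlr_pipeops$get(). The value frac = 0.5 is an arbitrary choice for illustration.

```r
library("mlr3")
library("mlr3pipelines")

# Direct construction through the R6 generator
op = PipeOpSubsample$new(id = "subsample", param_vals = list(frac = 0.5))

# Construction via the po() shorthand; hyperparameters are passed by name
op2 = po("subsample", frac = 0.5)

op$id
op$param_set$values$frac
```

Both objects should carry the same hyperparameter values; which form to use is a matter of taste.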
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output during training is the input Task with added or removed rows according to the sampling.
The output during prediction is the unchanged input.
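The difference between the two phases can be checked with a short sketch. It assumes the built-in "iris" task (150 rows) and an arbitrary frac = 0.5; the variable names are illustrative only.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
op = po("subsample", frac = 0.5)

# Training returns a list with one Task that holds a subsample of the rows
train_out = op$train(list(task))[[1]]

# Prediction passes the Task through without sampling
predict_out = op$predict(list(task))[[1]]

train_out$nrow    # fewer rows than the input
predict_out$nrow  # same number of rows as the input
```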
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc.
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc; however, the affect_columns parameter is not present. Further parameters are:
- frac :: numeric(1)
  Fraction of rows in the Task to keep. May only be greater than 1 if replace is TRUE. Initialized to (1 - exp(-1)) == 0.6321.
- stratify :: logical(1)
  Should the subsamples be stratified by target? Initialized to FALSE. May only be TRUE for TaskClassif input.
- replace :: logical(1)
  Sample with replacement? Initialized to FALSE.
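The following sketch shows one way to change these hyperparameters after construction, through the param_set of the PipeOp. It assumes the "iris" classification task, whose three classes have equal size, so a stratified subsample should keep the same number of rows per class.

```r
library("mlr3")
library("mlr3pipelines")

op = po("subsample")

# Override the initial values documented above
op$param_set$values$frac = 0.5
op$param_set$values$stratify = TRUE

out = op$train(list(tsk("iris")))[[1]]

# Class counts in the subsample; stratification keeps them balanced
counts = table(out$truth())
counts
```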
Internals
Uses task$filter() to remove rows. If replace is TRUE and identical rows are added, then task$row_roles$use cannot be used to duplicate rows because of [inaudible]; instead the task$rbind() function is used, and a new data.table is attached that contains all rows that are being duplicated exactly as many times as they are being added.
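The effect of sampling with replacement can be observed from the outside with a small sketch. It only inspects the resulting Task and makes no claim about the internal calls; frac = 2 is an arbitrary value chosen so that rows must be duplicated.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# frac > 1 is only allowed together with replace = TRUE
op = po("subsample", frac = 2, replace = TRUE)
out = op$train(list(task))[[1]]

out$nrow                   # more rows than the input Task
nrow(unique(out$data()))   # distinct rows cannot exceed the input size
```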
Fields
Only fields inherited from PipeOpTaskPreproc/PipeOp.
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps: PipeOp, PipeOpEnsemble, PipeOpImpute, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_pipeops, mlr_pipeops_boxcox, mlr_pipeops_branch, mlr_pipeops_chunk, mlr_pipeops_classbalancing, mlr_pipeops_classifavg, mlr_pipeops_classweights, mlr_pipeops_colapply, mlr_pipeops_collapsefactors, mlr_pipeops_colroles, mlr_pipeops_copy, mlr_pipeops_datefeatures, mlr_pipeops_encode, mlr_pipeops_encodeimpact, mlr_pipeops_encodelmer, mlr_pipeops_featureunion, mlr_pipeops_filter, mlr_pipeops_fixfactors, mlr_pipeops_histbin, mlr_pipeops_ica, mlr_pipeops_imputeconstant, mlr_pipeops_imputehist, mlr_pipeops_imputelearner, mlr_pipeops_imputemean, mlr_pipeops_imputemedian, mlr_pipeops_imputemode, mlr_pipeops_imputeoor, mlr_pipeops_imputesample, mlr_pipeops_kernelpca, mlr_pipeops_learner, mlr_pipeops_missind, mlr_pipeops_modelmatrix, mlr_pipeops_multiplicityexply, mlr_pipeops_multiplicityimply, mlr_pipeops_mutate, mlr_pipeops_nmf, mlr_pipeops_nop, mlr_pipeops_ovrsplit, mlr_pipeops_ovrunite, mlr_pipeops_pca, mlr_pipeops_proxy, mlr_pipeops_quantilebin, mlr_pipeops_randomprojection, mlr_pipeops_randomresponse, mlr_pipeops_regravg, mlr_pipeops_removeconstants, mlr_pipeops_renamecolumns, mlr_pipeops_replicate, mlr_pipeops_scale, mlr_pipeops_scalemaxabs, mlr_pipeops_scalerange, mlr_pipeops_select, mlr_pipeops_smote, mlr_pipeops_spatialsign, mlr_pipeops_targetinvert, mlr_pipeops_targetmutate, mlr_pipeops_targettrafoscalerange, mlr_pipeops_textvectorizer, mlr_pipeops_threshold, mlr_pipeops_tunethreshold, mlr_pipeops_unbranch, mlr_pipeops_updatetarget, mlr_pipeops_vtreat, mlr_pipeops_yeojohnson
Examples
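library("mlr3")
library("mlr3pipelines")  # provides mlr_pipeops and PipeOpSubsample

# Keep 70% of the rows, stratified by the target class
pos = mlr_pipeops$get("subsample", param_vals = list(frac = 0.7, stratify = TRUE))
pos$train(list(tsk("iris")))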