| mlr_pipeops_vtreat {mlr3pipelines} | R Documentation |
Interface to the vtreat Package
Description
Provides an interface to the vtreat package.
PipeOpVtreat naturally works for classification tasks and regression tasks.
Internally, PipeOpVtreat follows the fit/prepare interface of vtreat, i.e., first creating a data treatment transform object via
vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(), or vtreat::MultinomialOutcomeTreatment(), followed by calling
vtreat::fit_prepare() on the training data and vtreat::prepare() during prediction.
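The fit/prepare flow described above can be sketched directly with vtreat, outside of mlr3pipelines. This is a minimal illustration, not the internal code of PipeOpVtreat; it assumes the vtreat package is installed, and the toy data and the column names x, xc, and y are made up for the example.

```r
library("vtreat")

set.seed(1)
# Hypothetical toy data: one numeric and one categorical feature, numeric outcome.
make_toy <- function(n) {
  d <- data.frame(
    x = rnorm(n),
    xc = sample(c("a", "b", "c"), n, replace = TRUE),
    stringsAsFactors = FALSE
  )
  d$y <- 2 * d$x + (d$xc == "a") + rnorm(n, sd = 0.1)
  d
}
d_train <- make_toy(200)
d_test <- make_toy(50)

# Step 1: create the (still unfitted) treatment transform object.
transform <- NumericOutcomeTreatment(var_list = c("x", "xc"), outcome_name = "y")

# Step 2: fit on the training data; this returns the fitted treatments
# together with the cross-validated training frame.
fitted <- fit_prepare(transform, d_train)
train_prepared <- fitted$cross_frame

# Step 3: apply the fitted treatments to new data.
test_prepared <- prepare(fitted$treatments, d_test)
```

For classification, vtreat::BinomialOutcomeTreatment() or vtreat::MultinomialOutcomeTreatment() would take the place of the first step.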
Format
R6Class object inheriting from PipeOpTaskPreproc/PipeOp.
Construction
PipeOpVtreat$new(id = "vtreat", param_vals = list())
- id :: character(1)
  Identifier of resulting object, default "vtreat".
- param_vals :: named list
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default list().
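As a brief sketch of construction, the following creates the PipeOp once via the R6 constructor and once via the po() shorthand. It assumes mlr3pipelines and vtreat are installed; the id "vtreat_all" and the chosen value of recommended are purely illustrative.

```r
library("mlr3pipelines")

# Constructor call with a custom id and one hyperparameter overridden.
pop <- PipeOpVtreat$new(id = "vtreat_all", param_vals = list(recommended = FALSE))

# Equivalent shorthand through the mlr_pipeops dictionary (default id "vtreat").
pop_short <- po("vtreat", recommended = FALSE)
```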
Input and Output Channels
Input and output channels are inherited from PipeOpTaskPreproc.
The output is the input Task with all affected features "prepared" by vtreat.
If vtreat found "no usable vars", the input Task is returned unaltered.
State
The $state is a named list with the $state elements inherited from PipeOpTaskPreproc, as well as:
- treatment_plan :: object of class vtreat_pipe_step | NULL
  The treatment plan as constructed by vtreat based on the training data, i.e., an object of class treatment_plan. If vtreat found "no usable vars" and designing the treatment would have failed, this is NULL.
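A short sketch of how the state could be inspected after training. It assumes mlr3, mlr3pipelines, and vtreat are installed, and uses the built-in "mtcars" task only as a convenient stand-in.

```r
library("mlr3")
library("mlr3pipelines")

task <- tsk("mtcars")
pop <- po("vtreat")

# Training returns a list with one element: the prepared Task.
out <- pop$train(list(task))[[1]]

# The fitted treatment plan is stored in the state;
# it is NULL if vtreat found "no usable vars".
plan <- pop$state$treatment_plan
if (is.null(plan)) {
  message("vtreat found no usable variables; the task was returned unaltered.")
} else {
  print(class(plan))
}
```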
Parameters
The parameters are the parameters inherited from PipeOpTaskPreproc, as well as:
- recommended :: logical(1)
  Whether only the "recommended" prepared features should be returned, i.e., non-constant variables with a significance value smaller than vtreat's threshold. Initialized to TRUE.
- cols_to_copy :: function | Selector
  Selector function, takes a Task as argument and returns a character() of features to copy.
  See Selector for example functions. Initialized to selector_none().
- minFraction :: numeric(1)
  Minimum frequency a categorical level must have to be converted to an indicator column.
- smFactor :: numeric(1)
  Smoothing factor for impact coding models.
- rareCount :: integer(1)
  Allow levels with this count or below to be pooled into a shared rare-level.
- rareSig :: numeric(1)
  Suppress levels from pooling at this significance value or greater.
- collarProb :: numeric(1)
  What fraction of the data (pseudo-probability) to collar data at if doCollar = TRUE.
- doCollar :: logical(1)
  If TRUE, collar numeric variables by cutting off after a tail-probability specified by collarProb during treatment design.
- codeRestriction :: character()
  What types of variables to produce.
- customCoders :: named list
  Map from code names to custom categorical variable encoding functions.
- splitFunction :: function
  Function taking arguments nSplits, nRows, dframe, and y; returning a user desired split.
- ncross :: integer(1)
  Integer larger than one, number of cross-validation rounds to design.
- forceSplit :: logical(1)
  If TRUE, force cross-validated significance calculations on all variables.
- catScaling :: logical(1)
  If TRUE, use stats::glm() linkspace; if FALSE, use stats::lm() for scaling.
- verbose :: logical(1)
  If TRUE, print progress.
- use_parallel :: logical(1)
  If TRUE, use parallel methods.
- missingness_imputation :: function
  Function of signature f(values: numeric, weights: numeric), simple missing value imputer.
  Typically, an imputation via a PipeOp should be preferred, see PipeOpImpute.
- pruneSig :: numeric(1)
  Suppress variables with significance above this level. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
- scale :: logical(1)
  If TRUE, replace numeric variables with single variable model regressions ("move to outcome-scale"). These have mean zero and (for variables with significance less than 1) slope 1 when regressed (lm for regression problems/glm for classification problems) against the outcome.
- varRestriction :: list()
  List of treated variable names to restrict to. Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
- trackedValues :: named list()
  Named list mapping variables to known values, allows warnings upon novel level appearances (see vtreat::track_values()). Only affects regression tasks (mlr3::TaskRegr) and binary classification tasks.
- y_dependent_treatments :: character()
  Character vector specifying which treatment types to build per outcome level. Only affects multiclass classification tasks.
- imputation_map :: named list
  Map from column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers.
  Typically, an imputation via a PipeOp is to be preferred, see PipeOpImpute.
For more information, see vtreat::regression_parameters(), vtreat::classification_parameters(), or vtreat::multinomial_parameters().
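Hyperparameters can also be changed after construction through the parameter set. The following is a minimal sketch, assuming mlr3pipelines and vtreat are installed; the values shown are arbitrary illustrations, not recommendations.

```r
library("mlr3pipelines")

pop <- po("vtreat")

# Require a level to cover at least 5% of rows before it gets an indicator column.
pop$param_set$values$minFraction <- 0.05
# Pool levels that occur 5 times or fewer into a shared rare level.
pop$param_set$values$rareCount <- 5L

# Show the hyperparameter values that are currently set.
print(pop$param_set$values)
```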
Internals
Follows vtreat's fit/prepare interface. See vtreat::NumericOutcomeTreatment(), vtreat::BinomialOutcomeTreatment(),
vtreat::MultinomialOutcomeTreatment(), vtreat::fit_prepare() and vtreat::prepare().
Methods
Only methods inherited from PipeOpTaskPreproc/PipeOp.
See Also
https://mlr-org.com/pipeops.html
Other PipeOps:
PipeOp,
PipeOpEnsemble,
PipeOpImpute,
PipeOpTargetTrafo,
PipeOpTaskPreproc,
PipeOpTaskPreprocSimple,
mlr_pipeops,
mlr_pipeops_boxcox,
mlr_pipeops_branch,
mlr_pipeops_chunk,
mlr_pipeops_classbalancing,
mlr_pipeops_classifavg,
mlr_pipeops_classweights,
mlr_pipeops_colapply,
mlr_pipeops_collapsefactors,
mlr_pipeops_colroles,
mlr_pipeops_copy,
mlr_pipeops_datefeatures,
mlr_pipeops_encode,
mlr_pipeops_encodeimpact,
mlr_pipeops_encodelmer,
mlr_pipeops_featureunion,
mlr_pipeops_filter,
mlr_pipeops_fixfactors,
mlr_pipeops_histbin,
mlr_pipeops_ica,
mlr_pipeops_imputeconstant,
mlr_pipeops_imputehist,
mlr_pipeops_imputelearner,
mlr_pipeops_imputemean,
mlr_pipeops_imputemedian,
mlr_pipeops_imputemode,
mlr_pipeops_imputeoor,
mlr_pipeops_imputesample,
mlr_pipeops_kernelpca,
mlr_pipeops_learner,
mlr_pipeops_missind,
mlr_pipeops_modelmatrix,
mlr_pipeops_multiplicityexply,
mlr_pipeops_multiplicityimply,
mlr_pipeops_mutate,
mlr_pipeops_nmf,
mlr_pipeops_nop,
mlr_pipeops_ovrsplit,
mlr_pipeops_ovrunite,
mlr_pipeops_pca,
mlr_pipeops_proxy,
mlr_pipeops_quantilebin,
mlr_pipeops_randomprojection,
mlr_pipeops_randomresponse,
mlr_pipeops_regravg,
mlr_pipeops_removeconstants,
mlr_pipeops_renamecolumns,
mlr_pipeops_replicate,
mlr_pipeops_scale,
mlr_pipeops_scalemaxabs,
mlr_pipeops_scalerange,
mlr_pipeops_select,
mlr_pipeops_smote,
mlr_pipeops_spatialsign,
mlr_pipeops_subsample,
mlr_pipeops_targetinvert,
mlr_pipeops_targetmutate,
mlr_pipeops_targettrafoscalerange,
mlr_pipeops_textvectorizer,
mlr_pipeops_threshold,
mlr_pipeops_tunethreshold,
mlr_pipeops_unbranch,
mlr_pipeops_updatetarget,
mlr_pipeops_yeojohnson
Examples
library("mlr3")
library("mlr3pipelines")  # provides PipeOpVtreat

set.seed(2020)

# Simulate a regression data set with missing values and a categorical feature.
make_data <- function(nrows) {
  d <- data.frame(x = 5 * rnorm(nrows))
  d["y"] = sin(d[["x"]]) + 0.01 * d[["x"]] + 0.1 * rnorm(nrows)
  d[4:10, "x"] = NA  # introduce NAs
  d["xc"] = paste0("level_", 5 * round(d$y / 5, 1))
  d["x2"] = rnorm(nrows)
  d[d["xc"] == "level_-1", "xc"] = NA  # introduce a NA level
  return(d)
}

task = TaskRegr$new("vtreat_regr", backend = make_data(100), target = "y")

# Design the treatment on the training task and return the prepared task.
pop = PipeOpVtreat$new()
pop$train(list(task))
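In practice the PipeOp is usually placed in front of a learner, so that the treatment is designed during training and only applied during prediction. The following is a hedged sketch of that pattern, assuming mlr3, mlr3pipelines, vtreat, and rpart are installed; the "mtcars" task, the "regr.rpart" learner, and the row split are chosen only for illustration.

```r
library("mlr3")
library("mlr3pipelines")

task <- tsk("mtcars")

# Chain the vtreat preprocessing with a regression tree and wrap it as a learner.
graph <- po("vtreat") %>>% lrn("regr.rpart")
glrn <- GraphLearner$new(graph)

# The treatment is designed on the training rows only ...
glrn$train(task, row_ids = 1:24)
# ... and then applied to the held-out rows before predicting.
pred <- glrn$predict(task, row_ids = 25:32)
```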