R: Build machine learning pipelines

ml_pipline_builder {pipeliner}

R Documentation

Build machine learning pipelines - object-oriented API

Description

Building machine learning models often requires transforming the input and/or response variables before training (or fitting) the model, and then inverse-transforming the model's output afterwards. For example, a model may need to be trained on the logarithm of the response and input variables. As a consequence, fitting these models and then generating predictions from them requires repeated application of transformation and inverse-transformation functions to go from the original input variables to the original output variables (via the model).
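To make the repetition concrete, here is a minimal base R sketch of the manual workflow that a pipeline is meant to replace. It assumes a log-log linear model on the built-in cars dataset. The dataset, the train/score split, and all variable names are illustrative choices and are not part of this package.

```r
# Illustrative manual workflow without a pipeline (base R only).
train <- cars[1:40, ]
new_data <- cars[41:50, ]

# Transform the input and the response before fitting.
train_t <- data.frame(log_speed = log(train$speed), log_dist = log(train$dist))
fit <- lm(log_dist ~ log_speed, data = train_t)

# Scoring: repeat the input transformation on the new data ...
new_t <- data.frame(log_speed = log(new_data$speed))
pred_log <- predict(fit, newdata = new_t)

# ... and invert the response transformation to return to the original scale.
pred_dist <- exp(pred_log)
```

The input transformation has to be written once for fitting and again for scoring, and the inverse transformation has to be remembered at prediction time. These are the steps a pipeline object bundles together.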

Usage

ml_pipline_builder()

Details

This function produces an object with which it is possible to: define transformation and inverse-transformation functions; fit a model on training data; and then generate a prediction (or model-scoring) function that automatically applies the entire pipeline, transforming the inputs and inverse-transforming the inner model's predicted scores.

Calling ml_pipline_builder() returns an 'ml_pipeline' object (actually an environment or closure), whose methods can be accessed as one would access any element of a list. For example, ml_pipline_builder()$transform_features allows you to get or set the transform_features function to use in the pipeline. The full list of methods for defining sections of the pipeline (documented elsewhere) is:

transform_features;
transform_response;
inv_transform_response; and,
estimate_model.

The pipeline can be fit, predictions generated, and the inner model accessed using the following methods:

fit(.data);
predict(.data); and,
model_estimate().
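The get-or-set methods and the fit/predict cycle described above can be sketched with plain closures. The following is a hypothetical illustration only: toy_pipeline_builder, accessor, and every internal name are invented here, and nothing in it is taken from the pipeliner source, which may be organised quite differently.

```r
# Hypothetical closure-based pipeline object; NOT the pipeliner implementation.
toy_pipeline_builder <- function() {
  # Private state shared by the methods returned below.
  steps <- list(transform_features = NULL, transform_response = NULL,
                inv_transform_response = NULL, estimate_model = NULL)
  model <- NULL

  # Make a method that sets a step when given a function and gets it otherwise.
  accessor <- function(name) {
    force(name)
    function(f = NULL) {
      if (is.null(f)) return(steps[[name]])
      steps[[name]] <<- f
      invisible(NULL)
    }
  }

  # Transform features and response, then estimate and store the inner model.
  fit <- function(.data) {
    train <- cbind(steps$transform_features(.data),
                   steps$transform_response(.data))
    model <<- steps$estimate_model(train)
    invisible(NULL)
  }

  # Transform features, score the inner model, then invert the response scale.
  predict_fn <- function(.data) {
    features <- steps$transform_features(.data)
    scores <- unname(predict(model, newdata = features))
    scored <- cbind(.data, features, pred_model = scores)
    cbind(scored, steps$inv_transform_response(scored))
  }

  list(transform_features = accessor("transform_features"),
       transform_response = accessor("transform_response"),
       inv_transform_response = accessor("inv_transform_response"),
       estimate_model = accessor("estimate_model"),
       fit = fit, predict = predict_fn,
       model_estimate = function() model)
}

# Usage: a log-log model on the built-in faithful data.
p <- toy_pipeline_builder()
p$transform_features(function(df) data.frame(x1 = log(df$waiting)))
p$transform_response(function(df) data.frame(y = log(df$eruptions)))
p$inv_transform_response(function(df) {
  data.frame(pred_eruptions = exp(df$pred_model))
})
p$estimate_model(function(df) lm(y ~ x1, df))
p$fit(faithful)
head(p$predict(faithful))
```

Keeping the step functions and the fitted model as private variables of the builder means the caller interacts only through the returned methods, which is the list-like access pattern described above.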

Value

An object of class ml_pipeline.

Examples

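data <- faithful

# Create an empty pipeline, then define each section of it
lm_pipeline <- ml_pipline_builder()

# Standardise the input variable
lm_pipeline$transform_features(function(df) {
  data.frame(x1 = (df$waiting - mean(df$waiting)) / sd(df$waiting))
})

# Standardise the response variable
lm_pipeline$transform_response(function(df) {
  data.frame(y = (df$eruptions - mean(df$eruptions)) / sd(df$eruptions))
})

# Map the model's predictions (pred_model) back to the original response scale
lm_pipeline$inv_transform_response(function(df) {
  data.frame(pred_eruptions = df$pred_model * sd(df$eruptions) + mean(df$eruptions))
})

# Inner model: linear regression through the origin on the transformed variables
lm_pipeline$estimate_model(function(df) {
  lm(y ~ 0 + x1, df)
})

# Fit the pipeline on the training data
lm_pipeline$fit(data)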
head(lm_pipeline$predict(data))
#    eruptions waiting         x1 pred_model pred_eruptions
#  1     3.600      79  0.5960248  0.5369058       4.100592
#  2     1.800      54 -1.2428901 -1.1196093       2.209893
#  3     3.333      74  0.2282418  0.2056028       3.722452
#  4     2.283      62 -0.6544374 -0.5895245       2.814917
#  5     4.533      85  1.0373644  0.9344694       4.554360
#  6     2.883      55 -1.1693335 -1.0533487       2.285521