ml_pipline_builder {pipeliner}    R Documentation
Build machine learning pipelines: object-oriented API
Description
Building machine learning models often requires pre- and post-transformation of the input and/or response variables prior to training (or fitting) the models. For example, a model may need to be trained on the logarithm of the response and input variables. As a consequence, fitting these models and then generating predictions from them requires repeated application of transformation and inverse-transformation functions to go from the original input variables, via the model, to predictions on the scale of the original output variable.
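A minimal base-R sketch of the problem (without pipeliner), assuming a linear model trained on the logarithms of the waiting and eruptions columns of the built-in faithful data: every call to the hypothetical helper predict_eruptions() must repeat the forward transformation of the inputs and then apply exp() to undo the log transformation of the response.

```r
# Base-R illustration only; predict_eruptions() is a hypothetical helper.
data <- faithful

# Train on log-transformed input and response variables.
train <- data.frame(log_waiting = log(data$waiting),
                    log_eruptions = log(data$eruptions))
fit_log <- lm(log_eruptions ~ log_waiting, data = train)

predict_eruptions <- function(model, newdata) {
  # 1. Forward-transform the inputs exactly as during training.
  inputs <- data.frame(log_waiting = log(newdata$waiting))
  # 2. Score with the inner model (predictions are on the log scale).
  log_pred <- predict(model, newdata = inputs)
  # 3. Inverse-transform back to the original response scale.
  exp(log_pred)
}

preds <- predict_eruptions(fit_log, data.frame(waiting = c(50, 80)))
```

Forgetting either step 1 or step 3 silently produces predictions on the wrong scale, which is the bookkeeping this function is meant to automate.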
Usage
ml_pipline_builder()
Details
This function produces an object with which it is possible to: define transformation and inverse-transformation functions; fit a model on training data; and then generate a prediction (or model-scoring) function that automatically applies the entire pipeline, transforming the inputs before they are scored by the inner model and inverse-transforming the inner model's predicted scores.
Calling ml_pipline_builder() will return an 'ml_pipeline' object (actually an environment or closure), whose methods can be accessed as one would access any element of a list. For example,
ml_pipline_builder()$transform_features
will allow you to get or set the transform_features function to use in the pipeline. The full list of methods for defining sections of the pipeline (documented elsewhere) is:
- transform_features;
- transform_response;
- inv_transform_response; and
- estimate_model.
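The phrase "environment or closure" refers to a standard R pattern: a function returns a list of functions that share, and can modify, variables in the environment where they were created, so the methods are reached with $ like list elements. The base-R sketch below illustrates only that pattern; make_pipeline() and its internal stages list are hypothetical names, and this is not pipeliner's implementation. Each method acts as a getter when called with no argument and as a setter when given a function.

```r
# Hypothetical illustration of the closure pattern; NOT pipeliner's source.
make_pipeline <- function() {
  # Shared state captured by every method below.
  stages <- list(transform_features = NULL, transform_response = NULL,
                 inv_transform_response = NULL, estimate_model = NULL)

  # Build one getter/setter: no argument returns the stored function,
  # a function argument replaces it in the shared `stages` list.
  accessor <- function(name) {
    force(name)
    function(.f) {
      if (missing(.f)) return(stages[[name]])
      stopifnot(is.function(.f))
      stages[[name]] <<- .f
      invisible(.f)
    }
  }

  # Methods are returned in a list, so they are accessed with `$`.
  list(
    transform_features = accessor("transform_features"),
    transform_response = accessor("transform_response"),
    inv_transform_response = accessor("inv_transform_response"),
    estimate_model = accessor("estimate_model")
  )
}

p <- make_pipeline()
p$transform_features(function(df) data.frame(x1 = log(df$waiting)))  # set
stored <- p$transform_features()                                      # get
features <- stored(faithful)
```

Because all four accessors close over the same stages list, setting one stage through $ is visible to every other method created by the same call.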
The pipeline can be fitted, predictions generated, and the inner model accessed using the following methods:
- fit(.data);
- predict(.data); and
- model_estimate().
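A hypothetical base-R sketch of what fit(), predict() and model_estimate() do: fit() trains the inner model on the transformed data and keeps it inside the closure, while predict() transforms the inputs, scores them with the inner model, and appends the inverse-transformed predictions. Here make_fitted_pipeline() is an invented name, the stage functions are passed as arguments rather than set through methods, and the log transforms are only an example; this is not pipeliner's source code.

```r
# Hypothetical sketch; NOT pipeliner's implementation.
make_fitted_pipeline <- function(transform_features, transform_response,
                                 inv_transform_response, estimate_model) {
  model <- NULL  # inner model, filled in by fit()
  list(
    fit = function(.data) {
      # Train the inner model on the transformed response and features.
      model <<- estimate_model(cbind(transform_response(.data),
                                     transform_features(.data)))
      invisible(NULL)
    },
    predict = function(.data) {
      # Transform the inputs, score them, then invert the response transform.
      feats <- transform_features(.data)
      scored <- cbind(.data, feats,
                      pred_model = stats::predict(model, newdata = feats))
      cbind(scored, inv_transform_response(scored))
    },
    model_estimate = function() model
  )
}

pl <- make_fitted_pipeline(
  transform_features = function(df) data.frame(x1 = log(df$waiting)),
  transform_response = function(df) data.frame(y = log(df$eruptions)),
  inv_transform_response = function(df) data.frame(pred_eruptions = exp(df$pred_model)),
  estimate_model = function(df) lm(y ~ x1, df)
)
pl$fit(faithful)
out <- pl$predict(faithful)
```

Returning the original columns alongside x1, pred_model and pred_eruptions mirrors the column layout shown in the Examples output.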
Value
An object of class ml_pipeline.
See Also
transform_features, transform_response, estimate_model and inv_transform_response.
Examples
data <- faithful
lm_pipeline <- ml_pipline_builder()
lm_pipeline$transform_features(function(df) {
  data.frame(x1 = (df$waiting - mean(df$waiting)) / sd(df$waiting))
})
lm_pipeline$transform_response(function(df) {
  data.frame(y = (df$eruptions - mean(df$eruptions)) / sd(df$eruptions))
})
lm_pipeline$inv_transform_response(function(df) {
  data.frame(pred_eruptions = df$pred_model * sd(df$eruptions) + mean(df$eruptions))
})
lm_pipeline$estimate_model(function(df) {
  lm(y ~ 0 + x1, df)
})
lm_pipeline$fit(data)
head(lm_pipeline$predict(data))
#   eruptions waiting         x1 pred_model pred_eruptions
# 1     3.600      79  0.5960248  0.5369058       4.100592
# 2     1.800      54 -1.2428901 -1.1196093       2.209893
# 3     3.333      74  0.2282418  0.2056028       3.722452
# 4     2.283      62 -0.6544374 -0.5895245       2.814917
# 5     4.533      85  1.0373644  0.9344694       4.554360
# 6     2.883      55 -1.1693335 -1.0533487       2.285521
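Because both variables are standardized and the inner model has no intercept, its fitted slope equals the sample correlation between waiting and eruptions, so the back-transformed pred_eruptions column should coincide with the fitted values of an ordinary lm(eruptions ~ waiting) fit. The base-R sketch below reproduces the example's pipeline by hand (pipeliner is not needed) as a way to check that output.

```r
# Manual, base-R reproduction of the example pipeline.
data <- faithful

# Forward transformations: standardize the input and the response.
x1 <- (data$waiting - mean(data$waiting)) / sd(data$waiting)
y <- (data$eruptions - mean(data$eruptions)) / sd(data$eruptions)

# Inner model: regression through the origin on the standardized scale.
inner <- lm(y ~ 0 + x1)
pred_model <- unname(fitted(inner))

# Inverse transformation back to the original eruptions scale.
pred_eruptions <- pred_model * sd(data$eruptions) + mean(data$eruptions)

# With standardized variables and no intercept, the slope is the correlation,
slope <- coef(inner)[["x1"]]
r <- cor(data$waiting, data$eruptions)

# so the back-transformed predictions equal an OLS fit on the raw data.
ols <- unname(fitted(lm(eruptions ~ waiting, data = data)))
```

The first elements of x1, pred_model and pred_eruptions should match the first row of the output shown above.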