model_formula {parsnip}R Documentation

Formulas with special terms in tidymodels

Description

In R, formulas provide a compact, symbolic notation to specify model terms. Many modeling functions in R make use of "specials", or nonstandard notations used in formulas. Specials are defined and handled as a special case by a given modeling package. For example, the mgcv package, which provides support for generalized additive models in R, defines a function s() to be in-lined into formulas. It can be used like so:

mgcv::gam(mpg ~ wt + s(disp, k = 5), data = mtcars)

In this example, the s() special defines a smoothing term that the mgcv package knows to look for when preprocessing model input.

The parsnip package can handle most specials without issue. The analogous code for specifying this generalized additive model with the parsnip "mgcv" engine looks like:

gen_additive_mod() %>%
  set_mode("regression") %>%
  set_engine("mgcv") %>%
  fit(mpg ~ wt + s(disp, k = 5), data = mtcars)

However, parsnip is often used in conjunction with the greater tidymodels package ecosystem, which defines its own pre-processing infrastructure and functionality via packages like hardhat and recipes. The specials defined in many modeling packages introduce conflicts with that infrastructure.

To support specials while also maintaining consistent syntax elsewhere in the ecosystem, tidymodels delineates between two types of formulas: preprocessing formulas and model formulas. Preprocessing formulas specify the input variables, while model formulas determine the model structure.

Example

To create the preprocessing formula from the model formula, just remove the specials, retaining references to input variables themselves. For example:

model_formula <- mpg ~ wt + s(disp, k = 5)
preproc_formula <- mpg ~ wt + disp

[Package parsnip version 1.2.1 Index]