model_frame {hardhat} | R Documentation |
Construct a model frame
Description
model_frame() is a stricter version of stats::model.frame(). There are a number of differences, the main ones being that rows are never dropped and that the return value is a list with the frame and terms separated into two distinct objects.
Usage
model_frame(formula, data)
Arguments
formula | A formula or terms object representing the terms of the model frame. |
data | A data frame or matrix containing the terms of formula. |
Details
The following explains the rationale for some of the differences in arguments compared to stats::model.frame():
- subset: Not allowed because the number of rows before and after model_frame() has been run should always be the same.
- na.action: Not allowed and is forced to "na.pass" because the number of rows before and after model_frame() has been run should always be the same.
- drop.unused.levels: Not allowed because it seems inconsistent for data and the result of model_frame() to ever have the same factor column but with different levels, unless specified through original_levels. If this is required, it should be done through a recipe step explicitly.
- xlev: Not allowed because this check should have been done ahead of time. Use scream() to check the integrity of data against a training set if that is required.
- ...: Not exposed because offsets are handled separately, and it is no longer necessary to pass weights here because rows are never dropped (so weights don't have to be subset alongside the rest of the design matrix). If other non-predictor columns are required, use the "roles" features of recipes.
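To make the row-count guarantee concrete, the following sketch contrasts stats::model.frame() with model_frame() on data containing missing values. The object names (dat, base_frame, hh_frame) are illustrative only, and na.action = na.omit is passed explicitly so the comparison does not depend on the session's options.

```r
library(hardhat)

# Illustrative data: introduce three missing predictor values
dat <- iris
dat$Sepal.Width[1:3] <- NA

# stats::model.frame() with na.omit drops the incomplete rows
base_frame <- stats::model.frame(Species ~ Sepal.Width, dat, na.action = na.omit)
nrow(base_frame)

# model_frame() forces "na.pass", so every row is retained
hh_frame <- model_frame(Species ~ Sepal.Width, dat)
nrow(hh_frame$data)

# The missing values are still present in the returned tibble
sum(is.na(hh_frame$data$Sepal.Width))
```

Because the row count is preserved, anything aligned with the rows of the original data (such as case weights) can be kept alongside the result without being subset.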
It is important to always use the results of model_frame() with model_matrix() rather than stats::model.matrix() because the tibble in the result of model_frame() does not have a terms object attached. If model.matrix(<terms>, <tibble>) is called directly, then a call to model.frame() will be made automatically, which can give faulty results.
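A minimal sketch of the recommended pairing, assuming hardhat's model_matrix(terms, data) signature; the object names framed and mat are illustrative only.

```r
library(hardhat)

framed <- model_frame(Sepal.Length ~ Species + Sepal.Width, iris)

# The tibble in the result carries no terms attribute of its own
is.null(attr(framed$data, "terms"))

# Pass the terms and the data explicitly to model_matrix()
mat <- model_matrix(framed$terms, framed$data)
head(mat)

# As with model_frame(), no rows are dropped
nrow(mat) == nrow(iris)
```

Passing both elements explicitly keeps the terms object and the data in step, so no implicit call to model.frame() is needed.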
Value
A named list with two elements:
- "data": A tibble containing the model frame.
- "terms": A terms object containing the terms for the model frame.
Examples
# ---------------------------------------------------------------------------
# Example usage
framed <- model_frame(Species ~ Sepal.Width, iris)
framed$data
framed$terms
# ---------------------------------------------------------------------------
# Missing values never result in dropped rows
iris2 <- iris
iris2$Sepal.Width[1] <- NA
framed2 <- model_frame(Species ~ Sepal.Width, iris2)
head(framed2$data)
nrow(framed2$data) == nrow(iris2)