descriptors {parsnip} | R Documentation |
Data Set Characteristics Available when Fitting Models
Description
When using the fit()
functions there are some
variables that will be available for use in arguments. For
example, if the user would like to choose an argument value
based on the current number of rows in a data set, the .obs()
function can be used. See Details below.
Usage
.cols()
.preds()
.obs()
.lvls()
.facts()
.x()
.y()
.dat()
Details
Existing functions:
-
.obs()
: The current number of rows in the data set. -
.preds()
: The number of columns in the data set that is associated with the predictors prior to dummy variable creation. -
.cols()
: The number of predictor columns available after dummy variables are created (if any). -
.facts()
: The number of factor predictors in the data set. -
.lvls()
: If the outcome is a factor, this is a table with the counts for each level (andNA
otherwise). -
.x()
: The predictors returned in the format given. Either a data frame or a matrix. -
.y()
: The known outcomes returned in the format given. Either a vector, matrix, or data frame. -
.dat()
: A data frame containing all of the predictors and the outcomes. Iffit_xy()
was used, the outcomes are attached as the column,..y
.
For example, if you use the model formula circumference ~ .
with the
built-in Orange
data, the values would be
.preds() = 2 (the 2 remaining columns in `Orange`) .cols() = 5 (1 numeric column + 4 from Tree dummy variables) .obs() = 35 .lvls() = NA (no factor outcome) .facts() = 1 (the Tree predictor) .y() = <vector> (circumference as a vector) .x() = <data.frame> (The other 2 columns as a data frame) .dat() = <data.frame> (The full data set)
If the formula Tree ~ .
were used:
.preds() = 2 (the 2 numeric columns in `Orange`) .cols() = 2 (same) .obs() = 35 .lvls() = c("1" = 7, "2" = 7, "3" = 7, "4" = 7, "5" = 7) .facts() = 0 .y() = <vector> (Tree as a vector) .x() = <data.frame> (The other 2 columns as a data frame) .dat() = <data.frame> (The full data set)
To use these in a model fit, pass them to a model specification.
The evaluation is delayed until the time when the
model is run via fit()
(and the variables listed above are available).
For example:
library(modeldata) data("lending_club") rand_forest(mode = "classification", mtry = .cols() - 2)
When no descriptors are found, the computation of the descriptor values is not executed.