bart {parsnip} | R Documentation |
Bayesian additive regression trees (BART)
Description
bart()
defines a tree ensemble model that uses Bayesian analysis to
assemble the ensemble. This function can fit classification and regression
models.
There are different ways to fit this model, and the method of estimation is chosen by setting the model engine. The engine-specific pages for this model are listed below.
¹ The default engine.
More information on how parsnip is used for modeling is at https://www.tidymodels.org/.
Usage
bart(
mode = "unknown",
engine = "dbarts",
trees = NULL,
prior_terminal_node_coef = NULL,
prior_terminal_node_expo = NULL,
prior_outcome_range = NULL
)
Arguments
mode |
A single character string for the prediction outcome mode. Possible values for this model are "unknown", "regression", or "classification". |
engine |
A single character string specifying what computational engine to use for fitting. |
trees |
An integer for the number of trees contained in the ensemble. |
prior_terminal_node_coef |
A coefficient for the prior probability that a node is a terminal node. Values are usually between 0 and one with a default of 0.95. This affects the baseline probability; smaller numbers make the probabilities larger overall. See Details below. |
prior_terminal_node_expo |
An exponent in the prior probability that a node is a terminal node. Values are usually non-negative with a default of 2 This affects the rate that the prior probability decreases as the depth of the tree increases. Larger values make deeper trees less likely. |
prior_outcome_range |
A positive value that defines the width of a prior that the predicted outcome is within a certain range. For regression it is related to the observed range of the data; the prior is the number of standard deviations of a Gaussian distribution defined by the observed range of the data. For classification, it is defined as the range of +/-3 (assumed to be on the logit scale). The default value is 2. |
Details
The prior for the terminal node probability is expressed as
prior = a * (1 + d)^(-b)
where d
is the depth of the node, a
is
prior_terminal_node_coef
and b
is prior_terminal_node_expo
. See the
Examples section below for an example graph of the prior probability of a
terminal node for different values of these parameters.
This function only defines what type of model is being fit. Once an engine
is specified, the method to fit the model is also defined. See
set_engine()
for more on setting the engine, including how to set engine
arguments.
The model is not trained or fit until the fit()
function is used
with the data.
Each of the arguments in this function other than mode
and engine
are
captured as quosures. To pass values
programmatically, use the injection operator like so:
value <- 1 bart(argument = !!value)
References
https://www.tidymodels.org, Tidy Modeling with R, searchable table of parsnip models
See Also
fit()
, set_engine()
, update()
, dbarts engine details
Examples
show_engines("bart")
bart(mode = "regression", trees = 5)
# ------------------------------------------------------------------------------
# Examples for terminal node prior
library(ggplot2)
library(dplyr)
prior_test <- function(coef = 0.95, expo = 2, depths = 1:10) {
tidyr::crossing(coef = coef, expo = expo, depth = depths) %>%
mutate(
`terminial node prior` = coef * (1 + depth)^(-expo),
coef = format(coef),
expo = format(expo))
}
prior_test(coef = c(0.05, 0.5, .95), expo = c(1/2, 1, 2)) %>%
ggplot(aes(depth, `terminial node prior`, col = coef)) +
geom_line() +
geom_point() +
facet_wrap(~ expo)