brt {mcmcsae}    R Documentation

Create a model component object for a BART (Bayesian Additive Regression Trees) component in the linear predictor

Description

This function is intended to be used on the right-hand side of the formula argument to create_sampler or generate_data. It creates a BART term in the model's linear predictor. To use this model component, the R package dbarts must be installed.

Usage

brt(
  formula,
  X = NULL,
  n.trees = 75L,
  name = "",
  debug = FALSE,
  keepTrees = FALSE,
  ...
)

Arguments

formula

a formula specifying the predictors to be used in the BART model component. Variable names are looked up in the data frame passed as the data argument to create_sampler or generate_data, or in environment(formula).

X

a design matrix, specified directly as an alternative to creating one from formula. If X is specified, formula is ignored.

n.trees

number of trees used in the BART ensemble.

name

the name of the model component. This name is used in the output of the MCMC simulation function MCMCsim. By default, the name is 'bart' with the number of the model term appended.

debug

if TRUE, a breakpoint is set at the beginning of the posterior draw function associated with this model component. Mainly intended for developers.

keepTrees

whether to store the tree ensemble for each Monte Carlo draw. Storing the trees is required for prediction based on new data. The default is FALSE, to save memory.

...

parameters passed to dbarts.
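The predictors can thus be supplied in two ways: through formula, or as a ready-made design matrix X. The following is a minimal sketch of both, assuming packages mcmcsae and dbarts are installed. The data frame dat, its columns x1, x2, x3 and y, and the matrix Xmat are hypothetical illustrative names, not part of the package. The samplers are only created here; running them with MCMCsim works as in the Examples section.

```r
library(mcmcsae)  # brt() additionally requires package dbarts

# hypothetical illustrative data set
set.seed(1)
n <- 100L
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- 10 * sin(pi * dat$x1 * dat$x2) + 5 * dat$x3 + rnorm(n)

# (1) predictors selected through a formula; the variable names
#     are looked up in the data frame passed as 'data'
sampler1 <- create_sampler(y ~ brt(~ x1 + x2 + x3, name = "bart"), data = dat)

# (2) the same predictors supplied directly as a design matrix;
#     when X is given, the formula argument is ignored
Xmat <- as.matrix(dat[, c("x1", "x2", "x3")])
sampler2 <- create_sampler(y ~ brt(X = Xmat, name = "bart"), data = dat)
```

The matrix route is convenient when the predictors have already been preprocessed outside the data frame.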

Value

An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.

References

H.A. Chipman, E.I. George and R.E. McCulloch (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics 4(1), 266-298.

J.H. Friedman (1991). Multivariate adaptive regression splines. The Annals of Statistics 19, 1-67.

Examples

# generate data, based on an example in Friedman (1991)
gendat <- function(n=200L, p=10L, sigma=1) {
  x <- matrix(runif(n * p), n, p)
  mu <- 10*sin(pi*x[, 1] * x[, 2]) + 20*(x[, 3] - 0.5)^2 + 10*x[, 4] + 5*x[, 5]
  y <- mu + sigma * rnorm(n)
  data.frame(x=x, mu=mu, y=y)
}

train <- gendat()
test <- gendat(n=25)

# keep trees for later prediction based on new data
sampler <- create_sampler(
  # predictors x.1, ..., x.10; exclude the response y and the true mean mu
  y ~ brt(~ . - y - mu, name="bart", keepTrees=TRUE),
  sigma.mod=pr_invchisq(df=3, scale=var(train$y)),
  data = train
)
sim <- MCMCsim(sampler, n.chain=2, n.iter=700, thin=2,
  store.all=TRUE, verbose=FALSE)
(summ <- summary(sim))
plot(train$mu, summ$bart[, "Mean"]); abline(0, 1)
# NB: prediction is currently slow, so only a random subset of 100 draws is used
pred <- predict(sim, newdata=test,
  iters=sample(seq_len(ndraws(sim)), 100),
  show.progress=FALSE
)
(summpred <- summary(pred))
plot(test$mu, summpred[, "Mean"]); abline(0, 1)



[Package mcmcsae version 0.7.7]