brt {mcmcsae} | R Documentation |
Create a model component object for a BART (Bayesian Additive Regression Trees) component in the linear predictor
Description
This function is intended to be used on the right-hand side of the formula argument to create_sampler or generate_data. It creates a BART term in the model's linear predictor. To use this model component, the R package dbarts must be installed.
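Since this component depends on dbarts, a script may want to check for the package before building a sampler. The following is a minimal sketch in base R; the helper name has_dbarts is hypothetical and not part of mcmcsae.

```r
# Hypothetical helper (not part of mcmcsae): TRUE if dbarts can be loaded.
has_dbarts <- function() {
  requireNamespace("dbarts", quietly = TRUE)
}

if (!has_dbarts()) {
  # Inform the user instead of failing later inside create_sampler()
  message("Package 'dbarts' is not installed; brt() terms cannot be used. ",
          "Try install.packages(\"dbarts\").")
}
```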
Usage
brt(
  formula,
  X = NULL,
  n.trees = 75L,
  name = "",
  debug = FALSE,
  keepTrees = FALSE,
  ...
)
Arguments
formula |
a formula specifying the predictors to be used in the BART
model component. Variable names are looked up in the data frame
passed as data to create_sampler or generate_data. |
X |
a design matrix can be specified directly, as an alternative
to the creation of one based on formula. |
n.trees |
number of trees used in the BART ensemble. |
name |
the name of the model component. This name is used in the output of the
MCMC simulation function MCMCsim. |
debug |
if |
keepTrees |
whether to store the tree ensemble for each Monte Carlo draw. This
is required for prediction based on new data. The default is FALSE. |
... |
parameters passed to dbarts. |
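The arguments above can be combined as in the following sketch, which names the predictors explicitly and uses fewer trees than the default. The data and the variable names x1, x2 and y are made up for illustration, and the code runs the sampler only if both mcmcsae and dbarts are installed; treat it as an assumption-laden example rather than recommended settings.

```r
# Simulated data; the variable names x1, x2 and y are illustrative only.
set.seed(1)
dat <- data.frame(x1 = runif(100), x2 = runif(100))
dat$y <- sin(2 * pi * dat$x1) + dat$x2 + rnorm(100, sd = 0.1)

# Run only when both required packages are available.
if (requireNamespace("mcmcsae", quietly = TRUE) &&
    requireNamespace("dbarts", quietly = TRUE)) {
  library(mcmcsae)
  sampler <- create_sampler(
    # BART term with an explicit predictor formula and 50 trees
    y ~ brt(~ x1 + x2, n.trees = 50L, name = "bart"),
    sigma.mod = pr_invchisq(df = 3, scale = var(dat$y)),
    data = dat
  )
  sim <- MCMCsim(sampler, n.chain = 2, n.iter = 500, verbose = FALSE)
  print(summary(sim))
}
```

keepTrees is left at its default here, so this fit is not meant to be used for prediction on new data.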
Value
An object with precomputed quantities and functions for sampling from prior or conditional posterior distributions for this model component. Intended for internal use by other package functions.
References
H.A. Chipman, E.I. George and R.E. McCulloch (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics 4(1), 266-298.
J.H. Friedman (1991). Multivariate adaptive regression splines. The Annals of Statistics 19, 1-67.
Examples
# generate data, based on an example in Friedman (1991)
gendat <- function(n=200L, p=10L, sigma=1) {
  x <- matrix(runif(n * p), n, p)
  mu <- 10*sin(pi*x[, 1] * x[, 2]) + 20*(x[, 3] - 0.5)^2 + 10*x[, 4] + 5*x[, 5]
  y <- mu + sigma * rnorm(n)
  data.frame(x=x, mu=mu, y=y)
}

train <- gendat()
test <- gendat(n=25)

# keep trees for later prediction based on new data
sampler <- create_sampler(
  y ~ brt(~ . - y, name="bart", keepTrees=TRUE),
  sigma.mod=pr_invchisq(df=3, scale=var(train$y)),
  data = train
)
sim <- MCMCsim(sampler, n.chain=2, n.iter=700, thin=2,
  store.all=TRUE, verbose=FALSE)
(summ <- summary(sim))
plot(train$mu, summ$bart[, "Mean"]); abline(0, 1)

# NB prediction is currently slow
pred <- predict(sim, newdata=test,
  iters=sample(seq_len(ndraws(sim)), 100),
  show.progress=FALSE
)
(summpred <- summary(pred))
plot(test$mu, summpred[, "Mean"]); abline(0, 1)
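The plots above compare true and estimated means visually. A numeric summary such as the root mean squared error can complement them. The helper rmse below is hypothetical (not part of mcmcsae) and the toy numbers are made up; this is only a sketch in base R.

```r
# Hypothetical helper (not part of mcmcsae): root mean squared error.
rmse <- function(truth, estimate) {
  stopifnot(length(truth) == length(estimate))
  sqrt(mean((truth - estimate)^2))
}

# Toy illustration with made-up numbers
rmse(c(1, 2, 3), c(1, 2, 5))

# With the objects from the example above one could compute, for instance:
# rmse(test$mu, summpred[, "Mean"])
```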