bart {dbarts} | R Documentation |
Bayesian Additive Regression Trees
Description
BART is a Bayesian “sum-of-trees” model in which each tree is constrained by a prior to be a weak learner.
For numeric response
y = f(x) + \epsilon
, where\epsilon \sim N(0, \sigma^2)
.For binary response
y
,P(Y = 1 \mid x) = \Phi(f(x))
, where\Phi
denotes the standard normal cdf (probit link).
Usage
bart(
x.train, y.train, x.test = matrix(0.0, 0, 0),
sigest = NA, sigdf = 3, sigquant = 0.90,
k = 2.0,
power = 2.0, base = 0.95, splitprobs = 1 / numvars,
binaryOffset = 0.0, weights = NULL,
ntree = 200,
ndpost = 1000, nskip = 100,
printevery = 100, keepevery = 1, keeptrainfits = TRUE,
usequants = FALSE, numcut = 100, printcutoffs = 0,
verbose = TRUE, nchain = 1, nthread = 1, combinechains = TRUE,
keeptrees = FALSE, keepcall = TRUE, sampleronly = FALSE,
seed = NA_integer_,
proposalprobs = NULL,
keepsampler = keeptrees)
bart2(
formula, data, test, subset, weights, offset, offset.test = offset,
sigest = NA_real_, sigdf = 3.0, sigquant = 0.90,
k = NULL,
power = 2.0, base = 0.95, split.probs = 1 / num.vars,
n.trees = 75L,
n.samples = 500L, n.burn = 500L,
n.chains = 4L, n.threads = min(dbarts::guessNumCores(), n.chains),
combineChains = FALSE,
n.cuts = 100L, useQuantiles = FALSE,
n.thin = 1L, keepTrainingFits = TRUE,
printEvery = 100L, printCutoffs = 0L,
verbose = TRUE, keepTrees = FALSE,
keepCall = TRUE, samplerOnly = FALSE,
seed = NA_integer_,
proposal.probs = NULL,
keepSampler = keepTrees,
...)
## S3 method for class 'bart'
plot(
x,
plquants = c(0.05, 0.95), cols = c('blue', 'black'),
...)
## S3 method for class 'bart'
predict(
object, newdata, offset, weights,
type = c("ev", "ppd", "bart"),
combineChains = TRUE, ...)
extract(object, ...)
## S3 method for class 'bart'
extract(
object,
type = c("ev", "ppd", "bart", "trees"),
sample = c("train", "test"),
combineChains = TRUE, ...)
## S3 method for class 'bart'
fitted(
object,
type = c("ev", "ppd", "bart"),
sample = c("train", "test"),
...)
## S3 method for class 'bart'
residuals(object, ...)
Arguments
x.train |
Explanatory variables for training (in sample) data. May be a matrix or a data frame, with rows corresponding to observations and columns to variables. If a variable is a factor in a data frame, it is replaced with dummies. Note that |
y.train |
Dependent variable for training (in sample) data. If |
x.test |
Explanatory variables for test (out of sample) data. Should have same column structure as |
sigest |
For continuous response models, an estimate of the error variance, |
sigdf |
Degrees of freedom for error variance prior. Not applicable when |
sigquant |
The quantile of the error variance prior that the rough estimate ( |
k |
For numeric |
power |
Power parameter for tree prior. |
base |
Base parameter for tree prior. |
splitprobs , split.probs |
Prior and transition probabilities of variables used to generate splits. Can be missing/empty/ |
binaryOffset |
Used for binary |
weights |
An optional vector of weights to be used in the fitting process. When present, BART fits a model with observations |
ntree , n.trees |
The number of trees in the sum-of-trees formulation. |
ndpost , n.samples |
The number of posterior draws after burn in, |
nskip , n.burn |
Number of MCMC iterations to be treated as burn in. |
printevery , printEvery |
As the MCMC runs, a message is printed every |
keepevery , n.thin |
Every |
keeptrainfits , keepTrainingFits |
If |
usequants , useQuantiles |
When |
numcut , n.cuts |
The maximum number of possible values used in decision rules (see |
printcutoffs , printCutoffs |
The number of cutoff rules to printed to screen before the MCMC is run. Given a single integer, the same value will be used for all variables. If 0, nothing is printed. |
verbose |
Logical; if |
nchain , n.chains |
Integer specifying how many independent tree sets and fits should be calculated. |
nthread , n.threads |
Integer specifying how many threads to use. Depending on the CPU architecture, using more than the number of chains can degrade performance for small/medium data sets. As such some calculations may be executed single threaded regardless. |
combinechains , combineChains |
Logical; if |
keeptrees , keepTrees |
Logical; must be |
keepcall , keepCall |
Logical; if |
seed |
Optional integer specifying the desired pRNG seed. It should not be needed when running single-threaded - |
proposalprobs , proposal.probs |
Named numeric vector or |
keepsampler , keepSampler |
Logical that can be used to save the underlying |
formula |
The same as |
data |
The same as |
test |
The same as |
subset |
A vector of logicals or indicies used to subset of the data. Can be missing. |
offset |
The same as |
offset.test |
A vector of offsets to be used with test data, in case it is different than the training offset. If |
object |
An object of class |
newdata |
Test data for prediction. Obeys all the same rules as |
sampleronly , samplerOnly |
Builds the sampler from its arguments and returns it without running it. Useful to use the |
x |
Object of class |
plquants |
In the plots, beliefs about |
cols |
Vector of two colors. First color is used to plot the median of |
type |
The quantity to be returned by generic functions. Options are |
sample |
Either |
... |
Additional arguments passed on to |
Details
BART is an Bayesian MCMC method. At each MCMC interation, we produce a draw from the joint posterior (f, \sigma) \mid (x, y)
in the numeric y
case and just f
in the binary y
case.
Thus, unlike a lot of other modeling methods in R, bart
does not produce a single model object from which fits and summaries may be extracted. The output consists of values f^*(x)
(and \sigma^*
in the numeric case) where * denotes a particular draw. The x
is either a row from the training data (x.train
) or the test data (x.test
).
Decision Rules
Decision rules for any tree are of the form x \le c
vs. x > c
for each ‘x
’ corresponding to a column of x.train
. usequants
determines the means by which the set of possible c
is determined. If usequants
is TRUE
, then the c
are a subset of the values interpolated half-way between the unique, sorted values obtained from the corresponding column of x.train
. If usequants
is FALSE
, the cutoffs are equally spaced across the range of values taken on by the corresponding column of x.train
.
The number of possible values of c
is determined by numcut
. If usequants
is FALSE
, numcut
equally spaced cutoffs are used covering the range of values in the corresponding column of x.train
. If usequants
is TRUE
, then for a variable the minimum of numcut
and one less than the number of unique elements for that variable are used.
End-node prior parameter k
The amount of shrinkage of the node parameters is controlled by k
. k
can be given as either a fixed, positive number, or as any value that can be used to build a supported hyperprior. At present, only \chi_\nu s
priors are supported, where \nu
is a degrees of freedom and s
is a scale. Both values must be positive, however the scale can be infinite which yields an improper prior, which is interpretted as just the polynomial part of the distribution. If nu
is 1 and s
is \infty
, the prior is “flat”.
For BART on binary outcomes, the degree of overfitting can be highly sensitive to k
so it is encouraged to consider a number of values. The default hyperprior for binary BART, chi(1.25, Inf)
, has been shown to work well in a large number of datasets, however crossvalidation may be helpful. Running for a short time with a flat prior may be helpful to see the range of values of k
that are consistent with the data.
Generics
bart
and rbart_vi
support fitted
to return the posterior mean of a predicted quantity, as well as predict
to return a set of posterior samples for a different sample. In addition, the extract
generic can be used to obtain the posterior samples for the training data or test data supplied during the initial fit.
Using predict
with a bart
object requires that it be fitted with the option keeptrees
/keepTrees
as TRUE
. Keeping the trees for a fit can require a sizeable amount of memory and is off by default.
All generics return values on the scale of expected value of the response by default. This means that predict
, extract
, and fitted
for binary outcomes return probabilities unless specifically the sum-of-trees component is requested (type = "bart"
). This is in contrast to yhat.train
/yhat.test
that are returned with the fitted model.
Saving
save
ing and load
ing fitted BART objects for use with predict
requires that R's serialization mechanism be able to access the underlying trees, in addition to being fit with keeptrees
/keepTrees
as TRUE
. For memory purposes, the trees are not stored as R objects unless specifically requested. To do this, one must “touch” the sampler's state object before saving, e.g. for a fitted object bartFit
, execute invisible(bartFit$fit$state)
.
Reproducibility
Behavior differs when running multi- and single-threaded, as the pseudo random number generators (pRNG) used by R are not thread safe. When single-threaded, R's built-in generator is used; if set at the start, the global .Random.seed
will be used and its value updated as samples are drawn. When multi-threaded, the default behavior is to draw new random seeds for each thread using the clock and use thread-specific pRNGs.
This behavior can be modified by setting seed
, or by using ...
to pass arguments to dbartsControl
. For the single-threaded case, a new pRNG is built using that seed that is separate from R's native generator. As such, the global state will not be modified by subsequent calls to the generator. For multi-threaded, the seeds for threads are drawn sequentially using the supplied seed, and will again be separate from R's native generator.
Consequently, the seed
argument is not needed when running single-threaded - set.seed
will suffice. However, when multi-threaded the seed
argument can be used to obtain reproducible results.
Extracting Trees
When a model is fit with keeptrees
(bart
) or keepTrees
(bart2
) equal to TRUE
, the generic extract
can be used to retrieve a data frame containing the tree fit information. In this case, extract
will accept the additional, optional arguments: chainNums
, sampleNums
, and treeNums
. Each should be an integer vector detailing the desired trees to be returned.
The result of extract
will be a data frame with columns:
-
sample
,chain
,tree
- index variables -
n
- number of observations in node -
var
- either the index of the variable used for splitting or -1 if the node is a leaf -
value
- either the value such that observations less than or equal to it are sent down the left path of the tree or the predicted value for a leaf node
The order of nodes in the result corresponds to a depth-first traversal, going down the left-side first. The names of variables used in splitting can be recovered by examining the column names of the fit$data@x
element of a fitted bart
or bart2
model. See the package vignette “Working with dbarts Saved Trees”.
Value
bart
and bart2
return lists assigned the class bart
. For applicable quantities, ndpost / keepevery
samples are returned. In the numeric y
case, the list has components:
yhat.train |
A array/matrix of posterior samples. The |
yhat.test |
Same as |
yhat.train.mean |
Vector of means of |
yhat.test.mean |
Vector of means of |
sigma |
Matrix of posterior samples of |
first.sigma |
Burn-in draws of |
varcount |
A matrix with number of rows equal to the number of kept draws and each column corresponding to a training variable. Contains the total count of the number of times that variable is used in a tree decision rule (over all trees). |
sigest |
The rough error standard deviation ( |
y |
The input dependent vector of values for the dependent variable. This is used in |
fit |
Optional sampler object which stores the values of the tree splits. Required for using |
n.chains |
Information that can be lost if |
k |
Optional matrix of posterior samples of |
first.k |
Burn-in draws of |
In the binary y
case, the returned list has the components yhat.train
, yhat.test
, and varcount
as above. In addition the list has a binaryOffset
component giving the value used.
Note that in the binary y
, case yhat.train
and yhat.test
are f(x) + \mathrm{binaryOffset}
. For draws of the probability P(Y = 1 | x)
, apply the normal cdf (pnorm
) to these values.
The plot
method sets mfrow
to c(1, 2)
and makes two plots. The first plot is the sequence of kept draws of \sigma
including the burn-in draws. Initially these draws will decline as BART finds a good fit and then level off when the MCMC has burnt in. The second plot has y
on the horizontal axis and posterior intervals for the corresponding f(x)
on the vertical axis.
Author(s)
Hugh Chipman: hugh.chipman@gmail.com, Robert McCulloch: robert.mcculloch1@gmail.com, Vincent Dorie: vdorie@gmail.com.
References
Chipman, H., George, E., and McCulloch, R. (2009) BART: Bayesian Additive Regression Trees.
Chipman, H., George, E., and McCulloch R. (2006) Bayesian Ensemble Learning. Advances in Neural Information Processing Systems 19, Scholkopf, Platt and Hoffman, Eds., MIT Press, Cambridge, MA, 265-272.
both of the above at: https://www.rob-mcculloch.org
Friedman, J.H. (1991) Multivariate adaptive regression splines. The Annals of Statistics, 19, 1–67.
See Also
Examples
## simulate data (example from Friedman MARS paper)
## y = f(x) + epsilon , epsilon ~ N(0, sigma)
## x consists of 10 variables, only first 5 matter
f <- function(x) {
10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
10 * x[,4] + 5 * x[,5]
}
set.seed(99)
sigma <- 1.0
n <- 100
x <- matrix(runif(n * 10), n, 10)
Ey <- f(x)
y <- rnorm(n, Ey, sigma)
## run BART
set.seed(99)
bartFit <- bart(x, y)
plot(bartFit)
## compare BART fit to linear matter and truth = Ey
lmFit <- lm(y ~ ., data.frame(x, y))
fitmat <- cbind(y, Ey, lmFit$fitted, bartFit$yhat.train.mean)
colnames(fitmat) <- c('y', 'Ey', 'lm', 'bart')
print(cor(fitmat))