pdbart {dbarts} | R Documentation |
Partial Dependence Plots for BART
Description
Run bart at test observations constructed so that a plot can be created displaying the effect of a single variable (pdbart) or pair of variables (pd2bart). Note that if $y$ is binary with $P(Y = 1 \mid x) = F(f(x))$, $F$ the standard normal cdf, then the plots are all on the $f$ scale.
Usage
pdbart(
    x.train, y.train,
    xind = NULL,
    levs = NULL, levquants = c(0.05, seq(0.1, 0.9, 0.1), 0.95),
    pl = TRUE, plquants = c(0.05, 0.95),
    ...)
## S3 method for class 'pdbart'
plot(
    x,
    xind = seq_len(length(x$fd)),
    plquants = c(0.05, 0.95), cols = c('black', 'blue'),
    ...)
pd2bart(
    x.train, y.train,
    xind = NULL,
    levs = NULL, levquants = c(0.05, seq(0.1, 0.9, 0.1), 0.95),
    pl = TRUE, plquants = c(0.05, 0.95),
    ...)
## S3 method for class 'pd2bart'
plot(
    x,
    plquants = c(0.05, 0.95), contour.color = 'white',
    justmedian = TRUE,
    ...)
Arguments
x.train |
Explanatory variables for training (in sample) data. Can be any valid input to |
y.train |
Dependent variable for training (in sample) data. Can be a numeric vector or, when passing |
xind |
Integer, character vector, or the right-hand side of a formula indicating which variables are to be plotted. In |
levs |
Gives the values of a variable at which the plot is to be constructed. Must be a list, where the |
levquants |
If |
pl |
For |
plquants |
In the plots, beliefs about |
... |
Additional arguments. In |
x |
For |
cols |
Vector of two colors. The first color is for the median of |
contour.color |
Color for contours plotted on top of the image. |
justmedian |
A logical where if |
Details
We divide the predictor vector $x$ into a subgroup of interest, $x_s$, and the complement $x_c = x \setminus x_s$. A prediction $f(x)$ can then be written as $f(x_s, x_c)$. To estimate the effect of $x_s$ on the prediction, Friedman suggests the partial dependence function
$$f_s(x_s) = \frac{1}{n} \sum_{i=1}^{n} f(x_s, x_{ic}),$$
where $x_{ic}$ is the $i$th observation of $x_c$ in the data. Note that $(x_s, x_{ic})$ will generally not be one of the observed data points. Using BART it is straightforward to then estimate and even obtain uncertainty bounds for $f_s(x_s)$. A draw of $f^*_s(x_s)$ from the induced BART posterior on $f_s(x_s)$ is obtained by simply computing $f^*_s(x_s)$ as a byproduct of each MCMC draw $f^*$. The median (or average) of these MCMC draws $f^*_s(x_s)$ then yields an estimate of $f_s(x_s)$, and lower and upper quantiles can be used to obtain intervals for $f_s(x_s)$.
In pdbart, $x_s$ consists of a single variable in $x$, and in pd2bart it is a pair of variables.
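The averaging step behind a partial dependence function can be sketched in base R. This is only an illustration under stated assumptions, not how pdbart is implemented: lm() stands in for a fitted model, and the simulated data, the grid, and the helper partial_dependence are all hypothetical names introduced here.

```r
## Simulated data and a stand-in model (lm is used here in place of BART)
set.seed(1)
n <- 200
dat <- data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
dat$y <- 0.5 * dat$x1 + 2 * dat$x2 + rnorm(n, sd = 0.2)
fit <- lm(y ~ x1 + x2, data = dat)

## Hypothetical helper: for each level v, fix the variable of interest at v
## in every row, keep the other columns as observed, and average predictions.
partial_dependence <- function(fit, data, var, levels) {
  vapply(levels, function(v) {
    data[[var]] <- v
    mean(predict(fit, newdata = data))
  }, numeric(1))
}

levs <- seq(-1, 1, by = 0.5)
pd <- partial_dependence(fit, dat, "x1", levs)
```

For a linear stand-in model the resulting curve is a straight line in the chosen variable, which makes the sketch easy to check by eye; with a BART fit the same averaging would be repeated for each posterior draw.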
This is a computationally intensive procedure. For example, in pdbart, to compute the partial dependence plot for 5 $x_s$ values, we need to compute $f(x_s, x_c)$ for all possible $(x_s, x_{ic})$, and there would be $5n$ of these where $n$ is the sample size. All of that computation would be done for each kept BART draw. For this reason running BART with keepevery larger than 1 (e.g. 10) makes the procedure much faster.
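To make the size of the computation concrete, the following base R sketch builds the kind of test matrix the paragraph above describes: every training row is repeated once per level, with the variable of interest overwritten. The names x_test and levs are illustrative assumptions, and the data mirror the simulation used in the Examples section; the internal construction in pdbart may differ.

```r
set.seed(27)
n <- 100
x <- matrix(2 * runif(n * 3) - 1, ncol = 3)

## five values for the variable of interest (column 1)
levs <- seq(-1, 1, length.out = 5)

## stack one modified copy of the training matrix per level
x_test <- do.call(rbind, lapply(levs, function(v) {
  x_v <- x
  x_v[, 1] <- v   # fix the variable of interest, keep the rest as observed
  x_v
}))

nrow(x_test)      # 5 * n rows to predict for each kept draw
```

Each kept posterior draw has to be evaluated at all of these rows, so the total work scales with the number of levels, the sample size, and the number of kept draws.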
Value
The plot methods produce the plots and don't return anything.
pdbart and pd2bart return lists with components given below. The list returned by pdbart is assigned class pdbart and the list returned by pd2bart is assigned class pd2bart.
fd |
A matrix whose For For |
levs |
The list of levels used, each component corresponding to a variable. If argument |
xlbs |
A vector of character strings which are the plotting labels used for the variables. |
The remaining components returned in the list are the same as in the value of bart. They are simply passed on from the BART run used to create the partial dependence plot. The function plot.bart can be applied to the object returned by pdbart or pd2bart to examine the BART run.
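As a rough illustration of how posterior draws of a partial dependence function can be reduced to a median curve and an interval, the sketch below uses a simulated matrix. It assumes a layout with one row per posterior draw and one column per level; fd_sim is a hypothetical stand-in rather than actual pdbart output, and the quantiles match the default plquants shown in Usage.

```r
set.seed(1)
levs <- seq(-1, 1, by = 0.5)
n_draws <- 200

## simulated stand-in: rows are draws, columns are levels (an assumption)
fd_sim <- sapply(levs, function(v) 0.5 * v + rnorm(n_draws, sd = 0.05))

plquants <- c(0.05, 0.95)
med   <- apply(fd_sim, 2, median)                      # point estimate per level
bands <- apply(fd_sim, 2, quantile, probs = plquants)  # 2 x length(levs) matrix

## simple base graphics display of the median and the interval
plot(levs, med, type = "b", ylim = range(bands),
     xlab = "level", ylab = "partial dependence")
lines(levs, bands[1, ], lty = 2)
lines(levs, bands[2, ], lty = 2)
```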
Author(s)
Hugh Chipman: hugh.chipman@acadiau.ca.
Robert McCulloch: robert.mcculloch@chicagogsb.edu.
References
Chipman, H., George, E., and McCulloch, R. (2006) BART: Bayesian Additive Regression Trees.
Chipman, H., George, E., and McCulloch, R. (2006) Bayesian Ensemble Learning.
Both of the above are available at: https://www.rob-mcculloch.org/
Friedman, J.H. (2001) Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29, 1189–1232.
Examples
## Not run:
## simulate data
f <- function(x)
    return(0.5 * x[,1] + 2 * x[,2] * x[,3])
sigma <- 0.2
n <- 100
set.seed(27)
x <- matrix(2 * runif(n * 3) - 1, ncol = 3)
colnames(x) <- c('rob', 'hugh', 'ed')
Ey <- f(x)
y <- rnorm(n, Ey, sigma)
## first two plot regions are for pdbart, third for pd2bart
par(mfrow = c(1, 3))
## pdbart: one dimensional partial dependence plot
set.seed(99)
pdb1 <- pdbart(
    x, y, xind = c(1, 2),
    levs = list(seq(-1, 1, 0.2), seq(-1, 1, 0.2)),
    pl = FALSE, keepevery = 10, ntree = 100
)
plot(pdb1, ylim = c(-0.6, 0.6))
## pd2bart: two dimensional partial dependence plot
set.seed(99)
pdb2 <- pd2bart(
    x, y, xind = c(2, 3),
    levquants = c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95),
    pl = FALSE, ntree = 100, keepevery = 10, verbose = FALSE)
plot(pdb2)
## compare BART fit to linear model and truth = Ey
lmFit <- lm(y ~ ., data.frame(x, y))
fitmat <- cbind(y, Ey, fitted(lmFit), pdb1$yhat.train.mean)
colnames(fitmat) <- c('y', 'Ey', 'lm', 'bart')
print(cor(fitmat))
## example showing the use of a pre-fitted model
df <- data.frame(y, x)
set.seed(99)
bartFit <- bart(
    y ~ rob + hugh + ed, df,
    keepevery = 10, ntree = 100, keeptrees = TRUE)
pdb1 <- pdbart(bartFit, xind = rob + ed, pl = FALSE)
## End(Not run)