mc.lbart {BART} | R Documentation |

BART is a Bayesian “sum-of-trees” model.

For numeric response *y*, we have
*y = f(x) + e*,
where *e ~ Logistic(0, 1)*.

For a binary response *y*, *P(Y=1 | x) = F(f(x))*, where *F*
denotes the standard Logistic CDF (logit link).

In both cases, *f* is the sum of many tree models.
The goal is to have very flexible inference for the unknown
function *f*.

In the spirit of “ensemble models”, each tree is constrained by a prior to be a weak learner so that it contributes a small amount to the overall fit.
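The weak-learner idea can be illustrated outside of BART itself. The sketch below is plain R, not the BART sampler, and uses no BART prior; it fits a sum of many shrunken single-split "stumps" to residuals, showing how many small contributions add up to a flexible fit:

```r
## Illustrative sketch only (not BART): a sum of many weak learners.
## Each stump is a single random split fitted to the current residuals,
## shrunk so that it contributes only a small amount to the overall fit.
set.seed(1)
x <- seq(-2, 2, length.out = 200)
y <- x^3 / 2 + rnorm(200, sd = 0.1)   # noisy cubic, as in the Examples

fit <- rep(0, length(x))
shrink <- 0.1                          # keeps each stump a weak learner
for (m in 1:200) {
  r <- y - fit                         # current residuals
  k <- sample(seq_len(length(x) - 1), 1)   # random split index
  left <- x <= x[k]
  stump <- ifelse(left, mean(r[left]), mean(r[!left]))
  fit <- fit + shrink * stump          # small additive contribution
}
```

Because each step fits side means of the residuals and then shrinks them, no single stump dominates, yet the accumulated sum tracks the underlying curve.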

mc.lbart(
    x.train, y.train, x.test=matrix(0.0,0,0),
    sparse=FALSE, a=0.5, b=1, augment=FALSE, rho=NULL,
    xinfo=matrix(0.0,0,0), usequants=FALSE,
    cont=FALSE, rm.const=TRUE, tau.interval=0.95,
    k=2.0, power=2.0, base=.95, binaryOffset=NULL,
    ntree=50L, numcut=100L, ndpost=1000L, nskip=100L,
    keepevery=1L, printevery=100, keeptrainfits=TRUE,
    transposed=FALSE, mc.cores=2L, nice=19L, seed=99L
)

`x.train` |
Explanatory variables for training (in sample) data. |

`y.train` |
Dependent variable for training (in sample) data. |

`x.test` |
Explanatory variables for test (out of sample) data. |

`sparse` |
Whether to perform variable selection based on a sparse Dirichlet prior rather than simply uniform; see Linero 2016. |

`a` |
Sparse parameter for the *Beta(a, b)* prior: *0.5 <= a <= 1*, where lower values induce more sparsity. |

`b` |
Sparse parameter for the *Beta(a, b)* prior; typically, *b = 1*. |

`rho` |
Sparse parameter: typically *rho = p*, where *p* is the number of covariates under consideration. |

`augment` |
Whether data augmentation is to be performed in sparse variable selection. |

`xinfo` |
You can provide the cutpoints to BART or let BART
choose them for you. To provide them, use the xinfo argument to specify a matrix whose rows correspond to the covariates and whose columns contain the cutpoints. |

`usequants` |
If usequants=FALSE, then the cutpoints are generated uniformly over the range of each variable; otherwise, the cutpoints are taken from the quantiles of the observed values. |

`cont` |
Whether or not to assume all variables are continuous. |

`rm.const` |
Whether or not to remove constant variables. |

`tau.interval` |
The width of the interval to scale the variance for the terminal leaf values. |

`k` |
For numeric y,
k is the number of prior standard deviations that *E(Y|x) = f(x)* is away from +/-0.5; the response, y.train, is internally scaled to the range -0.5 to 0.5. The bigger k is, the more conservative the fitting. |

`power` |
Power parameter for tree prior. |

`base` |
Base parameter for tree prior. |

`binaryOffset` |
Used for binary *y*: the model is *P(Y=1 | x) = F(f(x) + binaryOffset)*. |

`ntree` |
The number of trees in the sum. |

`numcut` |
The number of possible values of c (see usequants).
If a single number is given, this is used for all variables.
Otherwise, a vector with length equal to ncol(x.train) is required,
where the i-th element gives the number of cutpoints used for the i-th variable in x.train. |

`ndpost` |
The number of posterior draws returned. |

`nskip` |
Number of MCMC iterations to be treated as burn in. |

`keepevery` |
Every keepevery draw is kept to be returned to the user. |

`printevery` |
As the MCMC runs, a message is printed every printevery draws. |

`keeptrainfits` |
Whether to keep yhat.train or not. |

`transposed` |
When running mc.lbart in parallel, it is more memory-efficient to transpose x.train and x.test, if any, prior to calling the internal versions of these functions. |

`seed` |
Setting the seed is required for reproducible MCMC. |

`mc.cores` |
Number of cores to employ in parallel. |

`nice` |
Set the job niceness. The default niceness is 19: niceness goes from 0 (highest) to 19 (lowest). |

BART is a Bayesian MCMC method.
At each MCMC iteration, we produce a draw from the joint posterior
*(f,sigma) \| (x,y)* in the numeric *y* case
and just *f* in the binary *y* case.

Thus, unlike many other modeling methods in R, we do not produce a single model object
from which fits and summaries may be extracted. The output consists of values
f\*(x) (and sigma\* in the numeric case), where \* denotes a particular draw.
The *x* is either a row from the training data (x.train) or the test data (x.test).
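For example, per-observation posterior summaries are obtained by summarizing the columns of these draw matrices. The sketch below uses a simulated matrix as a stand-in for an actual yhat.train, since running the sampler is not needed to show the pattern:

```r
## Sketch: summarizing a matrix of posterior draws, shaped like
## yhat.train (ndpost rows of draws, one column per observation).
## The draws are simulated here, not produced by lbart.
set.seed(99)
ndpost <- 1000                      # number of kept posterior draws
nobs   <- 10                        # number of observations
## stand-in draws: column j is centered at j
draws <- matrix(rnorm(ndpost * nobs, mean = rep(1:nobs, each = ndpost)),
                nrow = ndpost, ncol = nobs)

post.mean <- apply(draws, 2, mean)  # point estimate for each x
post.int  <- apply(draws, 2, quantile, probs = c(.025, .975))  # 95% bands
```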

`mc.lbart` returns an object of type `lbart`, which is essentially a list.

`yhat.train` |
A matrix with ndpost rows and nrow(x.train) columns.
Each row corresponds to a draw f\* from the posterior of *f*, and each column corresponds to a row of x.train. The (i,j) value is f\*(x) for the i-th kept draw of *f* and the j-th row of x.train. |

`yhat.test` |
Same as yhat.train but now the x's are the rows of the test data. |

`yhat.train.mean` |
train data fits = mean of yhat.train columns. |

`yhat.test.mean` |
test data fits = mean of yhat.test columns. |

`varcount` |
A matrix with ndpost rows and ncol(x.train) columns. Each row is for a draw. For each variable (corresponding to the columns), the total count of the number of times that variable is used in a tree decision rule (over all trees) is given. |

In addition, the list has a `binaryOffset` component giving the value used.

Note that in the binary *y* case, yhat.train and yhat.test are
*f(x) + binaryOffset*. If you want draws of the probability
*P(Y=1 | x)*, you need to apply the Logistic CDF (`plogis`)
to these values.
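A minimal sketch of that conversion, using a simulated matrix as a stand-in for a fitted model's yhat.test (only `plogis` itself comes from the usage described above):

```r
## Sketch: converting draws of f(x) + binaryOffset into draws of
## P(Y = 1 | x) with the standard Logistic CDF.
set.seed(99)
## stand-in for fit$yhat.test: 10 draws (rows) for 5 test points (columns)
yhat <- matrix(rnorm(10 * 5), nrow = 10, ncol = 5)

prob <- plogis(yhat)               # draw-by-draw P(Y = 1 | x)
prob.mean <- apply(prob, 2, mean)  # posterior mean probability per x
```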

set.seed(99)
n=5000
x = sort(-2+4*runif(n))
X=matrix(x,ncol=1)
f = function(x) {return((1/2)*x^3)}
FL = function(x) {return(exp(x)/(1+exp(x)))}
pv = FL(f(x))
y = rbinom(n,1,pv)

np=100
xp=-2+4*(1:np)/np
Xp=matrix(xp,ncol=1)

## parallel::mcparallel/mccollect do not exist on windows
if(.Platform$OS.type=='unix') {
    ## test BART with token run to ensure installation works
    mf = mc.lbart(X, y, nskip=5, ndpost=5, mc.cores=1, seed=99)
}

## Not run:
set.seed(99)
pf = lbart(X,y,Xp)

plot(f(Xp), pf$yhat.test.mean, xlim=c(-4, 4), ylim=c(-4, 4),
     xlab='True f(x)', ylab='BART f(x)')
lines(c(-4, 4), c(-4, 4))

mf = mc.lbart(X,y,Xp, mc.cores=4, seed=99)

plot(f(Xp), mf$yhat.test.mean, xlim=c(-4, 4), ylim=c(-4, 4),
     xlab='True f(x)', ylab='BART f(x)')
lines(c(-4, 4), c(-4, 4))

par(mfrow=c(2,2))
plot(range(xp),range(pf$yhat.test),xlab='x',ylab='f(x)',type='n')
lines(x,f(x),col='blue',lwd=2)
lines(xp,apply(pf$yhat.test,2,mean),col='red')
qpl = apply(pf$yhat.test,2,quantile,probs=c(.025,.975))
lines(xp,qpl[1,],col='green',lty=1)
lines(xp,qpl[2,],col='green',lty=1)
title(main='BART::lbart f(x) with 0.95 intervals')

plot(range(xp),range(mf$yhat.test),xlab='x',ylab='f(x)',type='n')
lines(x,f(x),col='blue',lwd=2)
lines(xp,apply(mf$yhat.test,2,mean),col='red')
qpl = apply(mf$yhat.test,2,quantile,probs=c(.025,.975))
lines(xp,qpl[1,],col='green',lty=1)
lines(xp,qpl[2,],col='green',lty=1)
title(main='BART::mc.lbart f(x) with 0.95 intervals')

plot(pf$yhat.test.mean,apply(mf$yhat.test,2,mean),
     xlab='BART::lbart',ylab='BART::mc.lbart')
abline(0,1,col='red')
title(main="BART::lbart f(x) vs. BART::mc.lbart f(x)")
## End(Not run)

[Package *BART* version 2.9 Index]