bartBMA {bartBMA} R Documentation

## Bayesian Additive Regression Trees Using Bayesian Model Averaging (BART-BMA)

### Description

This is an implementation of Bayesian Additive Regression Trees (Chipman et al. 2010) using Bayesian Model Averaging (Hernandez et al. 2018).

### Usage

bartBMA(x.train, ...)

## Default S3 method:
bartBMA(
x.train,
y.train,
a = 3,
nu = 3,
sigquant = 0.9,
c = 1000,
pen = 12,
num_cp = 20,
x.test = matrix(0, 0, 0),
num_rounds = 5,
alpha = 0.95,
beta = 2,
split_rule_node = 0,
gridpoint = 0,
maxOWsize = 100,
num_splits = 5,
gridsize = 10,
zero_split = 1,
only_max_num_trees = 1,
min_num_obs_for_split = 2,
min_num_obs_after_split = 2,
exact_residuals = 1,
spike_tree = 0,
s_t_hyperprior = 1,
p_s_t = 0.5,
a_s_t = 1,
b_s_t = 3,
lambda_poisson = 10,
less_greedy = 0,
...
)


### Arguments

| Argument | Description |
| --- | --- |
| x.train | Training data covariate matrix. |
| ... | Further arguments. |
| y.train | Training data outcome vector. |
| a | Parameter that influences the variance of the terminal node parameter values. Default a=3. |
| nu | Hyperparameter in the distribution of the variance of the error term. The inverse of the variance is distributed as Gamma(nu/2, nu*lambda/2). Default nu=3. |
| sigquant | Calibration quantile for the inverse chi-squared prior on the variance of the error term. |
| c | Determines the size of Occam's window. |
| pen | Penalty parameter used by the Pruned Exact Linear Time (PELT) algorithm when finding changepoints. Default pen=12. |
| num_cp | A number between 0 and 100 that determines the proportion of changepoints proposed by the changepoint detection algorithm to keep when growing trees. Default num_cp=20. |
| x.test | Test data covariate matrix. Default x.test=matrix(0.0,0,0). |
| num_rounds | Number of trees (the maximum number of trees in a sum-of-tree model). Default num_rounds=5. |
| alpha | Parameter in the prior probability of tree node splitting. Default alpha=0.95. |
| beta | Parameter in the prior probability of tree node splitting. Default beta=2. |
| split_rule_node | Binary variable. If equal to 1, a new set of potential splitting points is found via a changepoint algorithm after adding each split to a tree. If equal to 0, the same set of potential split points is used for all splits in a tree. Default split_rule_node=0. |
| gridpoint | Binary variable. If equal to 1, a grid search changepoint detection algorithm is used. If equal to 0, the Pruned Exact Linear Time (PELT) changepoint detection algorithm is used (Killick et al. 2012). Default gridpoint=0. |
| maxOWsize | Maximum number of models to keep in Occam's window. Default maxOWsize=100. |
| num_splits | Maximum number of splits in a tree. |
| gridsize | Integer that determines the size of the grid to search when finding changepoints for constructing trees. Used only if gridpoint=1. |
| zero_split | Binary variable. If equal to 1, zero-split trees can be included in a sum-of-trees model. If equal to 0, only trees with at least one split can be included in a sum-of-trees model. |
| only_max_num_trees | Binary variable. If equal to 1, only sum-of-trees models containing the maximum number of trees, num_rounds, are selected. If equal to 0, sum-of-trees models containing fewer than num_rounds trees can be selected. Default only_max_num_trees=1. |
| min_num_obs_for_split | Integer that determines the minimum number of observations in a (parent) tree node for the algorithm to consider potential splits of the node. |
| min_num_obs_after_split | Integer that determines the minimum number of observations required in each child node resulting from a split. If the left or right child node has fewer than this number of observations, the split cannot occur. |
| exact_residuals | Binary variable. If equal to 1, trees are added to sum-of-tree models within each round of the algorithm by detecting changepoints in the exact residuals. If equal to 0, changepoints are detected in residuals that are constructed from approximate predictions. |
| spike_tree | If equal to 1, the Spike-and-Tree prior is used; otherwise the standard BART prior is used. Under the Spike-and-Tree prior, the number of splitting variables has a beta-binomial prior, the number of terminal nodes has a truncated Poisson prior, and a uniform prior is placed on the set of valid constructions of trees given the splitting variables and number of terminal nodes. |
| s_t_hyperprior | If equal to 1 and spike_tree equals 1, a beta distribution hyperprior is placed on the variable inclusion probabilities for the Spike-and-Tree prior. The hyperprior parameters are a_s_t and b_s_t. |
| p_s_t | If spike_tree=1 and s_t_hyperprior=0, p_s_t is the prior variable inclusion probability. |
| a_s_t | If spike_tree=1 and s_t_hyperprior=1, a_s_t is a parameter of the beta distribution hyperprior. |
| b_s_t | If spike_tree=1 and s_t_hyperprior=1, b_s_t is a parameter of the beta distribution hyperprior. |
| lambda_poisson | Parameter for the Spike-and-Tree prior. It is the parameter of the Poisson prior (truncated and conditional on the number of splitting variables) on the number of terminal nodes. |
| less_greedy | If equal to 1, a less greedy model search algorithm is used. |
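
The alpha and beta arguments control how quickly the prior probability of a split decays with node depth. As a minimal sketch, assuming bartBMA follows the standard BART prior of Chipman et al. (2010), in which a node at depth d splits with probability alpha * (1 + d)^(-beta), the values implied by the defaults in the Usage section can be tabulated in base R. The helper name split_prob is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of bartBMA): prior probability that a node at
# depth d is split, ASSUMING the standard BART form alpha * (1 + d)^(-beta).
# Defaults are taken from the Usage section above.
split_prob <- function(d, alpha = 0.95, beta = 2) {
  alpha * (1 + d)^(-beta)
}

depths <- 0:3
probs <- split_prob(depths)

# One row per depth; the root node is depth 0.
print(data.frame(depth = depths, prior_split_prob = probs))
```

Under this assumed form, larger values of beta penalise deep trees more strongly, while alpha sets the split probability at the root.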

### Value

The following objects are returned by bartBMA:

| Object | Description |
| --- | --- |
| fitted.values | The vector of predictions of the outcome for all training observations. |
| sumoftrees | A list of lists of matrices. The outer list corresponds to a list of sum-of-tree models, and each element of the outer list is a list of matrices describing the structure of the trees within a sum-of-tree model. See details. |
| obs_to_termNodesMatrix | A list of lists of matrices. The outer list corresponds to a list of sum-of-tree models, and each element of the outer list is a list of matrices describing the node to which each observation is allocated at all depths of each tree within a sum-of-tree model. See details. |
| bic | A vector of BICs, one for each sum-of-tree model. |
| test.preds | A vector of test data predictions. This output is given only if there is test data in the input. |
| sum_residuals | CURRENTLY INCORRECT OUTPUT. A list (over sum-of-tree models) of lists (over single trees in a model) of vectors of partial residuals. If the maximum number of trees in a model is one, the output is instead a list (over single-tree models) of vectors of partial residuals, which are all equal to the outcome vector. |
| numvars | The total number of variables in the input training data matrix. |
| call | The matched call, as returned by match.call, in which all of the specified arguments are given by their full names. |
| y_minmax | Range of the input training data outcome vector. |
| response | Input training data outcome vector. |
| nrowTrain | Number of observations in the input training data. |
| sigma | sd(y.train)/(max(y.train)-min(y.train)) |
| a | Input parameter. |
| nu | Input parameter. |
| lambda | Parameter determined by the inputs sigma, sigquant, and nu. |
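
The returned sigma and lambda values can be reproduced by hand, which may help when checking how the error-variance prior was calibrated. The sketch below computes y_minmax and sigma as defined above, then derives lambda under the assumption that bartBMA uses the usual BART calibration, in which the Gamma(nu/2, nu*lambda/2) prior on the inverse variance places probability sigquant on the error variance being below sigma^2. That calibration rule is an assumption and is not stated in this documentation; the simulated outcome vector is purely illustrative.

```r
set.seed(1)
y <- rnorm(50, mean = 10, sd = 2)  # illustrative outcome vector, not package data

nu <- 3          # default from the Usage section
sigquant <- 0.9  # default from the Usage section

y_minmax <- range(y)
sigma <- sd(y) / (max(y) - min(y))  # as defined for the returned sigma

# ASSUMED calibration (standard BART): choose lambda so that
# P(error variance < sigma^2) = sigquant under the prior.
lambda <- sigma^2 * qchisq(1 - sigquant, df = nu) / nu

# Check against the Gamma(nu/2, nu*lambda/2) prior on the inverse variance:
# variance < sigma^2 is the same event as inverse variance > 1/sigma^2.
prior_prob <- pgamma(1 / sigma^2, shape = nu / 2, rate = nu * lambda / 2,
                     lower.tail = FALSE)
print(c(sigma = sigma, lambda = lambda, prior_prob = prior_prob))
```

If the assumption holds, prior_prob recovers sigquant, so a larger sigquant puts more prior mass on error variances smaller than the scaled sample variance.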

### References

Chipman HA, George EI, McCulloch RE (2010). “BART: Bayesian additive regression trees.” The Annals of Applied Statistics, 4(1), 266–298.

Hernandez B, Raftery AE, Pennington SR, Parnell AC (2018). “Bayesian additive regression trees using Bayesian model averaging.” Statistics and Computing, 28(4), 869–890.

### Examples

library(bartBMA)

# Simulate training data: only the first five of the p covariates affect y
N <- 100
p <- 100
set.seed(100)
epsilon <- rnorm(N)
xcov <- matrix(runif(N * p), nrow = N)
y <- sin(pi * xcov[, 1] * xcov[, 2]) + 20 * (xcov[, 3] - 0.5)^2 +
  10 * xcov[, 4] + 5 * xcov[, 5] + epsilon

# Simulate test data from the same model
epsilontest <- rnorm(N)
xcovtest <- matrix(runif(N * p), nrow = N)
ytest <- sin(pi * xcovtest[, 1] * xcovtest[, 2]) + 20 * (xcovtest[, 3] - 0.5)^2 +
  10 * xcovtest[, 4] + 5 * xcovtest[, 5] + epsilontest

# Fit BART-BMA and predict for the test covariates
bart_bma_example <- bartBMA(x.train = xcov, y.train = y, x.test = xcovtest,
                            zero_split = 1, only_max_num_trees = 1,
                            split_rule_node = 0)
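
Because the example passes x.test, the fitted object should contain test.preds, which can be compared with the held-out outcomes in ytest. The helper below is a minimal base R sketch of that comparison; rmse is a hypothetical name and not a bartBMA function, and the toy vectors stand in for the predictions and outcomes so that the snippet runs without the package.

```r
# Hypothetical helper (not part of bartBMA): root mean squared error
# between a vector of predictions and a vector of observed outcomes.
rmse <- function(pred, obs) {
  stopifnot(length(pred) == length(obs))
  sqrt(mean((pred - obs)^2))
}

# Toy stand-ins for predictions and held-out outcomes
toy_obs  <- c(1, 2, 3, 4)
toy_pred <- c(1, 2, 3, 6)
toy_rmse <- rmse(toy_pred, toy_obs)
print(toy_rmse)
```

With the fit from the example above, the corresponding call would be rmse(bart_bma_example$test.preds, ytest).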


[Package bartBMA version 1.0 Index]