probit_bartBMA {bartBMA} | R Documentation |
This is an implementation of Bayesian Additive Regression Trees (Chipman et al. 2018) using Bayesian Model Averaging (Hernandez et al. 2018).
probit_bartBMA(x.train, ...)
## Default S3 method:
probit_bartBMA(
x.train,
y.train,
a = 3,
nu = 3,
sigquant = 0.9,
c = 1000,
pen = 12,
num_cp = 20,
x.test = matrix(0, 0, 0),
num_rounds = 5,
alpha = 0.95,
beta = 2,
split_rule_node = 0,
gridpoint = 0,
maxOWsize = 100,
num_splits = 5,
gridsize = 10,
zero_split = 1,
only_max_num_trees = 1,
min_num_obs_for_split = 2,
min_num_obs_after_split = 2,
exact_residuals = 1,
spike_tree = 0,
s_t_hyperprior = 1,
p_s_t = 0.5,
a_s_t = 1,
b_s_t = 3,
lambda_poisson = 10,
less_greedy = 0,
...
)
x.train |
Training data covariate matrix |
... |
Further arguments. |
y.train |
Training data outcome vector. |
a |
This is a parameter that influences the variance of terminal node parameter values. Default value a=3. |
nu |
This is a hyperparameter in the distribution of the variance of the error term. The inverse of the variance is distributed as Gamma(nu/2, nu*lambda/2). Default value nu=3. |
sigquant |
Calibration quantile for the inverse chi-squared prior on the variance of the error term. Default sigquant=0.9. |
c |
This determines the size of Occam's window. Default c=1000. |
pen |
This is a parameter used by the Pruned Exact Linear Time Algorithm when finding changepoints. Default value pen=12. |
num_cp |
This is a number between 0 and 100 that determines the proportion of changepoints proposed by the changepoint detection algorithm to keep when growing trees. Default num_cp=20. |
x.test |
Test data covariate matrix. Default x.test=matrix(0.0,0,0). |
num_rounds |
Maximum number of trees in a sum-of-tree model. Default num_rounds=5. |
alpha |
Parameter in prior probability of tree node splitting. Default alpha=0.95. |
beta |
Parameter in prior probability of tree node splitting. Default beta=2. |
split_rule_node |
Binary variable. If equals 1, then find a new set of potential splitting points via a changepoint algorithm after adding each split to a tree. If equals zero, use the same set of potential split points for all splits in a tree. Default split_rule_node=0. |
gridpoint |
Binary variable. If equals 1, then a grid search changepoint detection algorithm will be used. If equals 0, then the Pruned Exact Linear Time (PELT) changepoint detection algorithm will be used (Killick et al. 2012). Default gridpoint=0. |
maxOWsize |
Maximum number of models to keep in Occam's window. Default maxOWsize=100. |
num_splits |
Maximum number of splits in a tree. Default num_splits=5. |
gridsize |
This integer determines the size of the grid across which to search if gridpoint=1 when finding changepoints for constructing trees. Default gridsize=10. |
zero_split |
Binary variable. If equals 1, then zero-split trees can be included in a sum-of-trees model. If equals zero, then only trees with at least one split can be included in a sum-of-trees model. Default zero_split=1. |
only_max_num_trees |
Binary variable. If equals 1, then only sum-of-trees models containing the maximum number of trees, num_rounds, are selected. If equals 0, then sum-of-trees models containing fewer than num_rounds trees can be selected. The default is only_max_num_trees=1. |
min_num_obs_for_split |
This integer determines the minimum number of observations in a (parent) tree node for the algorithm to consider potential splits of the node. Default min_num_obs_for_split=2. |
min_num_obs_after_split |
This integer determines the minimum number of observations in a child node resulting from a split in order for a split to occur. If the left or right child node has fewer than this number of observations, then the split cannot occur. Default min_num_obs_after_split=2. |
exact_residuals |
Binary variable. If equals 1, then trees are added to sum-of-tree models within each round of the algorithm by detecting changepoints in the exact residuals. If equals zero, then changepoints are detected in residuals that are constructed from approximate predictions. Default exact_residuals=1. |
spike_tree |
If equal to 1, then the Spike-and-Tree prior will be used; otherwise the standard BART prior will be used. Under the Spike-and-Tree prior, the number of splitting variables has a beta-binomial prior, the number of terminal nodes has a truncated Poisson prior, and a uniform prior is placed on the set of valid constructions of trees given the splitting variables and number of terminal nodes. Default spike_tree=0. |
s_t_hyperprior |
If equals 1 and spike_tree equals 1, then a beta distribution hyperprior is placed on the variable inclusion probabilities for the Spike-and-Tree prior. The hyperprior parameters are a_s_t and b_s_t. Default s_t_hyperprior=1. |
p_s_t |
If spike_tree=1 and s_t_hyperprior=0, then p_s_t is the prior variable inclusion probability. Default p_s_t=0.5. |
a_s_t |
If spike_tree=1 and s_t_hyperprior=1, then a_s_t is a parameter of a beta distribution hyperprior. Default a_s_t=1. |
b_s_t |
If spike_tree=1 and s_t_hyperprior=1, then b_s_t is a parameter of a beta distribution hyperprior. Default b_s_t=3. |
lambda_poisson |
This is a parameter for the Spike-and-Tree prior. It is the parameter for the (truncated and conditional on the number of splitting variables) Poisson prior on the number of terminal nodes. Default lambda_poisson=10. |
less_greedy |
If equal to 1, then a less greedy model search algorithm is used. Default less_greedy=0. |
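Two of the priors mentioned in the argument descriptions above can be illustrated numerically. The sketch below is not taken from the package source. It assumes that the node-splitting prior has the standard BART form alpha*(1+d)^(-beta) for a node at depth d, and it reads Gamma(nu/2, nu*lambda/2) as a shape and rate parameterisation. The helper `split_prob` and the value `lambda_demo` are hypothetical names and numbers chosen for illustration; in the package, lambda is determined from sigma, sigquant, and nu.

```r
# Prior probability that a node at depth d splits, assuming the standard
# BART form alpha * (1 + d)^(-beta). split_prob is a hypothetical helper.
split_prob <- function(d, alpha = 0.95, beta = 2) {
  alpha * (1 + d)^(-beta)
}

depths <- 0:3
print(data.frame(depth = depths, prior_split_prob = split_prob(depths)))

# Prior on the error precision (inverse variance), read as
# Gamma(shape = nu / 2, rate = nu * lambda / 2).
# lambda_demo is an illustrative value, not one computed by the package.
set.seed(1)
nu <- 3
lambda_demo <- 0.1
precision_draws <- rgamma(1e5, shape = nu / 2, rate = nu * lambda_demo / 2)

# The prior mean of the precision is shape / rate = 1 / lambda.
print(mean(precision_draws))
print(1 / lambda_demo)
```

With the defaults alpha=0.95 and beta=2, the split probability falls quickly with depth, which favours shallow trees.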
The following objects are returned by probit_bartBMA:
fitted.values |
The vector of predictions of the outcome for all training observations. |
sumoftrees |
This is a list of lists of matrices. The outer list corresponds to a list of sum-of-tree models, and each element of the outer list is a list of matrices describing the structure of the trees within a sum-of-tree model. See details. |
obs_to_termNodesMatrix |
This is a list of lists of matrices. The outer list corresponds to a list of sum-of-tree models, and each element of the outer list is a list of matrices describing the node to which each observation is allocated at all depths of each tree within a sum-of-tree model. See details. |
bic |
This is a vector of BICs for each sum-of-tree model. |
test.preds |
A vector of test data predictions. This output is only given if there is test data in the input. |
sum_residuals |
CURRENTLY INCORRECT OUTPUT. A list (over sum-of-tree models) of lists (over single trees in a model) of vectors of partial residuals. If the maximum number of trees in a model is one, the output is instead a list (over single-tree models) of vectors of partial residuals, which are all equal to the outcome vector. |
numvars |
This is the total number of variables in the input training data matrix. |
call |
The matched call, as returned by match.call, in which all specified arguments are given by their full names. |
y_minmax |
Range of the input training data outcome vector. |
response |
Input training data outcome vector. |
nrowTrain |
Number of observations in the input training data. |
sigma |
sd(y.train)/(max(y.train)-min(y.train)) |
a |
input parameter |
nu |
input parameter |
lambda |
parameter determined by the inputs sigma, sigquant, and nu |
fitted.probs |
In-sample fitted probabilities |
fitted.classes |
In-sample fitted classes |
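As a small illustration of the sigma value listed above, the expression sd(y.train)/(max(y.train)-min(y.train)) can be evaluated directly in base R. The outcome vector below is made up for the sketch. For a binary outcome containing both classes the range is 1, so sigma reduces to the sample standard deviation.

```r
# Hypothetical binary outcome vector, used only for illustration.
y_train <- c(0, 1, 1, 0, 1, 0, 0, 1)

# sigma as described in the returned-value list.
sigma <- sd(y_train) / (max(y_train) - min(y_train))

# With both classes present the range is 1, so sigma equals sd(y_train).
print(sigma)
print(sd(y_train))
```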
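# Example from BART package (McCulloch et al. 2019)
library(bartBMA)
set.seed(99)
n <- 100
# One covariate, sorted, uniform on [-2, 2]
x <- sort(-2 + 4 * runif(n))
X <- matrix(x, ncol = 1)
# Latent function and logistic link
f <- function(x) (1/2) * x^3
FL <- function(x) exp(x) / (1 + exp(x))
# Success probabilities and simulated binary outcomes
pv <- FL(f(x))
y <- rbinom(n, 1, pv)
# Fit the model on the training data
probit_bartBMA(x.train = X, y.train = y)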
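The example above fits the model without test data. A possible extension, sketched below, holds out part of the simulated data, passes it as x.test, and inspects the fitted.probs, fitted.classes, and test.preds elements described in the returned-value list. The hold-out scheme (every fourth observation) and the names `test_idx`, `X_train`, `y_train`, and `X_test` are choices made for this sketch. The model call is guarded so that the data preparation runs even when bartBMA is not installed.

```r
# Simulate data in the same way as the example above.
set.seed(99)
n <- 100
x <- sort(-2 + 4 * runif(n))
X <- matrix(x, ncol = 1)
pv <- plogis(0.5 * x^3)  # plogis(z) is exp(z) / (1 + exp(z))
y <- rbinom(n, 1, pv)

# Hold out every fourth observation as test data (an arbitrary choice).
test_idx <- seq(4, n, by = 4)
X_train <- X[-test_idx, , drop = FALSE]
y_train <- y[-test_idx]
X_test <- X[test_idx, , drop = FALSE]

# Fit only if the package is available.
if (requireNamespace("bartBMA", quietly = TRUE)) {
  fit <- bartBMA::probit_bartBMA(
    x.train = X_train,
    y.train = y_train,
    x.test = X_test
  )
  print(head(fit$fitted.probs))    # in-sample fitted probabilities
  print(head(fit$fitted.classes))  # in-sample fitted classes
  print(head(fit$test.preds))      # predictions for X_test
}
```

The `drop = FALSE` argument keeps the single-column subsets as matrices, matching the matrix inputs that x.train and x.test expect.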