mbart {BART} | R Documentation |
Multinomial BART for categorical outcomes with fewer categories
Description
BART is a Bayesian “sum-of-trees” model.
For a numeric response y, we have y = f(x) + \epsilon, where \epsilon \sim N(0, 1).
For a multinomial response y, P(Y=y | x) = F(f(x)), where F denotes the standard Normal CDF (probit link) or the standard Logistic CDF (logit link).
In both cases, f is the sum of many tree models.
The goal is to have very flexible inference for the unknown function f.
In the spirit of “ensemble models”, each tree is constrained by a prior to be a weak learner so that it contributes a small amount to the overall fit.
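As a minimal sketch of the two link functions named above, the base R functions pnorm and plogis map values of f(x) to probabilities. The values in f are made up for illustration and do not come from a fitted model.

```r
## Hypothetical values of the latent function f(x); not from a fitted model
f <- c(-2, -0.5, 0, 0.5, 2)

## Probit link: F is the standard Normal CDF
p.probit <- pnorm(f)

## Logit link: F is the standard Logistic CDF
p.logit <- plogis(f)

## Both links map the real line into (0, 1) and give 0.5 at f(x) = 0
print(round(cbind(f, p.probit, p.logit), 3))
```

Either link is monotone in f(x), so a larger f(x) always corresponds to a larger probability.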
Usage
mbart(
x.train, y.train,
x.test=matrix(0,0,0), type='pbart',
ntype=as.integer(
factor(type,
levels=c('wbart', 'pbart', 'lbart'))),
sparse=FALSE, theta=0, omega=1,
a=0.5, b=1, augment=FALSE, rho=NULL,
xinfo=matrix(0,0,0), usequants=FALSE,
rm.const=TRUE,
k=2, power=2, base=0.95,
tau.num=c(NA, 3, 6)[ntype],
offset=NULL,
ntree=c(200L, 50L, 50L)[ntype], numcut=100L,
ndpost=1000L, nskip=100L,
keepevery=c(1L, 10L, 10L)[ntype],
printevery=100L, transposed=FALSE,
hostname=FALSE,
mc.cores = 2L, ## mc.mbart only
nice = 19L, ## mc.mbart only
seed = 99L ## mc.mbart only
)
mc.mbart(
x.train, y.train,
x.test=matrix(0,0,0), type='pbart',
ntype=as.integer(
factor(type,
levels=c('wbart', 'pbart', 'lbart'))),
sparse=FALSE, theta=0, omega=1,
a=0.5, b=1, augment=FALSE, rho=NULL,
xinfo=matrix(0,0,0), usequants=FALSE,
rm.const=TRUE,
k=2, power=2, base=0.95,
tau.num=c(NA, 3, 6)[ntype],
offset=NULL,
ntree=c(200L, 50L, 50L)[ntype], numcut=100L,
ndpost=1000L, nskip=100L,
keepevery=c(1L, 10L, 10L)[ntype],
printevery=100L, transposed=FALSE,
hostname=FALSE,
mc.cores = 2L, ## mc.mbart only
nice = 19L, ## mc.mbart only
seed = 99L ## mc.mbart only
)
Arguments
x.train |
Explanatory variables for training (in sample) data. |
y.train |
Categorical dependent variable for training (in sample) data. |
x.test |
Explanatory variables for test (out of sample) data. |
type |
You can use this argument to specify the type of fit.
|
ntype |
The integer equivalent of |
sparse |
Whether to perform variable selection based on a sparse Dirichlet prior rather than simply uniform; see Linero 2016. |
theta |
Set |
omega |
Set |
a |
Sparse parameter for |
b |
Sparse parameter for |
rho |
Sparse parameter: typically |
augment |
Whether data augmentation is to be performed in sparse variable selection. |
xinfo |
You can provide the cutpoints to BART or let BART
choose them for you. To provide them, use the |
usequants |
If |
rm.const |
Whether or not to remove constant variables. |
k |
For categorical |
power |
Power parameter for tree prior. |
base |
Base parameter for tree prior. |
tau.num |
The numerator in the |
offset |
With Multinomial
BART, the centering is |
ntree |
The number of trees in the sum. |
numcut |
The number of possible values of c (see usequants).
If a single number is given, this is used for all variables.
Otherwise a vector with length equal to ncol(x.train) is required,
where the |
ndpost |
The number of posterior draws returned. |
nskip |
Number of MCMC iterations to be treated as burn in. |
keepevery |
Every keepevery-th draw is kept and returned to the user. |
printevery |
As the MCMC runs, a message is printed every printevery draws. |
transposed |
When running |
hostname |
When running on a cluster occasionally it is useful
to track on which node each chain is running; to do so
set this argument to |
seed |
Seed for the random number generator; setting it is required for reproducible MCMC. |
mc.cores |
Number of cores to employ in parallel. |
nice |
Set the job niceness. The default niceness is 19: niceness goes from 0 (highest priority) to 19 (lowest priority). |
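The interplay of ndpost, nskip and keepevery can be sketched with simple arithmetic. This assumes that thinning is applied only after burn-in, so that returning ndpost draws takes nskip + ndpost*keepevery iterations; that reading follows from the argument descriptions above and has not been checked against the implementation. The helper total.iterations is hypothetical and is not part of BART.

```r
## Hypothetical helper (not part of the BART package): iterations run
## when nskip burn-in draws are discarded and every keepevery-th draw
## after burn-in is kept until ndpost draws have been collected.
total.iterations <- function(ndpost, nskip, keepevery) {
    nskip + ndpost * keepevery
}

## Defaults from Usage for type='pbart': ndpost=1000, nskip=100, keepevery=10
total.iterations(ndpost = 1000L, nskip = 100L, keepevery = 10L)

## Defaults from Usage for type='wbart': keepevery=1
total.iterations(ndpost = 1000L, nskip = 100L, keepevery = 1L)
```

Under this assumption, raising keepevery lengthens the run without changing the number of draws returned.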
Details
BART is a Bayesian MCMC method.
At each MCMC iteration, we produce a draw from f in the categorical y case.
Thus, unlike a lot of other modelling methods in R, we do not produce
a single model object from which fits and summaries may be extracted.
The output consists of values f^*(x), where * denotes a particular draw.
The x is either a row from the training data (x.train) or a row from the test data (x.test).
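Because the output is a collection of draws rather than a single fitted model, summaries are computed across draws. The sketch below uses a simulated matrix in place of real output: draws is a stand-in with one row per posterior draw and one column per x, which is an assumed layout, and its numbers are random rather than produced by mbart.

```r
set.seed(1)

## Stand-in for posterior output (simulated, NOT from mbart):
## rows index MCMC draws, columns index rows of x (assumed layout)
ndraws <- 200
nx <- 5
draws <- matrix(rnorm(ndraws * nx, mean = rep(1:nx, each = ndraws)),
                nrow = ndraws, ncol = nx)

## Posterior mean of f(x) at each x: average over draws
f.mean <- colMeans(draws)

## 95% posterior interval at each x: quantiles over draws
f.ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

print(rbind(mean = f.mean, f.ci))
```

The same pattern applies to any other functional of the draws, such as a posterior median or the probability that f(x) exceeds a threshold.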
Value
mbart returns an object of type mbart, which is essentially a list.
yhat.train |
A matrix with |
yhat.train.mean |
train data fits = mean of |
varcount |
a matrix with |
In addition, the list has an offset vector giving the value used.
Note that in the multinomial y case, yhat.train is f(x) + offset[j].
See Also
Examples
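## simulate N observations whose category probabilities (x1, x2, x3) sum to 1
N=500
set.seed(12)
x1=runif(N)
x2=runif(N, max=1-x1)
x3=1-x1-x2
x.train=cbind(x1, x2, x3)
## draw each outcome in 1:3 from a multinomial with those probabilities
y.train=0
for(i in 1:N)
    y.train[i]=sum((1:3)*rmultinom(1, 1, x.train[i, ]))
## observed category proportions
table(y.train)/N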
##test mbart with token run to ensure installation works
set.seed(99)
post = mbart(x.train, y.train, nskip=1, ndpost=1)
## Not run:
set.seed(99)
post=mbart(x.train, y.train, x.train)
##mc.post=mc.mbart(x.train, y.train, x.train, mc.cores=8, seed=99)
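## squared correlation between the true probability of category j
## and its posterior mean estimate
K=3
i=seq(1, N*K, K)-1
for(j in 1:K)
    print(cor(x.train[ , j], post$prob.test.mean[i+j])^2)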
## End(Not run)
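The indexing i+j in the example suggests that prob.test.mean is stored as one long vector with the K category probabilities of each observation adjacent to one another. Assuming that layout, which is inferred from the example rather than from a documented contract, the vector can be reshaped into an N-by-K matrix. The vector p below is a made-up stand-in so that the sketch runs without fitting a model.

```r
N <- 4  ## small illustrative sizes
K <- 3

## Stand-in for post$prob.test.mean (made up, NOT from mbart):
## entries ordered as obs 1 cat 1..K, obs 2 cat 1..K, and so on
p <- c(0.7, 0.2, 0.1,
       0.1, 0.8, 0.1,
       0.3, 0.3, 0.4,
       0.5, 0.25, 0.25)

## One row per observation, one column per category
P <- matrix(p, nrow = N, ncol = K, byrow = TRUE)

## Column j matches the example's indexing p[i+j]
i <- seq(1, N * K, K) - 1
for (j in 1:K) stopifnot(identical(P[, j], p[i + j]))

## Each row holds one observation's category probabilities
print(rowSums(P))
```

With the matrix form, the most probable category per observation is available through max.col(P).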