softbart {SoftBart}    R Documentation
Fits the SoftBart model
Description
Runs the Markov chain for the semiparametric Gaussian model

    Y = r(X) + \epsilon

and collects the output, where r(x) is modeled using a soft BART model.
Usage
softbart(X, Y, X_test, hypers = NULL, opts = Opts(), verbose = TRUE)
Arguments
X: A matrix of training data covariates.

Y: A vector of training data responses.

X_test: A matrix of test data covariates.

hypers: A list of hyperparameter values obtained from Hypers().

opts: A list of MCMC chain settings obtained from Opts().

verbose: If TRUE, progress of the chain is printed to the console.
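The hypers and opts lists are usually created with the package's Hypers() and Opts() constructors rather than by hand. A minimal sketch, using only the arguments that appear in the Examples below (anything not set keeps its default):

hypers <- Hypers(X, Y, num_tree = 50, temperature = 1)
opts   <- Opts(num_burn = 5000, num_save = 5000, update_tau = TRUE)
fit    <- softbart(X, Y, X_test, hypers = hypers, opts = opts)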
Value
Returns a list with the following components:
- y_hat_train: predicted values for the training data for each iteration of the chain.
- y_hat_test: predicted values for the test data for each iteration of the chain.
- y_hat_train_mean: predicted values for the training data, averaged over iterations.
- y_hat_test_mean: predicted values for the test data, averaged over iterations.
- sigma: posterior samples of the error standard deviation.
- sigma_mu: posterior samples of sigma_mu, the standard deviation of the leaf node parameters.
- s: posterior samples of s.
- alpha: posterior samples of alpha.
- beta: posterior samples of beta.
- gamma: posterior samples of gamma.
- k: posterior samples of k = 0.5 / (sqrt(num_tree) * sigma_mu).
- num_leaves_final: the number of leaves for each tree at the final iteration.
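For instance, pointwise posterior summaries of the test-set fit can be computed directly from these components; a sketch, assuming fit is the returned list and y_hat_test stores one row per saved iteration and one column per test observation:

post_mean <- colMeans(fit$y_hat_test)    ## equals fit$y_hat_test_mean
ci_lower  <- apply(fit$y_hat_test, 2, quantile, probs = 0.025)
ci_upper  <- apply(fit$y_hat_test, 2, quantile, probs = 0.975)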
Examples
## NOTE: SET NUMBER OF BURN IN AND SAMPLE ITERATIONS HIGHER IN PRACTICE
num_burn <- 10 ## Should be ~ 5000
num_save <- 10 ## Should be ~ 5000
set.seed(1234)
f_fried <- function(x) 10 * sin(pi * x[,1] * x[,2]) + 20 * (x[,3] - 0.5)^2 +
10 * x[,4] + 5 * x[,5]
gen_data <- function(n_train, n_test, P, sigma) {
X <- matrix(runif(n_train * P), nrow = n_train)
mu <- f_fried(X)
X_test <- matrix(runif(n_test * P), nrow = n_test)
mu_test <- f_fried(X_test)
Y <- mu + sigma * rnorm(n_train)
Y_test <- mu_test + sigma * rnorm(n_test)
return(list(X = X, Y = Y, mu = mu, X_test = X_test, Y_test = Y_test, mu_test = mu_test))
}
## Simulate dataset
sim_data <- gen_data(250, 100, 1000, 1)
## Fit the model
fit <- softbart(X = sim_data$X, Y = sim_data$Y, X_test = sim_data$X_test,
hypers = Hypers(sim_data$X, sim_data$Y, num_tree = 50, temperature = 1),
opts = Opts(num_burn = num_burn, num_save = num_save, update_tau = TRUE))
## Plot the fit (note: interval estimates are not prediction intervals,
## so they do not cover the predictions at the nominal rate)
plot(fit)
## Look at posterior model inclusion probabilities for each predictor.
plot(posterior_probs(fit)[["post_probs"]],
col = ifelse(posterior_probs(fit)[["post_probs"]] > 0.5, scales::muted("blue"),
scales::muted("green")),
pch = 20)
rmse(fit$y_hat_test_mean, sim_data$mu_test)
rmse(fit$y_hat_train_mean, sim_data$mu)
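## The plotted intervals above are credible intervals for r(x); approximate
## posterior predictive intervals for new responses can be formed by adding
## noise draws scaled by the sampled sigma values. A sketch, assuming
## fit$y_hat_test has one row per saved iteration.
pred_draws <- fit$y_hat_test +
  matrix(rnorm(length(fit$y_hat_test), sd = fit$sigma),
         nrow = nrow(fit$y_hat_test))
pred_lower <- apply(pred_draws, 2, quantile, probs = 0.025)
pred_upper <- apply(pred_draws, 2, quantile, probs = 0.975)
## Empirical coverage of the 95% prediction intervals on the simulated test set
mean(sim_data$Y_test >= pred_lower & sim_data$Y_test <= pred_upper)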