| sbc {rstan} | R Documentation |
Simulation Based Calibration (sbc)
Description
Check whether a model is well-calibrated with respect to the
prior distribution and hence possibly amenable to obtaining
a posterior distribution conditional on observed data.
Usage
sbc(stanmodel, data, M, ..., save_progress, load_incomplete=FALSE)
## S3 method for class 'sbc'
plot(x, thin = 3, ...)
## S3 method for class 'sbc'
print(x, ...)
Arguments
stanmodel |
An object of |
data |
A named |
M |
The number of times to condition on draws from the prior predictive distribution |
... |
Additional arguments that are passed to |
x |
An object produced by |
thin |
An integer vector of length one indicating the thinning interval when plotting, which defaults to 3 |
save_progress |
If a directory is provided, stanfit objects
are saved to disk making it easy to resume a partial |
load_incomplete |
When |
Details
This function assumes adherence to the following conventions in the underlying Stan program:
Realizations of the unknown parameters are drawn in the
transformed datablock of the Stan program and are postfixed with an underscore, such astheta_. These are considered the “true” parameters being estimated by the corresponding symbol declared in theparametersblock, which should have the same name except for the trailing underscore, such astheta.The realizations of the unknown parameters are then conditioned on when drawing from the prior predictive distribution, also in the
transformed datablock. There is no restriction on the symbol name that holds the realizations from the prior predictive distribution but for clarity, it should not end with a trailing underscore.The realizations of the unknown parameters should be copied into a
vectorin thegenerated quantitiesblock namedpars_.The realizations from the prior predictive distribution should be copied into an object (of the same type) in the
generated quantitiesblock namedy_. Technically, this step is optional and could be omitted to conserve RAM, but inspecting the realizations from the prior predictive distribution is a good way to judge whether the priors are reasonable.The
generated quantitiesblock must contain an integer array namedranks_whose only values are zero or one, depending on whether the realization of a parameter from the posterior distribution exceeds the corresponding “true” realization, such astheta > theta_;. These are not actually "ranks" but can be used afterwards to reconstruct (thinned) ranks.The
generated quantitiesblock may contain a vector namedlog_likwhose values are the contribution to the log-likelihood by each observation. This is optional but facilitates calculating Pareto k shape parameters to judge whether the posterior distribution is sensitive to particular observations.
Although the user can pass additional arguments to sampling through the ...,
the following arguments are hard-coded and should not be passed through the ...:
-
pars = "ranks_"because nothing else needs to be stored for each posterior draw -
include = TRUEto ensure that"ranks_"is included rather than excluded -
chains = 1because only one chain is run for each integer less thanM -
seedbecause a sequence of seeds is used across theMruns to preserve independence across runs -
save_warmup = FALSEbecause the warmup realizations are not relevant -
thin = 1because thinning can and should be done after the Markov Chain is finished, as is done by thethinargument to theplotmethod in order to make the histograms consist of approximately independent realizations
Other arguments will take the default values used by sampling unless
passed through the .... Specifying refresh = 0 is recommended to avoid printing
a lot of intermediate progress reports to the screen. It may be necessary to pass a
list to the control argument of sampling with elements adapt_delta
and / or max_treedepth in order to obtain adequate results.
Ideally, users would want to see the absence of divergent transitions (which is shown
by the print method) and other warnings, plus an approximately uniform histogram
of the ranks for each parameter (which are shown by the plot method). See the
vignette for more details.
Value
The sbc function outputs a list of S3 class "sbc", which contains the
following elements:
-
ranksA list ofMmatrices, each with number of rows equal to the number of saved iterations and number of columns equal to the number of unknown parameters. These matrices contain the realizations of theranks_object from thegenerated quantitiesblock of the Stan program. -
YIf present, a matrix of realizations from the prior predictive distribution whose rows are equal to the number of observations and whose columns are equal toM, which are taken from they_object in thegenerated quantitiesblock of the Stan program. -
parsA matrix of realizations from the prior distribution whose rows are equal to the number of parameters and whose columns are equal toM, which are taken from thepars_object in thegenerated quantitiesblock of the Stan program. -
pareto_kA matrix of Pareto k shape parameter estimates orNULLif there is nolog_liksymbol in thegenerated quantitiesblock of the Stan program -
sampler_paramsA three-dimensional array that results from combining calls toget_sampler_paramsfor each of theMruns. The resulting matrix has rows equal to the number of post-warmup iterations, columns equal to six, andMfloors. The columns are named"accept_stat__","stepsize__","treedepth__","n_leapfrog__","divergent__", and"energy__". The most important of which is"divergent__", which should be all zeros and perhaps"treedepth__", which should only rarely get up to the value ofmax_treedepthpassed as an element of thecontrollist tosamplingor otherwise defaults to10.
The print method outputs the number of divergent transitions and
returns NULL invisibly.
The plot method returns a ggplot object
with histograms whose appearance can be further customized.
References
The Stan Development Team Stan Modeling Language User's Guide and Reference Manual. https://mc-stan.org.
Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). Validating Bayesian Inference Algorithms with Simulation-Based Calibration. arXiv preprint arXiv:1804.06788. https://arxiv.org/abs/1804.06788
See Also
stan_model and sampling
Examples
scode <- "
data {
int<lower = 1> N;
real<lower = 0> a;
real<lower = 0> b;
}
transformed data { // these adhere to the conventions above
real pi_ = beta_rng(a, b);
int y = binomial_rng(N, pi_);
}
parameters {
real<lower = 0, upper = 1> pi;
}
model {
target += beta_lpdf(pi | a, b);
target += binomial_lpmf(y | N, pi);
}
generated quantities { // these adhere to the conventions above
int y_ = y;
vector[1] pars_;
int ranks_[1] = {pi > pi_};
vector[N] log_lik;
pars_[1] = pi_;
for (n in 1:y) log_lik[n] = bernoulli_lpmf(1 | pi);
for (n in (y + 1):N) log_lik[n] = bernoulli_lpmf(0 | pi);
}
"