summary.laplace.ppc {LaplacesDemon}  R Documentation 
Posterior Predictive Check Summary
Description
This may be used to summarize either new, unobserved instances of \textbf{y} (called \textbf{y}^{new}) or replicates of \textbf{y} (called \textbf{y}^{rep}). Either \textbf{y}^{new} or \textbf{y}^{rep} is summarized, depending on which data were supplied to predict.laplace.
Usage
## S3 method for class 'laplace.ppc'
summary(object, Categorical, Rows, Discrep,
d, Quiet, ...)
Arguments
object 
An object of class laplace.ppc.
Categorical 
Logical. If TRUE, then \textbf{y} is treated as categorical rather than continuous.
Rows 
An optional vector of row numbers, for example

Discrep 
A character string indicating a discrepancy
test. Valid character strings are listed in Details.
d 
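This is an optional integer to be used with the
Discrep argument for measures that require it, such as "PPL", "mean(yhat[i,] > d)", and "round(yhat[i,]) = d" (see Details).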
Quiet 
This logical argument defaults to 
... 
Additional arguments are unused. 
Details
This function summarizes an object of class laplace.ppc, which consists of posterior predictive checks on either \textbf{y}^{new} or \textbf{y}^{rep}, depending respectively on whether unobserved instances of \textbf{y} or the model sample of \textbf{y} were used in the predict.laplace function. The deviance and monitored variables are also summarized.
The purpose of a posterior predictive check is to assess how well (or poorly) the model fits the data, or to assess discrepancies between the model and the data. For more information on posterior predictive checks, see https://web.archive.org/web/20150215050702/http://www.bayesianinference.com/posteriorpredictivechecks.
When \textbf{y} is continuous and known, this function estimates the predictive concordance between \textbf{y} and \textbf{y}^{rep} as per Gelfand (1996), and the predictive quantile (PQ), which is used for record-level outlier detection and to calculate Gelfand's predictive concordance.
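The two quantities above can be sketched in base R, assuming a hypothetical N x S matrix `yhat` whose rows hold the replicates for each record and a numeric vector `y`. The helper name `predictive_check` is hypothetical, base `quantile()` stands in for `p.interval`, and PQ is taken here to be Pr(yhat[i,] >= y[i]); this illustrates the idea and is not the package's internal code.

```r
# Hypothetical inputs: y holds N observed values and yhat is an N x S matrix
# with S posterior predictive replicates per record (one row per record).
predictive_check <- function(y, yhat, prob = 0.95) {
  alpha <- (1 - prob) / 2
  # Quantile-based interval of each record's replicates (stand-in for p.interval)
  lb <- apply(yhat, 1, quantile, probs = alpha)
  ub <- apply(yhat, 1, quantile, probs = 1 - alpha)
  # Record-level predictive quantile, assumed here to be Pr(yhat[i,] >= y[i])
  pq <- rowMeans(yhat >= y)
  # Percentage of records whose observed value lies inside its interval
  concordance <- 100 * mean(y >= lb & y <= ub)
  list(PQ = pq, Concordance = concordance, LB = lb, UB = ub)
}

set.seed(1)
y <- rnorm(50)
yhat <- matrix(rnorm(50 * 1000), nrow = 50)
res <- predictive_check(y, yhat)
res$Concordance
```

Records whose PQ is close to 0 or 1 sit in a tail of their own predictive distribution, which is what makes PQ useful for flagging outliers.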
When \textbf{y} is categorical and known, this function estimates the record-level lift, which is p(yhat[i,] = y[i]) / [p(y = j) / n], or the number of correctly predicted samples over the rate of that category of \textbf{y} in vector \textbf{y}.
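A minimal sketch of the record-level lift, following the verbal definition above: the proportion of a record's replicates that match y[i], divided by the rate of that category in \textbf{y}. The function name `record_lift` and the N x S layout of `yhat` are assumptions for illustration.

```r
# Hypothetical inputs: y is a vector of N observed categories and yhat is an
# N x S matrix of replicated categories (one row per record).
record_lift <- function(y, yhat) {
  # Proportion of each record's replicates that match its observed category
  p_correct <- rowMeans(yhat == y)
  # Rate of each record's observed category within y
  base_rate <- as.numeric(table(y)[as.character(y)]) / length(y)
  p_correct / base_rate
}

y <- c(1, 1, 2, 3)
yhat <- rbind(c(1, 1, 2, 1),
              c(1, 2, 2, 2),
              c(2, 2, 2, 2),
              c(3, 3, 3, 1))
lift <- record_lift(y, yhat)
mean(lift)  # 2.25, the mean of the record-level lifts
```

A lift above 1 means the replicates recover that record's category more often than its base rate in \textbf{y} alone would suggest.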
A discrepancy measure is an approach to studying discrepancies between the model and data (Gelman et al., 1996). Below is a list of discrepancy measures, followed by a brief introduction to discrepancy analysis:
The "Chi-Square" discrepancy measure is the chi-square goodness-of-fit test that is recommended by Gelman. For each record i=1:N, this returns (y[i] - E(y[i]))^2 / var(yhat[i,]).

The "Chi-Square2" discrepancy measure returns the following for each record: Pr(chisq.rep[i,] > chisq.obs[i,]), where chisq.obs[i,] <- (y[i] - E(y[i]))^2 / E(y[i]), and chisq.rep[i,] <- (yhat[i,] - E(yhat[i,]))^2 / E(yhat[i,]). The overall discrepancy is the percent of records that were outside of the 95% quantile-based probability interval (see p.interval).

The "Kurtosis" discrepancy measure returns the kurtosis of \textbf{y}^{rep} for each record, and the discrepancy statistic is the mean for all records. This does not measure discrepancies between the model and data, but it is useful for finding kurtotic replicate distributions.

The "L-criterion" discrepancy measure of Laud and Ibrahim (1995) provides the record-level combination of two components (see below), and the discrepancy statistic is the sum, L, as well as a calibration number, S.L. For more information on the L-criterion, see the accompanying vignette entitled "Bayesian Inference".

The "MASE" (Mean Absolute Scaled Error) is a discrepancy measure for the accuracy of time-series forecasts, estimated as (|y - yhat|) / mean(abs(diff(y))). The discrepancy statistic is the mean of the record-level values.

The "MSE" (Mean Squared Error) discrepancy measure provides the MSE for each record across all replicates, and the discrepancy statistic is the mean of the record-level MSEs. MSE and quadratic loss are identical.

The "PPL" (Posterior Predictive Loss) discrepancy measure of Gelfand and Ghosh (1998) provides the record-level combination of two components: one involves the predictive variance and the other includes the accuracy of the means of the predictive distribution. The d argument applies the weight d/(d+1) to the accuracy component, which is then added to the variance component. For \textbf{y}^{new}, use d=0. For \textbf{y}^{rep} and model comparison, d is commonly set to 1, 10, or 100000. Larger values of d put more stress on fit and downgrade the precision of the estimates.

The "Quadratic Loss" discrepancy measure provides the mean quadratic loss for each record across all replicates, and the discrepancy statistic is the mean of the record-level mean quadratic losses. Quadratic loss and MSE are identical, and quadratic loss is the negative of quadratic utility.

The "Quadratic Utility" discrepancy measure provides the mean quadratic utility for each record across all replicates, and the discrepancy statistic is the mean of the record-level mean quadratic utilities. Quadratic utility is the negative of quadratic loss.

The "RMSE" (Root Mean Squared Error) discrepancy measure provides the RMSE for each record across all replicates, and the discrepancy statistic is the mean of the record-level RMSEs.

The "Skewness" discrepancy measure returns the skewness of \textbf{y}^{rep} for each record, and the discrepancy statistic is the mean for all records. This does not measure discrepancies between the model and data, but it is useful for finding skewed replicate distributions.

The "max(yhat[i,]) > max(y)" discrepancy measure returns a record-level indicator when a record's maximum \textbf{y}^{rep}_i exceeds the maximum of \textbf{y}. The discrepancy statistic is the mean of the record-level indicators, reporting the proportion of records with replications that exceed the maximum of \textbf{y}.

The "mean(yhat[i,]) > mean(y)" discrepancy measure returns a record-level indicator when the mean of a record's \textbf{y}^{rep}_i is greater than the mean of \textbf{y}. The discrepancy statistic is the mean of the record-level indicators, reporting the proportion of records with mean replications that exceed the mean of \textbf{y}.

The "mean(yhat[i,] > d)" discrepancy measure returns a record-level proportion of \textbf{y}^{rep}_i that exceeds a specified value, d. The discrepancy statistic is the mean of the record-level proportions.

The "mean(yhat[i,] > mean(y))" discrepancy measure returns a record-level proportion of \textbf{y}^{rep}_i that exceeds the mean of \textbf{y}. The discrepancy statistic is the mean of the record-level proportions.

The "min(yhat[i,]) < min(y)" discrepancy measure returns a record-level indicator when a record's minimum \textbf{y}^{rep}_i is less than the minimum of \textbf{y}. The discrepancy statistic is the mean of the record-level indicators, reporting the proportion of records with replications less than the minimum of \textbf{y}.

The "round(yhat[i,]) = d" discrepancy measure returns a record-level proportion of \textbf{y}^{rep}_i that, when rounded, is equal to a specified discrete value, d. The discrepancy statistic is the mean of the record-level proportions.

The "sd(yhat[i,]) > sd(y)" discrepancy measure returns a record-level indicator when the standard deviation of replicates is larger than the standard deviation of all of \textbf{y}. The discrepancy statistic is the mean of the record-level indicators, reporting the proportion of records with larger standard deviations than \textbf{y}.

The "p(yhat[i,] != y[i])" discrepancy measure returns the record-level probability that \textbf{y}^{rep}_i is not equal to y[i]. This is valid when \textbf{y} is categorical and yhat is the predicted category. The probability is estimated as the proportion of replicates that differ from y[i].
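Several of the measures listed above reduce to a few lines of base R. The sketch below covers only a subset and assumes the same hypothetical `y` and N x S `yhat` layout, with E(y[i]) estimated by the mean of the replicates. Because the text does not state the overall statistic for "Chi-Square" or "PPL", this sketch assumes sums for those two. The helper `discrepancy` is hypothetical and is not the package's implementation.

```r
# Hypothetical helper: y is a vector of N observations, yhat an N x S matrix
# of replicates, and d the optional constant used by some measures.
discrepancy <- function(y, yhat, Discrep, d = 0) {
  m <- rowMeans(yhat)        # E(yhat[i,]) for each record
  v <- apply(yhat, 1, var)   # var(yhat[i,]) for each record
  rec <- switch(Discrep,
    "Chi-Square" = (y - m)^2 / v,
    "MSE"        = rowMeans((yhat - y)^2),
    "RMSE"       = sqrt(rowMeans((yhat - y)^2)),
    # Variance component plus the accuracy component weighted by d/(d+1)
    "PPL"        = v + (d / (d + 1)) * (m - y)^2,
    "max(yhat[i,]) > max(y)" = as.numeric(apply(yhat, 1, max) > max(y)),
    "mean(yhat[i,] > d)"     = rowMeans(yhat > d),
    stop("measure not covered by this sketch: ", Discrep))
  # Assumed overall statistics: sums for Chi-Square and PPL, means otherwise
  stat <- if (Discrep %in% c("Chi-Square", "PPL")) sum(rec) else mean(rec)
  list(record = rec, statistic = stat)
}

y <- c(1, 2)
yhat <- rbind(c(0, 2), c(2, 4))
discrepancy(y, yhat, "MSE")$statistic  # 1.5
```

Returning the record-level values alongside the statistic keeps them available for the kind of discrepancy analysis described below.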
After observing a discrepancy statistic, the user attempts to improve the model by revising it to account for discrepancies between the data and the current model. This approach to model revision relies on an analysis of the discrepancy statistic. Given a discrepancy measure that is based on model fit, such as the L-criterion, the user may correlate the record-level discrepancy statistics with the dependent variable, independent variables, and interactions of independent variables. The discrepancy statistic should not correlate with the dependent and independent variables. Interaction variables may be useful for exploring new relationships that are not in the current model. Alternatively, a decision tree may be applied to the record-level discrepancy statistics, given the independent variables, in an effort to find relationships in the data that may be helpful in the model. Model revision may involve the addition of a finite mixture component to account for outliers in discrepancy, or specifying the model with a distribution that is more robust to outliers. There are too many suggestions to include here, and discrepancy analysis varies by model.
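One way to act on the correlation advice is sketched below. It assumes a hypothetical predictor matrix `X` with column names, the response `y`, and a vector `disc` of record-level discrepancies, for example the record-level values of a fit-based measure. It uses Pearson correlation via `cor()` and forms pairwise interactions with `combn()`; the helper name `screen_discrepancy` is hypothetical, and a decision tree would be a separate approach not shown here.

```r
# Hypothetical helper: disc holds N record-level discrepancies, y the response,
# and X an N x K matrix of named predictors.
screen_discrepancy <- function(disc, y, X) {
  vars <- cbind(y = y, X)
  # Add pairwise products of predictors as candidate interaction terms
  if (ncol(X) >= 2) {
    pairs <- combn(colnames(X), 2)
    inter <- apply(pairs, 2, function(p) X[, p[1]] * X[, p[2]])
    colnames(inter) <- apply(pairs, 2, paste, collapse = ":")
    vars <- cbind(vars, inter)
  }
  # Absolute correlation of each variable with the discrepancy, largest first
  sort(abs(cor(vars, disc)[, 1]), decreasing = TRUE)
}

set.seed(2)
N <- 200
X <- cbind(x1 = rnorm(N), x2 = rnorm(N))
y <- X[, "x1"] + rnorm(N)
# Simulated discrepancy driven by an interaction the model lacks
disc <- X[, "x1"] * X[, "x2"] + rnorm(N, sd = 0.1)
screen_discrepancy(disc, y, X)
```

In this simulated example the discrepancy was built from an x1:x2 interaction, so that term should rank first, hinting at a relationship missing from the current model.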
Value
This function returns a list with the following components:
BPIC 
The Bayesian Predictive Information Criterion (BPIC) was
introduced by Ando (2007). BPIC is a variation of the Deviance
Information Criterion (DIC) that has been modified for predictive
distributions. For more information on DIC (Spiegelhalter
et al., 2002), see the accompanying vignette entitled "Bayesian
Inference". 
Concordance 
This is the percentage of the records of y that are
within the 95% quantile-based probability interval (see
p.interval) of their predictive distributions.
Mean Lift 
This is the mean of the record-level lifts, and
occurs only when \textbf{y} is categorical.
Discrepancy.Statistic 
This is only reported if the
Discrep argument receives a valid discrepancy measure, as listed in Details.
L-criterion 
The L-criterion (Laud and Ibrahim, 1995) was
developed for model and variable selection. It is a sum of two
components: one involves the predictive variance and the other
includes the accuracy of the means of the predictive
distribution. The L-criterion measures model performance with a
combination of how close its predictions are to the observed data
and the variability of the predictions. Better models have smaller
values of L.
Monitor 
This summarizes the monitored variables.
Summary 
When 
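The BPIC entry can be made concrete with a small calculation. The sketch assumes a vector of deviance draws, the variance-based effective number of parameters pD = var(D)/2 (one common convention for DIC), DIC = Dbar + pD, and BPIC = Dbar + 2 pD following Ando (2007). The function name `bpic_sketch` is hypothetical, and this is not presented as the package's internal code.

```r
# Hypothetical input: dev is a vector of deviance values, one per posterior draw.
bpic_sketch <- function(dev) {
  Dbar <- mean(dev)        # posterior mean deviance
  pD <- var(dev) / 2       # assumed variance-based effective number of parameters
  c(Dbar = Dbar, pD = pD,
    DIC = Dbar + pD,       # DIC penalizes complexity once
    BPIC = Dbar + 2 * pD)  # BPIC doubles the penalty for predictive use
}

bpic_sketch(c(100, 102, 104, 98, 96))  # Dbar 100, pD 5, DIC 105, BPIC 110
```

As with DIC, smaller BPIC values indicate a preferred model when models are compared on the same data.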
Author(s)
Statisticat, LLC.
References
Ando, T. (2007). "Bayesian Predictive Information Criterion for the Evaluation of Hierarchical Bayesian and Empirical Bayes Models". Biometrika, 94(2), p. 443–458.
Gelfand, A. (1996). "Model Determination Using Sampling Based Methods". Chapter 9 in Gilks, W., Richardson, S., and Spiegelhalter, D. (eds.), Markov Chain Monte Carlo in Practice. Chapman and Hall: Boca Raton, FL.
Gelfand, A. and Ghosh, S. (1998). "Model Choice: A Minimum Posterior Predictive Loss Approach". Biometrika, 85, p. 1–11.
Gelman, A., Meng, X.L., and Stern, H. (1996). "Posterior Predictive Assessment of Model Fitness via Realized Discrepancies". Statistica Sinica, 6, p. 733–807.
Laud, P.W. and Ibrahim, J.G. (1995). "Predictive Model Selection". Journal of the Royal Statistical Society, B 57, p. 247–262.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and van der Linde, A. (2002). "Bayesian Measures of Model Complexity and Fit (with Discussion)". Journal of the Royal Statistical Society, B 64, p. 583–639.
See Also
LaplaceApproximation, predict.laplace, and p.interval.
Examples
### See the LaplaceApproximation function for an example.