Validate {LaplacesDemon} | R Documentation |
Holdout Validation
Description
This function performs holdout validation on an object of class
demonoid or pmc, given both a modeled and a validation data set.
Usage
Validate(object, Model, Data, plot=FALSE, PDF=FALSE)
Arguments
object |
This is an object of class |
Model |
This is a model specification function for
|
Data |
This is a list that contains two lists of data, as
specified for |
plot |
Logical. When |
PDF |
Logical. When |
Details
There are numerous ways to validate a model. In this context,
validation means to assess the predictive performance of a model on
out-of-sample data. If reasonable, leave-one-out cross-validation
(LOOCV) via the conditional predictive ordinate (CPO) should be
considered when using LaplacesDemon or PMC. For more information on
CPO, see the accompanying vignettes entitled "Bayesian Inference" and
"Examples". CPO is unavailable when using LaplaceApproximation or
VariationalBayes.
For LaplaceApproximation or VariationalBayes, it is recommended that
the user perform holdout validation by running posterior predictive
checks on each data set and comparing the differences in the specified
discrepancy measure.
When LOOCV is unreasonable, popular alternatives include k-fold
cross-validation and holdout validation. Although k-fold
cross-validation is not performed explicitly here, the user may
accomplish it with some effort. Of these methods, holdout validation
is the most biased, but is the most common in applied use, since only
one model is fitted, rather than the k models required by k-fold
cross-validation. The Validate function performs holdout validation.
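The k-fold partition mentioned above can be sketched in base R. This is only an illustration and not part of the LaplacesDemon package: the names n, k, folds, mod.rows, and val.rows are hypothetical, and the model-updating step is left as a comment because it depends on the user's own Model and Data.

```r
set.seed(1)
n <- 100   # hypothetical number of records
k <- 5     # hypothetical number of folds

# Assign each record to one of k folds of (nearly) equal size, in random order
folds <- sample(rep(seq_len(k), length.out = n))

for (i in seq_len(k)) {
  val.rows <- which(folds == i)   # records held out in this fold
  mod.rows <- which(folds != i)   # records used to update the model
  # Here the user would update the model on mod.rows and then
  # estimate predictive loss on val.rows.
}

# Number of records in each fold
fold.sizes <- as.vector(table(folds))
```

Each pass through the loop corresponds to one fitted model, which is why k-fold cross-validation costs more than a single holdout split.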
For holdout validation, the observed data are split randomly into two
data sets of approximately equal size, or into three data sets that
consist of two data sets of approximately equal size and a remainder
data set. Of the two data sets approximately equal in size, one is
called the modeled (or training) data set, and the other is called the
validation (or test) data set. The modeled data set is used when
updating the model. After the model is updated, both data sets are
predicted in the Validate function, given the model. Predictive loss
is estimated for the validation data set, relative to the modeled data
set.
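The random split described above might look like the following in base R. This is a sketch under stated assumptions: the simulated X and y, and the row-index names mod.rows and val.rows, are hypothetical, and a real data list for LaplacesDemon would also need whatever other components the Model function expects (such as mon.names and parm.names).

```r
set.seed(42)
n <- 200                                   # hypothetical number of records
X <- cbind(1, rnorm(n))                    # hypothetical design matrix
y <- as.vector(X %*% c(1, 2) + rnorm(n))   # hypothetical response

# Randomly assign half of the records to the modeled data set;
# the remaining records form the validation data set
mod.rows <- sample(n, size = floor(n / 2))
val.rows <- setdiff(seq_len(n), mod.rows)

# Two lists of data, one per data set
MyData.M <- list(X = X[mod.rows, , drop = FALSE], y = y[mod.rows])
MyData.V <- list(X = X[val.rows, , drop = FALSE], y = y[val.rows])
```

The two lists can then be combined as list(MyData.M=MyData.M, MyData.V=MyData.V) for the Data argument, with the model updated beforehand on MyData.M only.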
Predictive loss is associated with overfitting, differences between the modeled and validation data sets, or model misspecification. Bayesian inference is reputed to be much more robust to overfitting than frequentist inference.
There are many ways to measure predictive loss, and within each approach, there are usually numerous possible loss functions. The log-likelihood of the model is a popular approximate utility function, and consequently, the deviance of the model is a popular loss function.
A vector of model-level (rather than record-level) deviance samples is
returned with each object of class demonoid or pmc. The Validate
function obtains this vector for each data set, and then calculates
the Bayesian Predictive Information Criterion (BPIC), as per Ando
(2007). BPIC is a variation of the Deviance Information Criterion
(DIC) that has been modified for predictive distributions. For more
information on DIC (Spiegelhalter et al., 2002), see the accompanying
vignette entitled "Bayesian Inference". The goal is to minimize BPIC.
When DIC is applied after the model, such as with a predictive
distribution, it is optimistically biased, that is, too small. The
bias is due to the same data \textbf{y} being used both to construct
the posterior distributions and to evaluate pD, the penalty term for
model complexity. BPIC compensates by doubling the penalty. For
example, for validation data set \textbf{y}_{new}, BPIC is:
BPIC = -2 \log[p(\textbf{y}_{new} | \textbf{y}, \Theta)] + 2pD
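As a rough numerical illustration of the formula, the sketch below computes BPIC from a vector of deviance samples. It rests on assumptions that are not stated in this page: the expected deviance is estimated by the mean of the samples, pD is estimated as half their variance, and Dev.V is a simulated stand-in rather than output from a fitted model.

```r
set.seed(7)
# Simulated stand-in for posterior samples of the validation deviance
Dev.V <- rnorm(1000, mean = 250, sd = 4)

Dbar <- mean(Dev.V)        # expected deviance
pD   <- var(Dev.V) / 2     # assumed estimator of the complexity penalty
DIC  <- Dbar + pD          # DIC, shown for comparison
BPIC <- Dbar + 2 * pD      # BPIC doubles the penalty term
```

Under these assumptions BPIC exceeds DIC by exactly pD, so models with more variable deviance are penalized more heavily.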
When plot=TRUE, the distributions of the modeled and validation
deviances are shown in the upper plot, and the lower plot shows the
modeled deviance subtracted from the validation deviance. When
positive, this distribution of the change in deviance is the loss in
predictive deviance associated with moving from the modeled data set
to the validation data set.
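The change in deviance can also be summarized numerically. The following sketch uses simulated stand-ins (Dev.M and Dev.V are hypothetical names and values, not output from Validate) to show the subtraction and a few summaries of the result.

```r
set.seed(3)
# Simulated stand-ins for the two vectors of deviance samples
Dev.M <- rnorm(1000, mean = 240, sd = 4)   # modeled data set
Dev.V <- rnorm(1000, mean = 250, sd = 4)   # validation data set

# Change in deviance: validation minus modeled
Dev.diff <- Dev.V - Dev.M

mean.loss <- mean(Dev.diff)                          # average change
loss.int  <- quantile(Dev.diff, c(0.025, 0.5, 0.975)) # median and 95% interval
p.loss    <- mean(Dev.diff > 0)                      # share of samples with a loss

# plot(density(Dev.diff)) draws the distribution of the change in deviance
```

A distribution of Dev.diff that sits mostly above zero indicates a loss in predictive deviance on the validation data set.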
After using the Validate function, the user is encouraged to perform
posterior predictive checks on each data set via the
summary.demonoid.ppc or summary.pmc.ppc function.
Value
This function returns a list with three components. The first two
components are also lists. Each list consists of y, yhat, and
Deviance. The third component is a matrix that reports the expected
deviance, pD, and BPIC. The object is of class demonoid.val for
LaplacesDemon, or pmc.val when associated with PMC.
Author(s)
Statisticat, LLC. software@bayesian-inference.com
References
Ando, T. (2007). "Bayesian Predictive Information Criterion for the Evaluation of Hierarchical Bayesian and Empirical Bayes Models". Biometrika, 94(2), p. 443–458.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and van der Linde, A. (2002). "Bayesian Measures of Model Complexity and Fit (with Discussion)". Journal of the Royal Statistical Society, B 64, p. 583–639.
See Also
LaplaceApproximation, LaplacesDemon, PMC, and VariationalBayes.
Examples
library(LaplacesDemon)
#Given an object called Fit of class demonoid, a Model specification,
#and a modeled data set (MyData.M) and validation data set (MyData.V):
#Validate(Fit, Model, Data=list(MyData.M=MyData.M, MyData.V=MyData.V))