plot.pmc.ppc {LaplacesDemon}    R Documentation
Plots of Posterior Predictive Checks
Description
This may be used to plot, or save plots of, samples in an object of
class pmc.ppc. A variety of plots is provided.
Usage
## S3 method for class 'pmc.ppc'
plot(x, Style=NULL, Data=NULL, Rows=NULL,
PDF=FALSE, ...)
Arguments
x
This required argument is an object of class pmc.ppc.
Style
This optional argument specifies one of several styles of plots,
and defaults to NULL. The available styles are described under Details.
Data
This optional argument accepts the data set used when updating the
model. Data is required only with certain plot styles, as noted for
each style under Details.
Rows
This optional argument is a vector of row numbers that selects
records, by row, in the object of class pmc.ppc. By default, all rows
are selected.
PDF
This logical argument indicates whether or not the user wants Laplace's Demon to save the plots as a .pdf file.
...
Additional arguments are unused.
Details
This function can be used to produce a variety of posterior predictive
plots, and the style of plot is selected with the Style
argument. Below are some notes on the styles of plots.
"Covariates" requires Data to be specified, and also requires that
the covariates are named X or x. A plot is produced for each covariate
column vector against yhat, and is appropriate when y is not
categorical.
"Covariates, Categorical DV" requires Data to be specified, and also
requires that the covariates are named X or x. A plot is produced for
each covariate column vector against yhat, and is appropriate when y
is categorical.
"Density" plots show the kernel density of the posterior predictive
distribution for each selected row of y (all rows are selected by
default). A vertical red line indicates the position of the observed
y along the x-axis. When the vertical red line is close to the middle
of a normal posterior predictive distribution, there is little
discrepancy between y and the posterior predictive distribution. When
the vertical red line is in the tail of the distribution, or outside
of the kernel density altogether, there is a large discrepancy between
y and the posterior predictive distribution. Records with large
discrepancies may be considered outliers, and they suggest that an
improvement in model fit should be considered.
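The reading of a "Density" plot for a single record can be sketched in base R. This is a minimal illustration, not the package's plotting code: `density_check`, `y_obs` (one observed value), and `yrep` (its posterior predictive draws) are hypothetical names, and the returned fraction of draws at or below y is an added numeric summary that the plot itself does not report.

```r
# Minimal sketch of a "Density"-style check for one record (not LaplacesDemon code).
# y_obs: observed value; yrep: posterior predictive draws (hypothetical inputs).
density_check <- function(y_obs, yrep, draw = TRUE) {
  if (draw) {
    plot(density(yrep), main = "Posterior predictive density")  # kernel density
    abline(v = y_obs, col = "red")                              # observed y
  }
  # Fraction of draws at or below y: near 0.5 when y sits in the middle of
  # the distribution, near 0 or 1 when y is in a tail or beyond all draws.
  mean(yrep <= y_obs)
}

set.seed(1)
yrep <- rnorm(1000, mean = 10, sd = 2)
density_check(10, yrep, draw = FALSE)  # close to 0.5: little discrepancy
density_check(30, yrep, draw = FALSE)  # 1: y lies beyond every draw
```

A summary near 0 or 1 corresponds to the red line falling in a tail of, or outside, the plotted density.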
"DW" plots the distributions of the Durbin-Watson (DW) test statistics
(Durbin and Watson, 1950), both observed ($d^{obs}$, as a transparent
black density) and replicated ($d^{rep}$, as a transparent red
density). The distribution of $d^{obs}$ is estimated from the model,
and $d^{rep}$ is simulated from normal residuals without
autocorrelation, where the number of simulations is the same as the
observed number. This DW test may be applied to the residuals of
univariate time-series models (or otherwise ordered residuals) to
detect first-order autocorrelation. Autocorrelated residuals are not
independent. The DW test is applicable only when the residuals are
normally distributed, higher-order autocorrelation is not present, and
y is not also used as a lagged predictor. The DW test statistic,
$d^{obs}$, lies in the interval (0,4), where 0 is perfect positive
autocorrelation, 2 is no autocorrelation, and 4 is perfect negative
autocorrelation.
The following summary is reported on the plot: the mean of $d^{obs}$
(and its 95% probability interval), the probability that
$d^{obs} > d^{rep}$, and whether or not autocorrelation is found.
Positive autocorrelation is reported when the observed process is
greater than the replicated process in fewer than 2.5% of the samples,
and negative autocorrelation is reported when the observed process is
greater than the replicated process in more than 97.5% of the samples.
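The DW statistic and the summary reported on the plot can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `dw_stat`, `dw_check`, and `resid_draws` are hypothetical names, each posterior sample is assumed to supply one row of ordered residuals, and the verdict follows the 2.5%/97.5% rule described above.

```r
# Durbin-Watson statistic for ordered residuals e_1, ..., e_T:
#   d = sum_{t=2}^{T} (e_t - e_{t-1})^2 / sum_{t=1}^{T} e_t^2
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Sketch of the observed-versus-replicated comparison (not LaplacesDemon code).
# resid_draws: hypothetical matrix with one row of ordered residuals per
# posterior sample. Replicates are independent standard normal residuals;
# their scale does not matter because d is scale-invariant.
dw_check <- function(resid_draws) {
  d_obs <- apply(resid_draws, 1, dw_stat)
  sims <- matrix(rnorm(length(resid_draws)), nrow = nrow(resid_draws))
  d_rep <- apply(sims, 1, dw_stat)
  p <- mean(d_obs > d_rep)
  verdict <- if (p < 0.025) {
    "positive autocorrelation"
  } else if (p > 0.975) {
    "negative autocorrelation"
  } else {
    "no autocorrelation found"
  }
  list(mean = mean(d_obs),
       interval = quantile(d_obs, c(0.025, 0.975)),
       p = p, verdict = verdict)
}

set.seed(42)
# Strongly positively autocorrelated residuals (AR(1) with coefficient 0.9).
ar_resid <- t(replicate(200, as.numeric(arima.sim(list(ar = 0.9), n = 100))))
dw_check(ar_resid)$verdict  # "positive autocorrelation": d_obs is far below 2
```

Because small values of d indicate positive autocorrelation, a low probability that $d^{obs} > d^{rep}$ is the signal for positive autocorrelation, and a high one for negative autocorrelation.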
"DW, Multivariate, C" requires Data to be specified, and also requires
that variable Y exist in the data set with exactly that name. These
plots apply a univariate Durbin-Watson test, as in "DW" above, to each
column-wise vector of residuals. This plot is appropriate when Y is
multivariate, not categorical, and the residuals are to be tested
column-wise for first-order autocorrelation.
"ECDF" (Empirical Cumulative Distribution Function) plots compare the
ECDF of y with three ECDFs of yhat based on the 2.5%, 50% (median), and
97.5% quantiles of its distribution. The ECDF of y, evaluated at a
value t, is the proportion of values of y that are less than or equal
to t. This plot is appropriate when y is univariate and at least
ordinal.
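The ECDF definition and the three-quantile comparison can be sketched in base R. This is a hedged illustration, not the package's code: `ecdf_at`, `ecdf_compare`, and the layout of `yhat` (one row per record, one column per posterior sample) are assumptions, and the line colors are arbitrary.

```r
# ECDF evaluated at t: proportion of values in y that are <= t.
ecdf_at <- function(y, t) mean(y <= t)

# Sketch of an "ECDF"-style comparison (not LaplacesDemon code).
# y: N observed values; yhat: hypothetical N x S matrix of replicates,
# one column per posterior sample.
ecdf_compare <- function(y, yhat) {
  # Row-wise 2.5%, 50% and 97.5% quantiles of the replicates (a 3 x N matrix).
  q <- apply(yhat, 1, quantile, probs = c(0.025, 0.5, 0.975))
  plot(ecdf(y), main = "ECDF of y and of replicate quantiles")
  plot(ecdf(q[2, ]), add = TRUE, col = "red")   # median
  plot(ecdf(q[1, ]), add = TRUE, col = "gray")  # 2.5% quantile
  plot(ecdf(q[3, ]), add = TRUE, col = "gray")  # 97.5% quantile
  invisible(q)
}

y <- c(3, 1, 4, 1, 5)
ecdf_at(y, 1)  # 0.4: two of the five values are <= 1
ecdf(y)(4)     # 0.8, using the step function from the stats package
```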
"Fitted" plots compare y with the probability interval of its
replicate, and provide loess smoothing. This plot is appropriate when
y is univariate and not categorical.
"Fitted, Multivariate, C" requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that name.
These plots compare each column-wise vector of y in Y with its
replicates and provide loess smoothing. This plot is appropriate when
Y is multivariate, not categorical, and is to be viewed column-wise.
"Fitted, Multivariate, R" requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that name.
These plots compare each row-wise vector of y in Y with its replicates
and provide loess smoothing. This plot is appropriate when Y is
multivariate, not categorical, and is to be viewed row-wise.
"Jarque-Bera" plots the distributions of the Jarque-Bera (JB) test
statistics (Jarque and Bera, 1980), both observed ($JB^{obs}$, as a
transparent black density) and replicated ($JB^{rep}$, as a
transparent red density). The distribution of $JB^{obs}$ is estimated
from the model, and $JB^{rep}$ is simulated from normal residuals,
where the number of simulations is the same as the observed number.
This Jarque-Bera test may be applied to the residuals of univariate
models to test for normality. The Jarque-Bera test does not test
normality per se, but whether or not the distribution has skewness and
kurtosis that match a normal distribution, and is therefore a test of
the moments of a normal distribution.
The following summary is reported on the plot: the mean of $JB^{obs}$
(and its 95% probability interval), the probability that
$JB^{obs} > JB^{rep}$, and whether or not normality is indicated.
Non-normality is reported when the observed process is greater than
the replicated process in either fewer than 2.5% or more than 97.5% of
the samples.
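The JB statistic and the two-sided rule above can be sketched in base R. This assumes the textbook statistic $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, with sample skewness $S$ and kurtosis $K$ computed from moments with divisor $n$; `jb_stat`, `jb_check`, and `resid_draws` are hypothetical names, and the sketch is not the package's implementation.

```r
# Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4), where S and K are
# the sample skewness and kurtosis (central moments with divisor n).
jb_stat <- function(e) {
  n <- length(e)
  z <- e - mean(e)
  m2 <- mean(z^2)
  s <- mean(z^3) / m2^1.5  # skewness, 0 for a normal distribution
  k <- mean(z^4) / m2^2    # kurtosis, 3 for a normal distribution
  n / 6 * (s^2 + (k - 3)^2 / 4)
}

# Sketch of the observed-versus-replicated comparison (not LaplacesDemon code).
# resid_draws: hypothetical matrix with one row of residuals per posterior
# sample. JB is scale-invariant, so standard normal replicates suffice.
jb_check <- function(resid_draws) {
  jb_obs <- apply(resid_draws, 1, jb_stat)
  sims <- matrix(rnorm(length(resid_draws)), nrow = nrow(resid_draws))
  jb_rep <- apply(sims, 1, jb_stat)
  p <- mean(jb_obs > jb_rep)
  list(mean = mean(jb_obs),
       interval = quantile(jb_obs, c(0.025, 0.975)),
       p = p,
       normal = p >= 0.025 && p <= 0.975)  # two-sided rule from the text
}

set.seed(7)
skewed <- matrix(rexp(200 * 200), nrow = 200)  # clearly non-normal residuals
jb_check(skewed)$normal  # FALSE: JB_obs exceeds JB_rep in nearly every sample
```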
"Jarque-Bera, Multivariate, C" requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that name.
These plots apply a univariate Jarque-Bera test, as in "Jarque-Bera"
above, to each column-wise vector of residuals. This plot is
appropriate when Y is multivariate, not categorical, and the residuals
are to be tested column-wise for normality.
"Mardia" plots the distributions of the skewness (K3) and kurtosis
(K4) test statistics (Mardia, 1970), both observed ($K3^{obs}$ and
$K4^{obs}$, as transparent black densities) and replicated
($K3^{rep}$ and $K4^{rep}$, as transparent red densities). The
distributions of $K3^{obs}$ and $K4^{obs}$ are estimated from the
model, and both $K3^{rep}$ and $K4^{rep}$ are simulated from
multivariate normal residuals, where the number of simulations is the
same as the observed number. Mardia's test may be applied to the
residuals of multivariate models to test for multivariate normality.
Mardia's test does not test for multivariate normality per se, but
whether or not the distribution has skewness and kurtosis that match a
multivariate normal distribution, and is therefore a test of the
moments of a multivariate normal distribution.
The following summary is reported on the plots: the means of
$K3^{obs}$ and $K4^{obs}$ (and the associated 95% probability
intervals), the probabilities that $K3^{obs} > K3^{rep}$ and
$K4^{obs} > K4^{rep}$, and whether or not multivariate normality is
indicated. Non-normality is reported when the observed process is
greater than the replicated process in either fewer than 2.5% or more
than 97.5% of the samples.
"Mardia" requires Data to be specified, and also requires that
variable Y exist in the data set with exactly that name. Y must be an
$N \times P$ matrix of $N$ records and $P$ variables. Source code was
modified from the deprecated package QRMlib.
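Mardia's sample skewness and kurtosis can be sketched in base R using the textbook definitions $b_{1,P} = N^{-2}\sum_i\sum_j d_{ij}^3$ and $b_{2,P} = N^{-1}\sum_i d_{ii}^2$, where $d_{ij} = (x_i - \bar{x})^\top S^{-1}(x_j - \bar{x})$ and $S$ is the maximum-likelihood covariance. The package's own K3 and K4, adapted from QRMlib, may be scaled differently, so treat `mardia_stats` as a hypothetical helper rather than the package's code. Under multivariate normality, $b_{2,P}$ is close to $P(P+2)$.

```r
# Mardia's multivariate skewness and kurtosis for an N x P matrix X
# (textbook definitions; hypothetical helper, not LaplacesDemon code).
mardia_stats <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  Z <- sweep(X, 2, colMeans(X))   # center each column
  S <- crossprod(Z) / n           # maximum-likelihood covariance
  D <- Z %*% solve(S, t(Z))       # D[i, j] = z_i' S^{-1} z_j
  c(skewness = sum(D^3) / n^2,    # b_{1,P}
    kurtosis = mean(diag(D)^2))   # b_{2,P}; about P * (P + 2) under normality
}

set.seed(2)
X <- matrix(rnorm(500 * 3), ncol = 3)
mardia_stats(X)  # kurtosis should be near 3 * 5 = 15 for normal data
```

Both statistics are invariant to shifting the data and to any nonsingular linear transformation of the variables, which is why they test the shape of the distribution rather than its location or scale.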
"Predictive Quantiles" plots compare y with the predictive quantile
(PQ) of its replicate. This may be useful for finding patterns among
outliers. Instances outside of the gray lines are considered outliers.
"Residual Density" plots the density of the residuals, computed from
the median of the samples. A vertical red line occurs at zero. This
plot may be useful for inspecting a distributional assumption of
residual variance. This plot is appropriate when y is univariate and
continuous.
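One reading of "residuals computed from the median of the samples" is y minus the per-record median of the replicates, and that reading can be sketched in base R. This interpretation is an assumption, as are the names `residual_density`, `y`, and `yhat` (one row per record, one column per posterior sample); it is not the package's code.

```r
# Sketch of a "Residual Density"-style plot (not LaplacesDemon code).
# y: observed values; yhat: hypothetical N x S matrix of replicates with one
# column per posterior sample.
residual_density <- function(y, yhat, draw = TRUE) {
  r <- y - apply(yhat, 1, median)  # residuals from the per-record median
  if (draw) {
    plot(density(r), main = "Residual density")
    abline(v = 0, col = "red")     # reference line at zero
  }
  invisible(r)
}

set.seed(5)
y <- rnorm(100)
yhat <- matrix(rnorm(100 * 200, mean = y), nrow = 100)
r <- residual_density(y, yhat)  # residuals should be centered near zero
```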
"Residual Density, Multivariate C" requires Data to be specified, and
also requires that variable Y exist in the data set with exactly that
name. These are column-wise plots of residual density, given the
median of the samples. These plots may be useful for inspecting a
distributional assumption of residual variance. This plot is
appropriate when Y is multivariate and continuous, and the densities
are to be viewed column-wise.
"Residual Density, Multivariate R" requires Data to be specified, and
also requires that variable Y exist in the data set with exactly that
name. These are row-wise plots of residual density, given the median
of the samples. These plots may be useful for inspecting a
distributional assumption of residual variance. This plot is
appropriate when Y is multivariate and continuous, and the densities
are to be viewed row-wise.
"Residuals" plots compare y with its residuals. The probability
interval is plotted as a line. This plot is appropriate when y is
univariate.
"Residuals, Multivariate, C" requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that name.
These are plots of each column-wise vector of residuals. The
probability interval is plotted as a line. This plot is appropriate
when Y is multivariate, not categorical, and the residuals are to be
viewed column-wise.
"Residuals, Multivariate, R" requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that name.
These are plots of each row-wise vector of residuals. The probability
interval is plotted as a line. This plot is appropriate when Y is
multivariate, not categorical, and the residuals are to be viewed
row-wise.
"Space-Time by Space" requires Data to be specified, and also requires
that the following variables exist in the data set with exactly these
names: latitude, longitude, S, and T. These space-time plots compare
the $S \times T$ matrix Y with the $S \times T$ matrix Yrep, producing
one time-series plot per point s in space, for a total of S plots.
Therefore, these are time-series plots for each point s in space
across T time-periods. See "Time-Series" plots below.
"Space-Time by Time" requires Data to be specified, and also requires
that the following variables exist in the data set with exactly these
names: latitude, longitude, S, and T. These space-time plots compare
the $S \times T$ matrix Y with the $S \times T$ matrix Yrep, producing
one spatial plot per time-period, for a total of T plots. See
"Spatial" plots below.
"Spatial" requires Data to be specified, and also requires that the
following variables exist in the data set with exactly these names:
latitude and longitude. This spatial plot shows yrep plotted according
to its coordinates, and is color-coded so that higher values of yrep
become more red, and lower values become more yellow.
"Spatial Uncertainty" requires Data to be specified, and also requires
that the following variables exist in the data set with exactly these
names: latitude and longitude. This spatial plot shows the probability
interval of yrep plotted according to its coordinates, and is
color-coded so that wider probability intervals become more red, and
narrower probability intervals become more yellow.
"Time-Series" plots compare y with its replicate, including the median
and probability interval quantiles. This plot is appropriate when y is
univariate and ordered by time.
"Time-Series, Multivariate, C" requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that name.
These plots compare each column-wise time-series in Y with its
replicate, including the median and probability interval quantiles.
This plot is appropriate when Y is multivariate and each time-series
is indexed by column in Y.
"Time-Series, Multivariate, R" requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that name.
These plots compare each row-wise time-series in Y with its replicate,
including the median and probability interval quantiles. This plot is
appropriate when Y is multivariate and each time-series is indexed by
row in Y, as is typical in panel models.
Author(s)
Statisticat, LLC. software@bayesian-inference.com
References
Durbin, J., and Watson, G.S. (1950). "Testing for Serial Correlation in Least Squares Regression, I." Biometrika, 37, p. 409–428.
Jarque, C.M., and Bera, A.K. (1980). "Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals." Economics Letters, 6(3), p. 255–259.
Mardia, K.V. (1970). "Measures of Multivariate Skewness and Kurtosis with Applications." Biometrika, 57(3), p. 519–530.
See Also
PMC and predict.pmc.
Examples
### See the PMC function for an example.