pppvalue {bayesmeta} | R Documentation |

Compute posterior or prior predictive *p*-values from a
`bayesmeta`

object.

pppvalue(x, parameter = "mu", value = 0.0, alternative = c("two.sided", "less", "greater"), statistic = "median", rejection.region, n = 10, prior = FALSE, quietly = FALSE, parallel, seed, ...)

`x` |
a |

`parameter` |
the parameter to be tested. May be the effect
( |

`value` |
the (null-) hypothesized value. |

`alternative` |
the type of alternative hypothesis. |

`statistic` |
the figure to be used a the ‘test
statistic’, or ‘discrepancy variable’. May be chosen as |

`rejection.region` |
the test statistic's rejection region. May be
one of |

`n` |
the number of Monte Carlo replications to be generated. The
default value is |

`prior` |
a logical flag to request |

`quietly` |
a logical flag to show (or suppress) output during computation; this may also speed up computations slightly. |

`parallel` |
the number of parallel processes to utilize. By default, if multiple (k) cores are detected, then k-1 parallel processes are used. |

`seed` |
(optional) an |

`...` |
further parameters passed to ‘ |

Posterior predictive *p*-values are Bayesian analogues to
‘classical’ *p*-values (Meng, 1994; Gelman, Meng and Stern,
1996; Gelman et al., 2014). The `pppvalue()`

function allows to
compute these values for one- and two-sided hypotheses concerning the
effect (*μ*) or heterogeneity (*τ*) parameter, or one of
the study-specific effect parameters (*theta[i]*) in a
random-effects meta-analysis.

*Prior* predictive *p*-values have a
similar interpretation, but they have a stronger dependence on the
prior specification and are only available when the prior is proper;
for a more detailed discussion, see Gelman, Meng and Stern (1996,
Sec. 4).

The function may also be used to only generate samples (*τ*,
*μ*, *theta[i]*, *y[i]*) without having to also
derive a statistic or a *p*-value. In order to achieve that, the
‘`statistic`

’ argument may be specified as
‘`NA`

’, and generated samples may be recovered from the
‘`...$replicates`

’ output element.

The computation
is based on Monte Carlo sampling and repeated analysis of re-generated
data sets drawn from the parameters' (conditional) posterior
predictive (or prior) distribution; so the *p*-value derivation is
somewhat computationally expensive. The *p*-value eventually is
computed based on how far in the tail area the actual data are (in
terms of the realized ‘test statistic’ or ‘discrepancy’)
relative to the Monte-Carlo-sampled distribution. Accuracy of the
computed *p*-value hence strongly depends on the number of samples
(as specified through the ‘`n`

’ argument) that are
generated. Also, (near-) zero *p*-values need to be interpreted
with caution, and in relation to the used Monte Carlo sample size
(`n`

).

The ‘`statistic`

’ argument determines the statistic
to be computed from the data as a measure of deviation from the null
hypothesis. If specified as `"Q"`

, then Cochran's *Q* statistic is
computed; this is useful for testing for homogeneity (*τ=0*). If specified as
one of the row names of the ‘`x$summary`

’ element, then,
depending on the type of null hypothesis specified through the
‘`parameter`

’ argument, the corresponding parameter's posterior
quantity is used for the statistic. If specified as `"t"`

, then a
*t*-type statistic is computed (the difference between the
corresponding parameter's posterior mean and its hypothesized value,
divided by the posterior standard deviation). If specified as
`"cdf"`

, the parameter's marginal posterior cumulative
distribution function evaluated a the hypothesized value
(‘`value`

’) is used.

The ‘`statistic`

’ argument may also be specified as an
arbitrary `function`

of the data (*y*). The `function`

's
first argument then needs to be the data (*y*), additional
arguments may be passed as arguments (‘`...`

’) to the
‘`pppvalue()`

’ function. See also the examples below.

Specification of one- or
two-sided hypotheses not only has implications for the determination
of the *p*-value from the samples, but also for the sampling
process itself. Parameter values are drawn from a subspace according
to the null hypothesis, which for a two-sided test is a line, and for
a one-sided test is a half-plane. This also implies that one- and
two-sided *p*-values cannot simply be converted into one
another.

For example, when specifying
`pppvalue(..., param="mu", val=0, alt="two.sided")`

,
then first paramater values (*τ*, *μ*) are drawn from the
conditional posterior distribution *p(tau, mu | y, sigma, mu=0)*, and subsequently new data sets
are generated based on the parameters. If a one-sided hypothesis is
specified, e.g. via
`pppvalue(..., param="mu", val=0, alt="less")`

,
then parameters are drawn from *p(tau, mu | y, sigma, mu>0)*.

For a hypothesis concerning the individual effect parameters
*theta[i]*, conditions are imposed on the corresponding
*theta[i]*. For example, for a specification of
`pppvalue(..., param=2, val=0, alt="less")`

, the
hypothesis concerns the *i*=2nd study's effect paramater
*theta[2]*. First a sample is generated from
*p(theta[2] | y, sigma,
theta[2]>0)*. Then samples of *μ* and *τ* are generated
by conditioning on the generated *theta[2]* value, and
data *y* are generated by conditioning on all three.

Unless explicitly specified through the
‘`rejection.region`

’ argument, the test statistic's
“rejection region” (the direction in which extreme `statistic`

values indicate a departure from the null hypothesis) is set based on the
‘`alternative`

’ and ‘`statistic`

’
parameters. The eventually used setting can be checked in the output's
‘`...$rejection.region`

’ component.

When aiming to compute a *p*-value, it is
probably a good idea to first start with a smaller ‘`n`

’
argument to get a rough idea of the *p*-value's order of magnitude
as well as the computational speed, before going over to a larger,
more realistic `n`

value. The implementation is able to utilize
multiple processors or cores via the parallel package; details
may be specified via the ‘`parallel`

’ argument.

A `list`

of class ‘`htest`

’ containing the following
components:

`statistic` |
the ‘test statistic’ (or ‘discrepancy’) value based on the actual data. |

`parameter` |
the number ( |

`p.value` |
the derived |

`null.value` |
the (null-) hypothesized parameter value. |

`alternative` |
the type of alternative hypothesis. |

`method` |
a character string indicating what type of test was performed. |

`data.name` |
the name of the underlying |

`call` |
an object of class |

`rejection.region` |
the test statistic's rejection region. |

`replicates` |
a |

`computation.time` |
The computation time (in seconds) used. |

Christian Roever christian.roever@med.uni-goettingen.de

X.-L. Meng.
Posterior predictive p-values.
*The Annals of Statistics*, **22**(3):1142-1160, 1994.
doi: 10.1214/aos/1176325622.

A. Gelman, X.-L. Meng, H. Stern.
Posterior predictive assessment of model fitness
via realized discrepancies.
*Statistica Sinica*, **6**(4):733-760, 1996.

A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari,
D.B. Rubin.
*Bayesian data analysis*.
Chapman & Hall / CRC, Boca Raton, 2014.

C. Roever.
Bayesian random-effects meta-analysis using the bayesmeta R package.
*Journal of Statistical Software*, **93**(6):1-51, 2020.
doi: 10.18637/jss.v093.i06.

## Not run: # perform a meta analysis; # load data: data("CrinsEtAl2014") # compute effect sizes (log odds ratios) from count data # (using "metafor" package's "escalc()" function): require("metafor") crins.srr <- escalc(measure="OR", ai=exp.SRR.events, n1i=exp.total, ci=cont.SRR.events, n2i=cont.total, slab=publication, data=CrinsEtAl2014, subset=c(1,4,6)) # analyze: bma <- bayesmeta(crins.srr, mu.prior.mean=0, mu.prior.sd=4, tau.prior=function(t){dhalfnormal(t, scale=0.5)}) # compute a 2-sided p-value for the effect (mu) parameter # (note: this may take a while!): p <- pppvalue(bma, parameter="mu", value=0, n=100) # show result: print(p) # show the test statistic's distribution # along with its actualized value: plot(ecdf(p$replicates$statistic[,1]), xlim=range(c(p$statistic, p$replicates$statistic[,1]))) abline(v=p$statistic, col="red") # show the parameter values # drawn from the (conditional) posterior distribution: plot(bma, which=2) abline(h=p$null.value) # (the null-hypothesized mu value) points(p$replicates$tau, p$replicates$mu, col="cyan") # (the samples) ###################################################################### # Among the 3 studies, only the first (Heffron, 2003) was randomized. # One might wonder about this particular study's effect (theta[1]) # in the light of the additional evidence and compute a one-sided # p-value: p <- pppvalue(bma, parameter="Heffron", value=0, n=100, alternative="less") print(p) ###################################################################### # One may also define one's own 'test' statistic to be used. # For example, one could utilize the Bayes factor to generate # a p-value for the homogeneity (tau=0) hypothesis: BF <- function(y, sigma) { bm <- bayesmeta(y=y, sigma=sigma, mu.prior.mean=0, mu.prior.sd=4, tau.prior=function(t){dhalfnormal(t, scale=0.5)}, interval.type="central") # (central intervals are faster to compute; # interval type otherwise is not relevant here) return(bm$bayesfactor[1,"tau=0"]) } # NOTE: the 'bayesmeta()' arguments above should probably match # the specifications from the original analysis p <- pppvalue(bma, parameter="tau", statistic=BF, value=0, n=100, alternative="greater", rejection.region="lower.tail", sigma=bma$sigma) print(p) ###################################################################### # If one is only interested in generating samples (and not in test # statistics or p-values), one may specify the 'statistic' argument # as 'NA'. # Note that different 'parameter', 'value' and 'alternative' settings # imply different sampling schemes. p <- pppvalue(bma, parameter="mu", statistic=NA, value=0, alternative="less", n=100) plot(bma, which=2) abline(h=p$null.value) points(p$replicates$tau, p$replicates$mu, col="cyan") ## End(Not run)

[Package *bayesmeta* version 2.7 Index]