PolyaUrnBivarDirichlet {carbondate} | R Documentation |
Calibrate and Summarise Multiple Radiocarbon Samples via a Bayesian Non-Parametric DPMM (with Polya Urn Updating)
Description
This function calibrates sets of multiple radiocarbon ({}^{14}
C)
determinations, and simultaneously summarises the resultant calendar age information.
This is achieved using Bayesian non-parametric density estimation,
providing a statistically-rigorous alternative to summed probability
distributions (SPDs).
It takes as an input a set of {}^{14}
C determinations and associated 1\sigma
uncertainties, as well as the radiocarbon age calibration curve to be used. The samples
are assumed to arise from an (unknown) shared calendar age distribution f(\theta)
that
we would like to estimate, alongside performing calibration of each sample.
The function models the underlying distribution f(\theta)
as a Dirichlet process
mixture model (DPMM), whereby the samples are considered to arise from an unknown number of
distinct clusters. Fitting is achieved via MCMC.
It returns estimates for the calendar age of each individual radiocarbon sample; and broader
output (including the means and variances of the underpinning calendar age clusters)
that can be used by other library functions to provide a predictive estimate of the
shared calendar age density f(\theta)
.
For more information read the vignette:
vignette("Non-parametric-summed-density", package = "carbondate")
Note: The library provides two slightly-different update schemes for the MCMC. In this particular function, updating of the DPMM is achieved by a Polya Urn approach (Neal 2000) This is our recommended updating approach based on testing. The alternative, slice-sampled, approach can be found at WalkerBivarDirichlet
References:
Heaton, TJ. 2022. “Non-parametric Calibration of Multiple Related Radiocarbon
Determinations and their Calendar Age Summarisation.” Journal of the Royal Statistical
Society Series C: Applied Statistics 71 (5):1918-56. https://doi.org/10.1111/rssc.12599.
Neal, RM. 2000. “Markov Chain Sampling Methods for Dirichlet Process Mixture Models.”
Journal of Computational and Graphical Statistics 9 (2):249 https://doi.org/10.2307/1390653.
Usage
PolyaUrnBivarDirichlet(
rc_determinations,
rc_sigmas,
calibration_curve,
F14C_inputs = FALSE,
n_iter = 1e+05,
n_thin = 10,
use_F14C_space = TRUE,
slice_width = NA,
slice_multiplier = 10,
n_clust = min(10, length(rc_determinations)),
show_progress = TRUE,
sensible_initialisation = TRUE,
lambda = NA,
nu1 = NA,
nu2 = NA,
A = NA,
B = NA,
alpha_shape = NA,
alpha_rate = NA,
mu_phi = NA,
calendar_ages = NA
)
Arguments
rc_determinations |
A vector of observed radiocarbon determinations. Can be provided either as
|
rc_sigmas |
A vector of the (1-sigma) measurement uncertainties for the
radiocarbon determinations. Must be the same length as |
calibration_curve |
A dataframe which must contain one column |
F14C_inputs |
|
n_iter |
The number of MCMC iterations (optional). Default is 100,000. |
n_thin |
How much to thin the MCMC output (optional). Will store every
|
use_F14C_space |
If |
slice_width |
Parameter for slice sampling (optional). If not given a value
is chosen intelligently based on the spread of the initial calendar ages.
Must be given if |
slice_multiplier |
Integer parameter for slice sampling (optional).
Default is 10. Limits the slice size to |
n_clust |
The number of clusters with which to initialise the sampler (optional). Must
be less than the length of |
show_progress |
Whether to show a progress bar in the console during
execution. Default is |
sensible_initialisation |
Whether to use sensible values to initialise the sampler
and an automated (adaptive) prior on |
lambda , nu1 , nu2 |
Hyperparameters for the prior on the mean
where
|
A , B |
Prior on |
alpha_shape , alpha_rate |
Shape and rate hyperparameters that specify
the prior for the Dirichlet Process (DP) concentration, |
mu_phi |
Initial value of the overall cluster centering |
calendar_ages |
The initial estimate for the underlying calendar ages
(optional). If supplied, it must be a vector with the same length as
|
Value
A list with 10 items. The first 8 items contain output of the model, each of
which has one dimension of size n_{\textrm{out}} =
\textrm{floor}( n_{\textrm{iter}}/n_{\textrm{thin}}) + 1
. The rows in these items store
the state of the MCMC from every n_{\textrm{thin}}
{}^\textrm{th}
iteration:
cluster_identifiers
A list of length
n_{\textrm{out}}
each entry gives the cluster allocation (an integer between 1 andn_clust
) for each observation on the relevant MCMC iteration. Information on the state of these calendar age clusters (means and precisions) can be found in the other output items.alpha
A double vector of length
n_{\textrm{out}}
giving the Dirichlet Process concentration parameter\alpha
.n_clust
An integer vector of length
n_{\textrm{out}}
giving the current number of clusters in the model.phi
A list of length
n_{\textrm{out}}
each entry giving a vector of lengthn_clust
of the means of the current calendar age clusters\phi_j
.tau
A list of length
n_{\textrm{out}}
each entry giving a vector of lengthn_clust
of the precisions of the current calenadar age cluster\tau_j
.observations_per_cluster
A list of length
n_{\textrm{out}}
each entry giving a vector of lengthn_clust
of the number of observations for that cluster.calendar_ages
An
n_{\textrm{out}}
byn_{\textrm{obs}}
integer matrix. Gives the current estimate for the calendar age of each individual observation.mu_phi
A vector of length
n_{\textrm{out}}
giving the overall centering\mu_{\phi}
of the calendar age clusters.
where n_{\textrm{obs}}
is the number of radiocarbon observations i.e.
the length of rc_determinations
.
The remaining items give information about the input data, input parameters (or
those calculated using sensible_initialisation
) and the update_type
update_type
A string that always has the value "Polya Urn".
input_data
A list containing the
{}^{14}
C data used, and the name of the calibration curve used.input_parameters
A list containing the values of the fixed hyperparameters
lambda
,nu1
,nu2
,A
,B
,alpha_shape
,alpha_rate
andmu_phi
, and the slice parametersslice_width
andslice_multiplier
.
See Also
WalkerBivarDirichlet for our less-preferred MCMC method to update the Bayesian DPMM
(otherwise an identical model); and PlotCalendarAgeDensityIndividualSample,
PlotPredictiveCalendarAgeDensity and PlotNumberOfClusters
to access the model output and estimate the calendar age information.
See also PPcalibrate for an an alternative (similarly rigorous) approach to
calibration and summarisation of related radiocarbon determinations using a variable-rate Poisson process
Examples
# Note these examples are shown with a small n_iter to speed up execution.
# When you run ensure n_iter gives convergence (try function default).
# Basic usage making use of sensible initialisation to set most values and
# using a saved example data set and the IntCal20 curve.
polya_urn_output <- PolyaUrnBivarDirichlet(
two_normals$c14_age,
two_normals$c14_sig,
intcal20,
n_iter = 100,
show_progress = FALSE)
# The radiocarbon determinations can be given as F14C concentrations
polya_urn_output <- PolyaUrnBivarDirichlet(
two_normals$f14c,
two_normals$f14c_sig,
intcal20,
F14C_inputs = TRUE,
n_iter = 100,
show_progress = FALSE)