strat {coffee}R Documentation

Model chronologically ordered dates

Description

Model radiocarbon dates (or dates that are already on the cal BP scale) of a deposit that is known to have accumulated over time, and for which therefore the dated positions can be safely assumed to are in chronological order.

Usage

strat(
  name = "mystrat",
  strat.dir = "strats",
  run = TRUE,
  its = 50000,
  burnin = 100,
  thinning = c(),
  internal.thinning = c(),
  min.its = 1000,
  write.MCMC = FALSE,
  MCMC.dir = tempdir(),
  remove.tmp = TRUE,
  init.ages = c(),
  ballpark.method = 2,
  y.scale = "dates",
  showrun = FALSE,
  sep = ",",
  normal = FALSE,
  delta.R = 0,
  delta.STD = 0,
  t.a = 3,
  t.b = 4,
  cc = 1,
  cc.dir = c(),
  prob = 0.95,
  postbomb = FALSE,
  BCAD = FALSE,
  ask = FALSE,
  talk = TRUE,
  show.progress = TRUE,
  clean.garbage = TRUE,
  save.info = TRUE,
  age.span = c(),
  ...
)

Arguments

name

Name of the stratigraphy dataset. Defaults to "mystrat".

strat.dir

The directory where the folders of the individual stratigraphies live. Defaults to strat.dir="strats".

run

Whether or not to run the data. If set to FALSE, will try to load existing run data.

its

Amount of iterations to be run. Setting this to low numbers (e.g., 1000) will result in fast but less stable and less reliable runs. Higher values will take longer but result in more stable and robust runs. Defaults to 50000. Aim to set this to such values that at least 3000 iterations remain after removing the burnin and thinning.

burnin

Amount of iterations to remove at the start of the run. Defaults to 100.

thinning

After running all iterations, only some will be stored. For example, if thinning is set at the default 50, only every 50th MCMC iteration will be stored, and the others will be discarded. This is to remove the dependence between neighbouring MCMC iterations. Defaults to a value calculated from the MCMC run itself.

internal.thinning

Does internal thinning during the MCMC process, storing only every 'internal.thinning' MCMC iterations.

min.its

Minimum number of (remaining) iterations, below which a warning is given. Defaults for now to 1,000 iterations.

write.MCMC

Especially longer runs or sites with many dates can take up lots of memory. For such cases, MCMC iterations are stored in temporary files (and in the session's tempdir() folder) rather than in memory - however this could impact your computer as the files have to be written to repeatedly and they can become huge. Defaults to FALSE.

MCMC.dir

Directory where temporary MCMC files will be saved. If left empty, defaults to tempdir().

remove.tmp

If write.MCMC=TRUE, then by default the (often huge) temporary files are deleted after the run. Set to remove.tmp=FALSE to keep these temporary files. The files will be placed in the temporary folder of this R session - where this is can be checked by providing the command tempdir().

init.ages

By default, the ballpark age estimates to feed the MCMC are calculated automatically, however they can also be provided manually, as two rows of values (for x0 and xp0) which have to obey the assumptions (e.g., chronological ordering).

ballpark.method

Initial, ballpark values for the initial ages (if not provided by the option 'init.ages'). Can be 1 (based on a linear model) or 2 (based on sorted random draws).

y.scale

The scale of the vertical axis of the main plot. This can be the positions of the dated levels ('positions') or their position order ('dates').

showrun

Whether or not to show how the MCMC process is progressing during the run. Defaults to FALSE.

sep

Separator for the fields in the .csv file. Defaults to a comma.

normal

Calculations can be done assuming that the measurements are normally distributed. By default this is set to FALSE and a student-t distribution is used (Christen and Perez 2009)

delta.R

The ages can be modelled to have an offset. The mean is 0 by default.

delta.STD

The error of the offset. Set to 0 by default.

t.a

First parameter for the student-t distribution (defaults to 3; higher numbers make the distribution approximate the normal distribution more).

t.b

Second parameter for the student-t distribution (defaults to 4; higher numbers make the distribution approximate the normal distribution more).

cc

Calibration curve to be used, for glueing to a postbomb curve. Could be 1 (IntCal20; default), 2 (Marine20), 3 (SHCal20) or 4 (custom curve). Normally not used (since this information is best provided within the .csv files), except in the case where there are postbomb dates (requiring the 'gluing' of pre- and postbomb curves).

cc.dir

Directory of calibration curve(s). Keep empty for the default value.

prob

After the run, a fit of the model with the dates is calculated, as the ratio of model iterations that fit the hpd ranges of the dates. Defaults to the prob=0.95 hpd ranges.

postbomb

Negative C-14 ages should be calibrated using a postbomb curve. This could be 1 (northern-hemisphere region 1), 2 (NH region 2), 3 (NH region 3), 4 (southern hemisphere regions 1-2), or 5 (SH region 3). Defaults to no postbomb curve (postbomb=FALSE).

BCAD

The calendar scale of graphs and age output-files is in cal BP by default, but can be changed to BC/AD using BCAD=TRUE. Needs more work probably.

ask

Whether or not to ask if a folder should be made (if required).

talk

Whether or not to provide feedback on folders written into and on what is happening.

show.progress

Whether or not to provide feedback on progress made with the run.

clean.garbage

Whether or not to clean up the memory 'garbage collection' after a run. Recommendable if you have many dates or long runs.

save.info

Whether or not to store a variable ‘info’ in the session which contains the run input, output and settings. Defaults to save.info=TRUE.

age.span

Expected age span. Defaults to run from the current year in AD to 55e3 which is the current cal BP limit for C-14 dates. If older, non-14C dates are present, age.span is set to the larger of the radiocarbon limit or twice the age of the oldest non-radiocarbon age.

...

Options for the plot. See draw.strat.

Details

Dates further down the sequence should have older ages than dates further up, even though owing to scatter, the dates themselves might not be in exact chronological order. The amount of scatter, the laboratory error and an offset can also be modelled. The function reads in a comma-separated-values (.csv) file of a specific format. The first column contains the names of the dates/information, the second column has the age(s) (uncalibrated for radiocarbon dates, as they will be calibrated during the modelling), the third column their errors, the fourth column their position (see below), and the fifth column cc, the calibration curve information. Additional columns for the reservoir effect (delta.R and delta.STD) and the student-t model (t.a and t.b) can be added, much like rbacon .csv files. The file should contain a header as the first row, named "lab ID", "age", "error", "position", and "cc" (with additional fields as per below if required). The extension of the file should be ".csv". The positions of the dates (column 4) should be entered with the topmost, youngest levels first, and then working downward toward the oldest levels. The topmost position gets the lowest number (e.g., 0), and each subsequent entry should have a higher position number to ensure that the levels are ordered in time. Dates in 'blocks' where there is no known age ordering between the dates in the block (but where that block is known to be older than the level above it and younger than the level below it) should all get the same position in column 4. The function does not only deal with dates (radiocarbon or otherwise), but can also model undated levels and a range of gaps between the dated levels. This is done mostly through column 5 in the .csv files, where a 0 is for dates on the cal BP scale, 1 for radiocarbon dates that require calibration with IntCal20, 2 with Marine20, 3 with SHCal20, 4 a custom calibration curve; additional information can be provided by adding entries for undated levels (cc=10), gaps of exactly known length (cc=11), normally distributed gap lengths (cc=12), or gamma distributed gap lengths (cc=13). The age estimates are obtained through a t-walk MCMC run (Christen and Fox 2010). In this process, initial ball-park point estimates for the ages of each dated depth are given, checked for their chronological ordering (and for the sizes of any gaps) and then modified through many iterations. For each iteration, a random dated depth is chosen and its age changed by just a little nudge, a check is performed to ensure that all age estimates remain in chronological order (and that gap sizes remain obeyed), and the 'energy' or likelihood of the age estimates is calculated (iterations where all ages fit well within the calibrated distributions receive a higher energy; see l.calib). Then this iteration with the updated group of age estimates is either accepted or rejected. The acceptance probability depends on the iteration's energy; if its energy is higher than that of the previous iteration it is accepted, but if it is lower, it is accepted with a probability proportional to its relative energy. Therefore, over many iterations the process will 'learn' from the data and find high-energy combinations of parameter values that fit with the prior constraints that the ages should be ordered chronologically. Because the iterations are based on a process of modifying values of one parameter each iteration, and because some iterations will not be accepted, the MCMC output will often have a large degree of dependence between neighbouring iterations. Therefore, some thinning will have to be done, by storing only one every few iterations (default 20). Also, since the initial ball-park estimates could be quite wrong, the first 100 or so iterations should also be discarded (burnin). It is thus important to check the time-series of the energy after the run. We don't want to see a remaining burn-in at the start, and we don't want to see a noticeable 'structure' where iterations remain in approximately or entirely the same spot for a long time. Instead, an ideal run will look like white noise. By default, the model output and the settings are stored in a list called ‘info’ which is placed into R's session as a list. For example, to retrieve the model output, type info$output. This has a column for each date, and a row for each stored MCMC iteration. The MCMC's energy can be found in info$Us. The model's ‘structure’ such as blocks or gaps can be found in info$struc. To check which dates were used, type info$dets or info$dat (the latter will include all information, including any gaps).

Value

a variable 'info' which contains the dating and modelling information to produce a plot (see details). Also calls the function draw.strat to produce a plot of the results.

References

Bronk Ramsey C, 1995. Radiocarbon calibration and analysis of stratigraphy: The OxCal program. Radiocarbon 37, 425 – 430.

Buck CE, Kenworthy JB, Litton CD, Smith AFM, 1991. Combining archaeological and radiocarbon information: a Bayesian approach to calibration. Antiquity 65, 808-821.

Buck et al. 1999. BCal: an on-line Bayesian radiocarbon calibration tool. Internet Archaeology 7.

Christen JA, Fox C 2010. A general purpose sampling algorithm for continuous distributions (the t-walk). Bayesian Analysis 5, 263-282.

Nicholls G, Jones M 2001. Radiocarbon dating with temporal order constraints. Journal of the Royal Statistical Society: Series C (Applied Statistics) 50, 503-521.

Examples

## Not run: 
tmp <- tempdir()
strat(, strat.dir=tmp, its=1000, thinning=1, internal.thinning=1)

## End(Not run)

[Package coffee version 0.3.0 Index]