run.tfr.mcmc {bayesTFR} | R Documentation |

## Running Markov Chain Monte Carlo for Parameters of Total Fertility Rate in Phase II

### Description

Runs (or continues running) MCMCs for simulating the total fertility rate of all countries of the world (phase II), using a Bayesian hierarchical model.

### Usage

```
run.tfr.mcmc(nr.chains = 3, iter = 62000,
output.dir = file.path(getwd(), "bayesTFR.output"),
thin = 1, replace.output = FALSE, annual = FALSE, uncertainty = FALSE,
start.year = 1950, present.year = 2020, wpp.year = 2019,
my.tfr.file = NULL, my.locations.file = NULL, my.tfr.raw.file = NULL,
use.wpp.data = TRUE, ar.phase2 = FALSE, buffer.size = 100,
raw.outliers = c(-2, 1),
U.c.low = 5.5, U.up = 8.8, U.width = 3,
mean.eps.tau0 = -0.25, sd.eps.tau0 = 0.4, nu.tau0 = 2,
Triangle_c4.low = 1, Triangle_c4.up = 2.5,
Triangle_c4.trans.width = 2,
Triangle4.0 = 0.3, delta4.0 = 0.8, nu4 = 2,
S.low = 3.5, S.up = 6.5, S.width = 0.5,
a.low = 0, a.up = 0.2, a.width = 0.02,
b.low = a.low, b.up = a.up, b.width = 0.05,
sigma0.low = if (annual) 0.0045 else 0.01, sigma0.up = 0.6,
sigma0.width = 0.1, sigma0.min = 0.04,
const.low = 0.8, const.up = 2, const.width = 0.3,
d.low = 0.05, d.up = 0.5, d.trans.width = 1,
chi0 = -1.5, psi0 = 0.6, nu.psi0 = 2,
alpha0.p = c(-1, 0.5, 1.5), delta0 = 1, nu.delta0 = 2,
dl.p1 = 9, dl.p2 = 9, phase3.parameter=NULL,
S.ini = NULL, a.ini = NULL, b.ini = NULL, sigma0.ini = NULL,
Triangle_c4.ini = NULL, const.ini = NULL, gamma.ini = 1,
phase3.starting.values = NULL, proposal_cov_gammas = NULL,
iso.unbiased = NULL, covariates = c("source", "method"), cont_covariates = NULL,
source.col.name="source", seed = NULL, parallel = FALSE, nr.nodes = nr.chains,
save.all.parameters = FALSE, compression.type = 'None',
auto.conf = list(max.loops = 5, iter = 62000, iter.incr = 10000,
nr.chains = 3, thin = 80, burnin = 2000),
verbose = FALSE, verbose.iter = 10, ...)
continue.tfr.mcmc(iter, chain.ids = NULL,
output.dir = file.path(getwd(), "bayesTFR.output"),
parallel = FALSE, nr.nodes = NULL, auto.conf = NULL,
verbose = FALSE, verbose.iter = 10, ...)
```

### Arguments

`nr.chains` |
Number of MCMC chains to run. |

`iter` |
Number of iterations to run in each chain. In addition to a single value, it can have the value ‘auto’ in which case the function runs for the number of iterations given in the |

`output.dir` |
Directory which the simulation output should be written into. |

`thin` |
Thinning interval between consecutive observations to be stored on disk. |

`replace.output` |
If |

`annual` |
If |

`uncertainty` |
Logical. If |

`use.wpp.data` |
Logical indicating if default WPP data should be used, i.e. if |

`ar.phase2` |
Logical where |

`start.year` |
Start year for using historical data. |

`present.year` |
End year for using historical data. |

`wpp.year` |
Year for which WPP data is used. The functions loads a package called wpp |

`my.tfr.file` |
File name containing user-specified TFR time series for one or more countries. See Details below. |

`my.locations.file` |
File name containing user-specified locations. See Details below. |

`my.tfr.raw.file` |
File name of the raw TFR, used when |

`buffer.size` |
Buffer size (in number of iterations) for keeping data in the memory. The smaller the |

`raw.outliers` |
Vector of size two giving the maximum annual decrease and increase of raw TFR change, respectively. The default values mean that any raw TFR data that decrease more than 2 or increase more than 1 in one year are considered as outliers. |

`U.c.low` , `U.up` |
Lower and upper bound of the parameter |

`U.width` |
Width for slice sampling used when updating the |

`mean.eps.tau0` , `sd.eps.tau0` |
Mean and standard deviation of the prior distribution of |

`nu.tau0` |
The shape parameter of the prior Gamma distribution of |

`Triangle_c4.low` , `Triangle_c4.up` |
Lower and upper bound of the |

`Triangle_c4.trans.width` |
Width for slice sampling used when updating the logit-transformed |

`Triangle4.0` , `delta4.0` |
Mean and standard deviation of the prior distribution of the |

`nu4` |
The shape parameter of the prior Gamma distribution of |

`S.low` , `S.up` |
Lower and upper bound of the uniform prior distribution of the |

`S.width` |
Width for slice sampling used when updating the |

`a.low` , `a.up` |
Lower and upper bound of the uniform prior distribution of the |

`a.width` |
Width for slice sampling used when updating the |

`b.low` , `b.up` |
Lower and upper bound of the uniform prior distribution of the |

`b.width` |
Width for slice sampling used when updating the |

`sigma0.low` , `sigma0.up` |
Lower and upper bound of the uniform prior distribution of the |

`sigma0.width` |
Width for slice sampling used when updating the |

`sigma0.min` |
Minimum value that |

`const.low` , `const.up` |
Lower and upper bound of the uniform prior distribution of the |

`const.width` |
Width for slice sampling used when updating the |

`d.low` , `d.up` |
Lower and upper bound of the parameter |

`d.trans.width` |
Width for slice sampling used when updating the logit-transformed |

`chi0` , `psi0` |
Prior mean and standard deviation of the |

`nu.psi0` |
The shape parameter of the prior Gamma distribution of |

`alpha0.p` |
Vector of prior means of the |

`delta0` |
Prior standard deviation of the |

`nu.delta0` |
The shape parameter of the prior Gamma distribution of |

`dl.p1` , `dl.p2` |
Values of the parameters |

`phase3.parameter` |
When |

`S.ini` |
Initial value for the |

`a.ini` |
Initial value for the |

`b.ini` |
Initial value for the |

`sigma0.ini` |
Initial value for the |

`Triangle_c4.ini` |
Initial value for the |

`const.ini` |
Initial value for the |

`gamma.ini` |
Initial value for the |

`phase3.starting.values` |
This parameter is used to provide a list of Phase 3 initial values, such as |

`proposal_cov_gammas` |
Proposal for the gamma covariance matrices for each country. It should be a list with two values: |

`iso.unbiased` |
Codes of countries for which the vital registration TFR estimates are considered unbiased. Only used if |

`covariates` , `cont_covariates` |
Categorical and continuous features used in estimating bias and standard deviation if |

`source.col.name` |
If |

`seed` |
Seed of the random number generator. If |

`parallel` |
Logical determining if the simulation should run multiple chains in parallel. If it is |

`nr.nodes` |
Relevant only if |

`save.all.parameters` |
If |

`compression.type` |
One of ‘None’, ‘gz’, ‘xz’, ‘bz’, determining type of a compression of the MCMC files. Important: Do not use this option for a long MCMC simulation as this tends to cause very long run times due to slow reading! |

`auto.conf` |
List containing a configuration for an ‘automatic’ run (see description of argument |

`verbose` |
Logical switching log messages on and off. |

`verbose.iter` |
Integer determining how often (in number of iterations) log messages are outputted during the estimation. |

`...` |
Additional parameters to be passed to the function |

`chain.ids` |
Array of chain identifiers that should be resumed. If it is |

### Details

The function `run.tfr.mcmc`

creates an object of class `bayesTFR.mcmc.meta`

and stores it in `output.dir`

. It launches `nr.chains`

MCMCs, either sequentially or in parallel. Parameter traces of each chain are stored as (possibly compressed) ASCII files in a subdirectory of `output.dir`

, called `mc`

*x* where *x* is the identifier of that chain. There is one file per parameter, named after the parameter with the suffix “.txt”, possibly followed by a compression suffix if `compression.type`

is given. Country-specific parameters (`U, d, \gamma`

) have the suffix `_c`

*y* where *y* is the country code. In addition to the trace files, each `mc`

*x* directory contains the object `bayesTFR.mcmc`

in binary format. All chain-specific files are written into disk after the first, last and each `buffer.size`

-th iteration.

Using the function `continue.tfr.mcmc`

one can continue simulating an existing MCMCs by `iter`

iterations for either all or selected chains.

The function loads observed data (further denoted as WPP dataset) from the `tfr`

and `tfr_supplemental`

datasets in a wpp`x`

package where `x`

is the `wpp.year`

. It is then merged with the `include`

dataset that corresponds to the same `wpp.year`

. The argument `my.tfr.file`

can be used to overwrite those default data. If `use.wpp.data`

is `FALSE`

, it fully replaces the default dataset. Otherwise (by default), such a file can include a subset of countries contained in the WPP dataset, as well as a set of new countries. In the former case,
the function replaces the corresponding country data from the WPP dataset by values in this file. Only columns are replaced that match column names of the WPP dataset, and in addition, columns ‘last.observed’ and ‘include_code’ are used, if present. Countries are merged with WPP using the column ‘country_code’. In addition, in order the countries to be included in the simulation, in both cases (whether they are included in the WPP dataset or not), they must be contained in the table of locations (`UNlocations`

). In addition, their corresponding `include_code`

must be set to 2. If the column ‘include_code’ is present in `my.tfr.file`

, its value overwrites the default include code, unless it is -1.

The default UN table of locations mentioned above can be overwritten/extended by using a file passed as the `my.locations.file`

argument. Such a file must have the same structure as the `UNlocations`

dataset. Entries in this file will overwrite corresponding entries in `UNlocations`

matched by the column ‘country_code’. If there is no such entry in the default dataset, it will be appended. This option of appending new locations is especially useful in cases when `my.tfr.file`

contains new countries/regions that are not included in `UNlocations`

. In such a case, one must provide a `my.locations.file`

with a definition of those countries/regions.

For simulation of the hyperparameters of the Bayesian hierarchical model, all countries are used that are included in the WPP dataset, possibly complemented by the `my.tfr.file`

, that have `include_code`

equal to 2. The hyperparameters are used to simulate country-specific parameters, which is done for all countries with `include_code`

equal 1 or 2. The following values of `include_code`

in `my.tfr.file`

are recognized: -1 (do not overwrite the default include code), 0 (ignore), 1 (include in prediction but not estimation), 2 (include in both, estimation and prediction). Thus, the set of countries included in the estimation and prediction can be fully user-specific.

Optionally, `my.tfr.file`

can contain a column called `last.observed`

containing the year of the last observation for each country. In such a case, the code would ignore any data after that time point. Furthermore, the function `tfr.predict`

fills in the missing values using the median of the BHM procedure (stored in `tfr_matrix_reconstructed`

of the `bayesTFR.prediction`

object). For `last.observed`

values that are below a middle year of a time interval `[t_i, t_{i+1}]`

(computed as `t_i+3`

) the last valid data point is the time interval `[t_{i-1}, t_i]`

, whereas for values larger equal a middle year, the data point in `[t_i, t_{i+1}]`

is valid.

The package contains a dataset called ‘my_tfr_template’ (in ‘extdata’ directory) which is a template for user-specified `my.tfr.file`

.

The parameter `uncertainty`

is set to control whether past TFR is considered to be precise (`FALSE`

), or need to be estimated from the raw data (`TRUE`

). In the latter case, the raw TFR observations are taken either from the `rawTFR`

dataset (default) or from a file given by the `my.tfr.raw.file`

argument. The Bayesian hierarchical model considers the past TFR as unknown, estimates it and stores in `output.dir`

. Details can be found in Liu and Raftery (2020). The `covariates`

, `cont_covariates`

arguments are for listing categorical and continuous features for estimating bias and standard deviation of past TFR observations. If a country is known to have unbiased vital registration (VR) records, one can include it in the `iso.unbiased`

argument as those countries will estimate their past VR records to have 0 bias and 0.0161 standard deviation. The VR records are identified as having “VR” in the column given by `source.col.name`

(“source” by default).

If `annual=TRUE`

, which implies using annual data for training the model, the parameter `ar.phase2`

will be activated. If `ar.phase2`

is set to `TRUE`

, then the model of Phase II will change from `d_{c,t} = g_{c,t} + \epsilon_{c,t}`

to `d_{c,t}-g_{c,t} = \phi(d_{c,t-1}-g_{c,t-1}) + \epsilon_{c,t}`

. `\phi`

is considered as country-independent and is called `rho_phase2`

.

Furthermore, if `annual`

is `TRUE`

and `my.tfr.file`

is given, the data in the file must be on annual basis and no matching with the WPP dataset takes place.

### Value

An object of class `bayesTFR.mcmc.set`

which is a list with two components:

`meta` |
An object of class |

`mcmc.list` |
A list of objects of class |

### Author(s)

Hana Sevcikova, Leontine Alkema, Peiran Liu

### References

Peiran Liu, Hana Sevcikova, Adrian E. Raftery (2023): Probabilistic Estimation and Projection of the Annual Total Fertility Rate Accounting for Past Uncertainty: A Major Update of the bayesTFR R Package. Journal of Statistical Software, 106(8), 1-36. doi:10.18637/jss.v106.i08.

L. Alkema, A. E. Raftery, P. Gerland, S. J. Clark, F. Pelletier, Buettner, T., Heilig, G.K. (2011). Probabilistic Projections of the Total Fertility Rate for All Countries. Demography, Vol. 48, 815-839. doi:10.1007/s13524-011-0040-5.

P. Liu, and A. E. Raftery (2020). Accounting for Uncertainty About Past Values In Probabilistic Projections of the Total Fertility Rate for All Countries. Annals of Applied Statistics, Vol 14, no. 2, 685-705. doi:10.1214/19-AOAS1294.

### See Also

`get.tfr.mcmc`

, `summary.bayesTFR.mcmc.set`

.

### Examples

```
## Not run:
sim.dir <- tempfile()
m <- run.tfr.mcmc(nr.chains = 1, iter = 5, output.dir = sim.dir, verbose = TRUE)
summary(m)
m <- continue.tfr.mcmc(iter = 5, verbose = TRUE)
summary(m)
unlink(sim.dir, recursive = TRUE)
## End(Not run)
```

*bayesTFR*version 7.4-2 Index]