R: Univariate Bayesian Imputation

impute.univariate.bayesian.mi {miWQS}

R Documentation

Univariate Bayesian Imputation

Description

Given interval-censored data between 0 and different detection limits (DL), impute.univariate.bayesian.mi generates K complete datasets using Univariate Bayesian Imputation.

Usage

impute.univariate.bayesian.mi(
  X,
  DL,
  T = 1000L,
  n.burn = 1L,
  K = 5L,
  verbose = FALSE
)

Arguments

`X`	A numeric vector, matrix, or data-frame of chemical concentration levels with n subjects and C chemicals to be imputed. Missing values are indicated by NA's. Ideally, a numeric matrix.
`DL`	The detection limit for each chemical, as a numeric vector with length equal to C chemicals. The vector must be complete (no NA's); any chemical that has a missing detection limit is not imputed. If DL is a data-frame or matrix with 1 row or 1 column, it is coerced to a numeric vector.
`T`	Number of total iterations for the Gibbs Sampler. Default: 1000L.
`n.burn`	The burn-in, which is the number of initial iterations to be discarded. Generally, the burn-in can be quite large as the imputed chemical matrices, X.imputed, are formed from the end of the chain – the lowest state used is `T - 10*K`. Default: 1L (no burn-in).
`K`	A natural number of imputed datasets to generate. Default: 5L.
`verbose`	Logical; if TRUE, prints more information. Useful to check for any errors in the code. Default: FALSE.

Details

In univariate Bayesian Imputation, only one chemical is imputed at a time. Both the observed and missing data are assumed to follow

\log(X_{ij}) \overset{indep}{\sim} \mathrm{Norm}(\mu_j, \sigma^2_j), \quad i = 1, \ldots, n; \; j = 1, \ldots, C

Subjects and chemicals are assumed to be independent. Jeffreys priors are placed on the mean and variance of each chemical. Posterior simulation uses a data augmentation approach. Convergence is checked using Gelman-Rubin statistics. Given convergence, the K sets of posterior missing values are taken from the Markov chains after burn-in, thinned by K. The imputed values then replace the missing data, which forms K complete datasets.
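The data augmentation scheme described above can be sketched for a single chemical in base R. This is a minimal illustration, not the miWQS source code: the function name gibbs_one_chemical, its arguments, the starting value log(DL / sqrt(2)), and the prior form p(mu, sigma^2) proportional to 1 / sigma^2 are all assumptions made for the sketch.

```r
# Sketch of data augmentation for ONE chemical (not the miWQS implementation).
# gibbs_one_chemical() and its arguments are hypothetical names.
gibbs_one_chemical <- function(x, dl, n_iter = 1000L) {
  miss  <- is.na(x)                  # values below the detection limit
  n     <- length(x)
  n0    <- sum(miss)
  log_x <- log(x)
  log_x[miss] <- log(dl / sqrt(2))   # assumed starting value for missing logs

  mu    <- numeric(n_iter)
  sigma <- numeric(n_iter)
  log_x_miss <- matrix(NA_real_, nrow = n_iter, ncol = n0)

  for (t in seq_len(n_iter)) {
    # Step 1: draw (mu, sigma^2) given the completed data, assuming the
    # prior p(mu, sigma^2) proportional to 1 / sigma^2.
    sigma2   <- (n - 1) * var(log_x) / rchisq(1, df = n - 1)
    mu[t]    <- rnorm(1, mean = mean(log_x), sd = sqrt(sigma2 / n))
    sigma[t] <- sqrt(sigma2)

    # Step 2: draw the missing logs from a normal truncated above at log(DL),
    # using the inverse-CDF method.
    p_max <- pnorm(log(dl), mean = mu[t], sd = sigma[t])
    u     <- runif(n0, min = 0, max = p_max)
    log_x[miss]     <- qnorm(u, mean = mu[t], sd = sigma[t])
    log_x_miss[t, ] <- log_x[miss]
  }
  list(mu = mu, sigma = sigma, log.x.miss = log_x_miss)
}

set.seed(1)
x  <- exp(rnorm(200, mean = 1, sd = 0.5))   # simulated concentrations
dl <- quantile(x, 0.10, names = FALSE)      # about 10% fall below the DL
x[x < dl] <- NA
fit <- gibbs_one_chemical(x, dl, n_iter = 500L)
summary(fit$mu)                 # posterior draws of the log-scale mean
range(exp(fit$log.x.miss))      # imputed values on the concentration scale
```

Because step 2 samples from a distribution truncated at log(DL), the imputed concentrations stay between 0 and the detection limit, which is the property that indicator.miss is meant to verify.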

Each set of posterior draws from the MCMC chain (mu.post, sigma.post, and log.x.miss) is saved as a list of mcmc objects (from the coda package) with one element per chemical. A list was chosen because the number of missing values, n0, may differ among chemicals.

Value

Returns a list that contains:

X.imputed: ** An array of n subjects x C chemicals x K imputed datasets on the normal scale.
mu.post: A list with length equal to the number of chemicals, where each element (one per chemical) is the posterior MCMC chain of the mean, saved as a T x 1 coda::mcmc object.
sigma.post: A list with length equal to the number of chemicals, where each element (one per chemical) is the posterior MCMC chain of the standard deviation, sigma, saved as a T x 1 coda::mcmc object.
log.x.miss: A list with length equal to the number of chemicals, where each element is a T x n_{0j} matrix of the log of the imputed missing values, saved as a coda::mcmc object. n_{0j} is the total number of missing values for the jth chemical.
convgd.table: A data-frame summarizing convergence with C rows and columns of the Gelman-Rubin statistic and whether the point estimate is less than 1.1. A summary is also printed to the screen.
number.no.converged: A check and summary of convgd.table: the total number of parameters that fail to indicate convergence of the MCMC chains according to the Gelman-Rubin statistic. Should be 0.
indicator.miss: A check: the sum of imputed missing values above the detection limit, which is also printed to the screen. Should be 0.

** Most important and used.
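
The 1.1 threshold reported in convgd.table refers to the Gelman-Rubin statistic. As a rough illustration of what that statistic measures, the basic potential scale reduction factor can be computed in base R as below. The helper psrf and the simulated chains are hypothetical, and the diagnostic in coda may apply additional corrections, so its values can differ slightly from this sketch.

```r
# Base-R sketch of the Gelman-Rubin potential scale reduction factor.
# psrf() is a hypothetical helper; `chains` is an iterations x chains matrix.
psrf <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))    # mean within-chain variance
  B <- n * var(colMeans(chains))      # between-chain variance
  var_plus <- (n - 1) / n * W + B / n # pooled estimate of the target variance
  sqrt(var_plus / W)
}

set.seed(2)
good <- matrix(rnorm(4000), ncol = 4)        # four chains on the same target
bad  <- sweep(good, 2, c(0, 0, 3, 3), "+")   # two chains shifted elsewhere
c(good = psrf(good), bad = psrf(bad)) < 1.1  # good passes, bad does not
```

A value close to 1 means the chains agree with each other. Chains that settle in different regions inflate the between-chain variance and push the statistic above 1.1.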

Note

No seed is set in this function. Because bootstraps and MCMC are random, a seed should be set before every use.
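
A small base-R illustration of why the seed matters: resetting the same seed before each call reproduces the same random draws. The seed value is the one used in the Examples, and rnorm stands in for any random procedure.

```r
set.seed(472195)
first <- rnorm(3)

set.seed(472195)    # reset the seed before the second run
second <- rnorm(3)

identical(first, second)   # the two runs give the same draws
```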

References

Hargarten, P. M., & Wheeler, D. C. (2020). Accounting for the Uncertainty Due to Chemicals Below the Detection Limit in Mixture Analysis. Environmental Research, 186, 109466. https://doi.org/10.1016/j.envres.2020.109466

Examples

# Example 1: 10% BDLs Example -------------------------
# Sample Dataset 87, using 10% BDL Scenario
data(simdata87)
set.seed(472195)
result.imputed <- impute.univariate.bayesian.mi(
  X = simdata87$X.bdl[, 1:6], DL = simdata87$DL[1:6],
  T = 1000, n.burn = 50, K = 2, verbose = TRUE)
# Did the MCMC converge? A summary of Gelman statistics is provided.
summary(result.imputed$convgd.table)
# Summary of Imputed Values
apply(result.imputed$X.imputed, 2:3, summary)
# To show examples for the accessory functions, save the dataset.
# save( result.imputed, l.data, file = "./data/result_imputed.RData")
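
Working with the n x C x K array can be tried out without running the sampler. The sketch below uses a toy array as a stand-in for result.imputed$X.imputed; the dimension names and the simulated values are made up for illustration.

```r
# Toy stand-in for X.imputed: 5 subjects x 3 chemicals x K = 2 datasets.
set.seed(3)
X.imputed <- array(
  rlnorm(5 * 3 * 2), dim = c(5, 3, 2),
  dimnames = list(NULL, paste0("chem", 1:3), paste0("Imputed.", 1:2))
)
first.dataset <- X.imputed[, , 1]          # 5 x 3 matrix for imputation 1
chem.means <- apply(X.imputed, 2:3, mean)  # chemical means per imputed dataset
dim(first.dataset)
dim(chem.means)
```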

[Package miWQS version 0.4.4 Index]