impute.univariate.bayesian.mi {miWQS} | R Documentation |
Univariate Bayesian Imputation
Description
Given interval-censored data between 0 and different detection limits (DL), impute.univariate.bayesian.mi
generates K complete datasets using Univariate Bayesian Imputation.
Usage
impute.univariate.bayesian.mi(
X,
DL,
T = 1000L,
n.burn = 1L,
K = 5L,
verbose = FALSE
)
Arguments
X |
A numeric vector, matrix, or data-frame of chemical concentration levels with n subjects and C chemicals to be imputed. Missing values are indicated by NA's. Ideally, a numeric matrix. |
DL |
The detection limit for each chemical as a numeric vector with length equal to C chemicals. Vector must be complete (no NA's); any chemical that has a missing detection limit is not imputed. If DL is a data-frame or matrix with 1 row or 1 column, it is forced as a numeric vector. |
T |
Number of total iterations for the Gibbs Sampler. Default: 1000L. |
n.burn |
The burn-in, which is the number of initial iterations to be discarded. Generally, the burn-in can be quite large as the imputed chemical matrices, X.imputed, are formed from the end of the chain – the lowest state used is |
K |
A natural number of imputed datasets to generate. Default: 5L. |
verbose |
Logical; if TRUE, prints more information. Useful to check for any errors in the code. Default: FALSE. |
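The input requirements above can be checked before calling the function. The following is a minimal sketch using base R only; the objects `X`, `DL`, `X_mat`, and `DL_vec` are hypothetical and serve only to illustrate the shapes described for the arguments.

```r
# Hypothetical inputs: 4 subjects, 2 chemicals; NA marks a missing value.
X <- data.frame(chem1 = c(1.2, NA, 0.8, 2.1),
                chem2 = c(NA, 3.4, 2.9, NA))
DL <- data.frame(chem1 = 0.5, chem2 = 2.0)    # one-row data frame

X_mat  <- as.matrix(X)                        # numeric n x C matrix
DL_vec <- unlist(DL, use.names = FALSE)       # numeric vector of length C

# One detection limit per chemical, and none missing.
stopifnot(is.numeric(X_mat),
          length(DL_vec) == ncol(X_mat),
          !anyNA(DL_vec))
```

Converting up front makes the shapes explicit, even though the function itself coerces a one-row or one-column DL.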
Details
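In univariate Bayesian Imputation, only one chemical is imputed at a time. Both the observed and missing data are assumed to follow a log-normal distribution: log(X_ij) ~ N(mu_j, sigma_j^2) for subjects i = 1, ..., n and chemicals j = 1, ..., C.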
Subjects and chemicals are assumed to be independent. Jeffreys priors are placed on the mean and variance of each chemical. Posterior simulation uses a data augmentation approach. Convergence is checked using Gelman-Rubin statistics. Given sample convergence, the K sets of posterior missing values come from the burned Markov chains thinned by K. The imputed values then replace the missing data, forming K complete datasets.
Each of the posterior parameters from the MCMC chain (mu.post, sigma.post, and log.x.miss) is saved as a list of mcmc objects (from the coda package) with length equal to the number of chemicals. A list was chosen because the number of missing values, n0, may differ among chemicals.
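The data augmentation scheme described above can be sketched for a single chemical in base R. This is an illustration under stated assumptions, not the package's implementation: `gibbs_one_chemical` is a hypothetical helper, the prior is assumed to be the standard noninformative form p(mu, sigma^2) proportional to 1/sigma^2, and the censored values are started at log(DL / sqrt(2)).

```r
# Hypothetical helper for illustration only; not a miWQS function.
gibbs_one_chemical <- function(x, dl, n_iter = 1000L, n_burn = 100L) {
  miss <- is.na(x)               # values below the detection limit
  n    <- length(x)
  n0   <- sum(miss)
  y    <- log(x)
  y[miss] <- log(dl / sqrt(2))   # assumed starting value for censored data

  mu    <- numeric(n_iter)
  sigma <- numeric(n_iter)
  log_x_miss <- matrix(NA_real_, nrow = n_iter, ncol = n0)

  for (iter in seq_len(n_iter)) {
    # Posterior step: draw (mu, sigma^2) given the completed log data.
    sigma2 <- (n - 1) * var(y) / rchisq(1, df = n - 1)
    mu_t   <- rnorm(1, mean = mean(y), sd = sqrt(sigma2 / n))
    sd_t   <- sqrt(sigma2)

    # Imputation step: draw censored values from N(mu, sigma^2)
    # truncated to (-Inf, log(dl)) by inverting the normal CDF.
    u <- runif(n0, min = 0, max = pnorm(log(dl), mu_t, sd_t))
    y[miss] <- qnorm(u, mu_t, sd_t)

    mu[iter] <- mu_t
    sigma[iter] <- sd_t
    log_x_miss[iter, ] <- y[miss]
  }

  keep <- seq.int(n_burn + 1L, n_iter)   # discard the burn-in
  list(mu = mu[keep], sigma = sigma[keep],
       log_x_miss = log_x_miss[keep, , drop = FALSE])
}

# Simulated log-normal data with roughly 10% of values below the DL.
set.seed(2020)
x_true <- rlnorm(200, meanlog = 0, sdlog = 1)
dl     <- quantile(x_true, 0.10, names = FALSE)
x_bdl  <- ifelse(x_true < dl, NA, x_true)

chain <- gibbs_one_chemical(x_bdl, dl)
dim(chain$log_x_miss)   # retained iterations x number of censored values
```

Because each censored value is drawn from a normal distribution truncated at log(DL), the back-transformed imputations cannot exceed the detection limit, which is the property the indicator.miss check reports on.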
Value
Returns a list that contains:
- X.imputed
** An array of n subjects x C chemicals x K imputed datasets on the normal scale.
- mu.post
A list with length equal to the number of chemicals, where each element (or for each chemical) is the posterior MCMC chain of the mean, saved as a T x 1 coda::mcmc object.
- sigma.post
A list with length equal to the number of chemicals, where each element of the list (or for each chemical) is the posterior MCMC chain of the standard deviation, sigma, saved as a T x 1 coda::mcmc object.
- log.x.miss
A list with length equal to the number of chemicals, where each element of the list is a T x n0j matrix of the log of the imputed missing values, saved as a coda::mcmc object. Here n0j is the total number of missing values for the jth chemical.
- convgd.table
A data-frame summarizing convergence with C rows and columns of the Gelman-Rubin statistic and whether the point estimate is less than 1.1. A summary is also printed to the screen.
- number.no.converged
A check and summary of convgd.table. Total number of parameters that fail to indicate convergence of MCMC chains using Gelman-Rubin statistic. Should be 0.
- indicator.miss
A check. The sum of imputed missing values above detection limit that is printed to the screen. Should be 0.
** Most important and used.
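For readers unfamiliar with the 1.1 cutoff used in convgd.table, the basic Gelman-Rubin potential scale reduction factor can be sketched in base R. `psrf_basic` is a hypothetical helper that implements the textbook form without the degrees-of-freedom correction applied by some software; how the package forms the chains it compares is not described here, so the simulated chains below are an assumption for illustration.

```r
# Hypothetical helper: basic potential scale reduction factor.
# `chains` is an n x m matrix: n iterations (rows) from m chains (columns).
psrf_basic <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))     # mean within-chain variance
  B <- n * var(colMeans(chains))       # between-chain variance
  var_hat <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var_hat / W)
}

set.seed(1)
mixed   <- matrix(rnorm(4000), ncol = 4)         # four chains, same target
unmixed <- sweep(mixed, 2, c(0, 0, 5, 5), "+")   # two chains shifted away

psrf_basic(mixed) < 1.1     # TRUE: consistent with convergence
psrf_basic(unmixed) < 1.1   # FALSE: chains disagree
```

A value near 1 means the between-chain variation is small relative to the within-chain variation.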
Note
No seed is set in this function. Because bootstraps and MCMC are random, a seed should be set before every use.
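A minimal base R illustration of why the seed matters: resetting the same seed before each run reproduces the same random draws. The seed value is taken from the example in this page; `draw1` and `draw2` are hypothetical names.

```r
set.seed(472195)
draw1 <- rnorm(3)

set.seed(472195)   # reset before the second run
draw2 <- rnorm(3)

identical(draw1, draw2)   # TRUE: same seed, same draws
```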
References
Hargarten, P. M., & Wheeler, D. C. (2020). Accounting for the Uncertainty Due to Chemicals Below the Detection Limit in Mixture Analysis. Environmental Research, 186, 109466. https://doi.org/10.1016/j.envres.2020.109466
Examples
# Example 1: 10% BDLs Example -------------------------
# Sample Dataset 87, using 10% BDL Scenario
data(simdata87)
set.seed(472195)
result.imputed <- impute.univariate.bayesian.mi(
X = simdata87$X.bdl[, 1:6], DL = simdata87$DL[1:6],
T = 1000, n.burn = 50, K = 2, verbose = TRUE)
# Did the MCMC converge? A summary of Gelman-Rubin statistics is provided.
summary(result.imputed$convgd.table)
# Summary of Imputed Values
apply(result.imputed$X.imputed, 2:3, summary)
# To show examples for the accessory functions, save the dataset.
# save( result.imputed, l.data, file = "./data/result_imputed.RData")
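Since X.imputed is an n x C x K array, each completed dataset is one slice along the third dimension. The sketch below uses a mock array in place of result.imputed$X.imputed so that it runs without the package; the dimension names and the objects `first` and `datasets` are assumptions for illustration.

```r
# Mock array standing in for result.imputed$X.imputed (n = 4, C = 3, K = 2).
set.seed(1)
X.imputed <- array(
  rlnorm(4 * 3 * 2),
  dim = c(4, 3, 2),
  dimnames = list(NULL, paste0("chem", 1:3), paste0("imp", 1:2))
)

# First completed dataset as an n x C matrix.
first <- X.imputed[, , 1]

# All K completed datasets as a list of matrices.
datasets <- lapply(seq_len(dim(X.imputed)[3]), function(k) X.imputed[, , k])
length(datasets)
```

Keeping the datasets in a list makes it straightforward to fit the same model to each one and pool the results afterwards.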