impute.univariate.bayesian.mi {miWQS} | R Documentation |
Univariate Bayesian Imputation
Description
Given interval-censored data between 0 and different detection limits (DL), impute.univariate.bayesian.mi
generates K complete datasets using Univariate Bayesian Imputation.
Usage
impute.univariate.bayesian.mi(
X,
DL,
T = 1000L,
n.burn = 1L,
K = 5L,
verbose = FALSE
)
Arguments
X |
A numeric vector, matrix, or data-frame of chemical concentration levels with n subjects and C chemicals to be imputed. Missing values are indicated by NA's. Ideally, a numeric matrix. |
DL |
The detection limit for each chemical as a numeric vector with length equal to C chemicals. Vector must be complete (no NA's); any chemical that has a missing detection limit is not imputed. If DL is a data-frame or matrix with 1 row or 1 column, it is forced as a numeric vector. |
T |
Number of total iterations for the Gibbs Sampler. Default: 1000L. |
n.burn |
The burn-in, which is the number of initial iterations to be discarded. Generally, the burn-in can be quite large as the imputed chemical matrices, X.imputed, are formed from the end of the chain – the lowest state used is |
K |
A natural number of imputed datasets to generate. Default: 5L. |
verbose |
Logical; if TRUE, prints more information. Useful to check for any errors in the code. Default: FALSE. |
Details
In univariate Bayesian Imputation, only one chemical is imputed at a time. Both the observed and missing data are assumed to follow
\log(X_{ij}) \stackrel{indep}{\sim} Norm(\mu_j, \sigma^2_j), \quad i = 1, \ldots, n; \; j = 1, \ldots, C
Subjects and chemicals are assumed to be independent. Jeffreys priors are placed on the mean and variance of each chemical. Posterior simulation uses a data augmentation approach. Convergence is checked using Gelman-Rubin statistics. Given sample convergence, the K sets of posterior missing values come from the burned Markov chains thinned by K. The imputed values then replace the missing data, forming K complete datasets.
Each of the posterior parameters from the MCMC chain, mu.post, sigma.post, and log.x.miss, is saved as a list of mcmc objects (from the coda package) with length equal to the number of chemicals. (A list was chosen since the number of missing values n0 might differ among chemicals.)
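The data augmentation idea described above can be sketched in base R for a single chemical. This is an illustrative sketch only, not the miWQS source: all object names are hypothetical, the prior is assumed to be p(mu, sigma^2) proportional to 1/sigma^2, only one chain is run (so no Gelman-Rubin check is possible), and the K retained states are taken equally spaced after the burn-in, which is an assumption about the thinning rather than the package's rule.

```r
# Minimal sketch of data augmentation for ONE chemical (illustrative only;
# not the miWQS implementation). All names below are hypothetical.
set.seed(1)

# Simulate a lognormal chemical and censor values below a detection limit.
n      <- 200
x.true <- exp(rnorm(n, mean = 1, sd = 0.5))
DL     <- 1.8
x.obs  <- ifelse(x.true < DL, NA, x.true)   # NA marks a value below the DL

miss   <- is.na(x.obs)
n0     <- sum(miss)
T.iter <- 1000   # total Gibbs iterations
n.burn <- 50     # burn-in
K      <- 2      # number of imputed datasets

# Start the missing log-values at log(DL / sqrt(2)), a common substitution.
y <- log(ifelse(miss, DL / sqrt(2), x.obs))

mu.post    <- numeric(T.iter)
sigma.post <- numeric(T.iter)
log.x.miss <- matrix(NA_real_, nrow = T.iter, ncol = n0)

for (iter in seq_len(T.iter)) {
  # (1) Draw sigma^2 | y, then mu | sigma^2, y
  #     (assumed prior: p(mu, sigma^2) proportional to 1 / sigma^2).
  s2    <- sum((y - mean(y))^2) / rchisq(1, df = n - 1)
  mu    <- rnorm(1, mean = mean(y), sd = sqrt(s2 / n))
  sigma <- sqrt(s2)

  # (2) Draw the missing log-values from Normal(mu, sigma^2) truncated
  #     above at log(DL), by inverting the normal CDF.
  p.max   <- pnorm(log(DL), mean = mu, sd = sigma)
  y[miss] <- qnorm(runif(n0, min = 0, max = p.max), mean = mu, sd = sigma)

  mu.post[iter]      <- mu
  sigma.post[iter]   <- sigma
  log.x.miss[iter, ] <- y[miss]
}

# Keep K states after the burn-in (equally spaced here; an assumption).
keep <- round(seq(n.burn + 1, T.iter, length.out = K))

# Replace the missing values with exponentiated draws: an n x K matrix
# of completed data on the original (non-log) scale.
X.imputed <- sapply(keep, function(s) {
  x <- x.obs
  x[miss] <- exp(log.x.miss[s, ])
  x
})
```

Because each missing log-value is drawn from a distribution truncated at log(DL), every imputed value stays at or below the detection limit, which is the property the indicator.miss check reports on.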
Value
Returns a list that contains:
- X.imputed
** An array of n subjects x C chemicals x K imputed datasets on the normal scale.
- mu.post
A list with length equal to the number of chemicals, where each element (or for each chemical) is the posterior MCMC chain of the mean, saved as a T x 1 coda::mcmc object.
- sigma.post
A list with length equal to the number of chemicals, where each element of list (or for each chemical) is the posterior MCMC chain of the standard deviation, sigma, saved as T x 1 coda::mcmc object.
- log.x.miss
A list with length equal to the number of chemicals, where each element of list is a T x n_{0j} matrix of the log of the imputed missing values, saved as coda::mcmc object. n_{0j} is the total # of missing values for the jth chemical.
- convgd.table
A data-frame summarizing convergence with C rows and columns of the Gelman-Rubin statistic and whether the point estimate is less than 1.1. A summary is also printed to the screen.
- number.no.converged
A check and summary of convgd.table. Total number of parameters that fail to indicate convergence of MCMC chains using Gelman-Rubin statistic. Should be 0.
- indicator.miss
A check. The sum of imputed missing values above the detection limit, which is printed to the screen. Should be 0.
** Most important and used.
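As a small illustration of the n x C x K layout of X.imputed, the sketch below builds a hypothetical stand-in array (the values and dimension names are made up, not output from the function) and shows how one completed dataset and per-chemical summaries might be pulled out of it.

```r
# Hypothetical stand-in for result$X.imputed:
# n = 4 subjects, C = 2 chemicals, K = 3 imputed datasets.
X.imputed <- array(
  seq_len(4 * 2 * 3),
  dim = c(4, 2, 3),
  dimnames = list(NULL, c("chem1", "chem2"), paste0("Imputed.", 1:3))
)

# The k-th completed dataset is the k-th slice along the third dimension.
X.first <- X.imputed[, , 1]          # 4 x 2 matrix

# Per-chemical means within each imputed dataset: a C x K matrix.
chem.means <- apply(X.imputed, 2:3, mean)
```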
Note
No seed is set in this function. Because bootstraps and MCMC are random, a seed should be set before every use.
References
Hargarten, P. M., & Wheeler, D. C. (2020). Accounting for the Uncertainty Due to Chemicals Below the Detection Limit in Mixture Analysis. Environmental Research, 186, 109466. https://doi.org/10.1016/j.envres.2020.109466
Examples
# Example 1: 10% BDLs Example -------------------------
# Sample Dataset 87, using 10% BDL Scenario
data(simdata87)
set.seed(472195)
result.imputed <- impute.univariate.bayesian.mi(
X = simdata87$X.bdl[, 1:6], DL = simdata87$DL[1:6],
T = 1000, n.burn = 50, K = 2, verbose = TRUE)
# Did the MCMC converge? A summary of Gelman-Rubin statistics is provided.
summary(result.imputed$convgd.table)
# Summary of Imputed Values
apply(result.imputed$X.imputed, 2:3, summary)
# To show examples for the accessory functions, save the dataset.
# save( result.imputed, l.data, file = "./data/result_imputed.RData")