COMBO_MCMC {COMBO}    R Documentation

MCMC Estimation of the Binary Outcome Misclassification Model

Description

Jointly estimate \beta and \gamma parameters from the true outcome and observation mechanisms, respectively, in a binary outcome misclassification model.
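The sketch below shows one way to write the model structure. It assumes the logistic links used to simulate data in the Examples section. The symbols \pi and \pi^* are introduced here for illustration only. X_i and Z_i are covariate rows that include an intercept. Category 2 is the reference, so its coefficients are fixed at zero.

```latex
\begin{aligned}
\pi_{ij} &= P(Y_i = j \mid X_i) = \frac{\exp(X_i \beta_j)}{1 + \exp(X_i \beta_1)},
  \quad j \in \{1, 2\},\ \beta_2 = 0, \\
\pi^*_{ikj} &= P(Y^*_i = k \mid Y_i = j, Z_i) = \frac{\exp(Z_i \gamma_{kj})}{1 + \exp(Z_i \gamma_{1j})},
  \quad k, j \in \{1, 2\},\ \gamma_{2j} = 0, \\
P(Y^*_i = k \mid X_i, Z_i) &= \sum_{j=1}^{2} \pi^*_{ikj}\, \pi_{ij}.
\end{aligned}
```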

Usage

COMBO_MCMC(
  Ystar,
  x,
  z,
  prior,
  beta_prior_parameters,
  gamma_prior_parameters,
  number_MCMC_chains = 4,
  MCMC_sample = 2000,
  burn_in = 1000,
  display_progress = TRUE
)

Arguments

Ystar

A numeric vector of category labels for the observed outcome Y*, coded 1 or 2. The reference category is 2.
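Outcomes are often stored as 0/1. The following is a minimal sketch of converting them to the 1/2 coding described above. It assumes that 1 marks the event of interest, so the original 0 becomes the reference category 2. recode_to_combo() is a hypothetical helper, not part of COMBO.

```r
# Hypothetical helper (not part of COMBO): recode a 0/1 outcome, where 1
# marks the event of interest, to the 1/2 coding expected by Ystar.
# The original 0 becomes category 2, the reference category.
recode_to_combo <- function(y01) {
  stopifnot(all(y01 %in% c(0, 1)))
  ifelse(y01 == 1, 1, 2)
}

y01 <- c(1, 0, 0, 1, 1)
Ystar <- recode_to_combo(y01)
Ystar
# [1] 1 2 2 1 1
```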

x

A numeric matrix of covariates in the true outcome mechanism. x should not contain an intercept.

z

A numeric matrix of covariates in the observation mechanism. z should not contain an intercept.
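Here is a minimal sketch of preparing x and z from a data frame. It uses base R's model.matrix(), which adds an "(Intercept)" column that is then dropped. The data frame and its column names are made up for illustration.

```r
# Illustrative data frame; the column names are hypothetical.
df <- data.frame(age = c(34, 51, 29, 62),
                 smoker = factor(c("no", "yes", "no", "yes")))

# model.matrix() adds an "(Intercept)" column and expands factors into
# dummy columns; drop the intercept, since x must not contain one.
x <- model.matrix(~ age + smoker, data = df)[, -1, drop = FALSE]

# A single covariate still needs to be passed as a one-column matrix.
z <- matrix(df$age, ncol = 1)
```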

prior

A character string specifying the prior distribution for the \beta and \gamma parameters. Options are "t", "uniform", "normal", or "dexp" (double exponential, also known as Laplace).

beta_prior_parameters

A list of numeric matrices giving the prior distribution parameters for the \beta terms. The list elements depend on prior, in this order:
"t": location, shape, and degrees of freedom.
"uniform": lower bound and upper bound.
"normal": mean and standard deviation.
"dexp": location and scale.
Only the "t" prior uses a third list element; omit it for all other priors. Every matrix must have dimensions n_cat X dim_x, where n_cat = 2 is the number of outcome categories and dim_x is the number of columns of x plus one for the intercept. All elements in row n_cat (the reference category) must be set to NA.
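Here is a minimal sketch of building beta_prior_parameters with this layout for the "normal" prior. make_beta_prior() is a hypothetical helper, not part of COMBO. The mean 0 and standard deviation 10 are arbitrary illustrative values.

```r
# Hypothetical helper (not part of COMBO): build n_cat x dim_x prior
# matrices for the beta terms, one per list element, with row n_cat
# (the reference category) filled with NA.
make_beta_prior <- function(dim_x, values, n_cat = 2) {
  lapply(values, function(v) {
    m <- matrix(v, nrow = n_cat, ncol = dim_x)
    m[n_cat, ] <- NA
    m
  })
}

# Normal prior, mean 0 and standard deviation 10, for an x with one
# covariate (dim_x = 1 covariate + 1 intercept = 2).
beta_prior_parameters <- make_beta_prior(dim_x = 2,
                                         values = list(mean = 0, sd = 10))
beta_prior_parameters$mean
#      [,1] [,2]
# [1,]    0    0
# [2,]   NA   NA
```

With values = list(lower = -5, upper = 5), the same helper reproduces the unif_lower_beta and unif_upper_beta matrices used in the Examples.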

gamma_prior_parameters

A list of numeric arrays giving the prior distribution parameters for the \gamma terms. The list elements depend on prior, in this order:
"t": location, shape, and degrees of freedom.
"uniform": lower bound and upper bound.
"normal": mean and standard deviation.
"dexp": location and scale.
Only the "t" prior uses a third list element; omit it for all other priors. Every array must have dimensions n_cat X n_cat X dim_z, where dim_z is the number of columns of z plus one for the intercept. The first index is the observed outcome category and the second is the true outcome category. All elements whose first index is n_cat (the reference category) must be set to NA.
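A matching sketch for gamma_prior_parameters, here with the three-element "t" prior. make_gamma_prior() is a hypothetical helper, not part of COMBO. The location, shape, and degrees-of-freedom values are arbitrary.

```r
# Hypothetical helper (not part of COMBO): build n_cat x n_cat x dim_z
# prior arrays for the gamma terms. Every element whose first index
# (observed outcome category) is n_cat is set to NA, in every slice.
make_gamma_prior <- function(dim_z, values, n_cat = 2) {
  lapply(values, function(v) {
    a <- array(v, dim = c(n_cat, n_cat, dim_z))
    a[n_cat, , ] <- NA
    a
  })
}

# "t" prior for a z with one covariate (dim_z = 2): location 0,
# shape 1, and 5 degrees of freedom for every gamma term.
gamma_prior_parameters <- make_gamma_prior(dim_z = 2,
                                           values = list(location = 0,
                                                         shape = 1,
                                                         df = 5))
gamma_prior_parameters$df[, , 1]
```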

number_MCMC_chains

An integer specifying the number of MCMC chains to compute. The default is 4.

MCMC_sample

An integer specifying the number of MCMC samples to draw. The default is 2000.

burn_in

An integer specifying the number of MCMC samples to discard for the burn-in period. The default is 1000.

display_progress

A logical value specifying whether messages should be displayed during model compilation. The default is TRUE.

Value

COMBO_MCMC returns a list of posterior samples and posterior summaries (means and medians) for two models: the binary outcome misclassification model, and a naive logistic regression of the observed outcome Y* on the covariates x. The list contains the following components:
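As a rough sanity check on the naive results, a frequentist analogue of the naive model can be fitted with glm(). This is a sketch, not what COMBO runs internally. COMBO fits the naive model by MCMC with priors, so its estimates may differ somewhat, especially for small samples or tight priors. The simulated data below are for illustration only.

```r
# Simulated data for illustration only (no misclassification here).
set.seed(1)
n <- 500
x <- matrix(rnorm(n), ncol = 1)
Ystar <- ifelse(runif(n) < plogis(1 - 2 * x[, 1]), 1, 2)

# Maximum-likelihood analogue of the naive model: logistic regression of
# the observed outcome on x, modeling P(Y* = 1) with category 2 as the
# reference. It ignores misclassification entirely.
naive_fit <- glm(I(Ystar == 1) ~ x, family = binomial)
coef(naive_fit)
```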

posterior_sample_df

A data frame with three columns. The first column indicates the chain from which a sample is taken, from 1 to number_MCMC_chains. The second column names the parameter for that row. \beta terms are indexed with dimensions dim_x X n_cat. \gamma terms are indexed with dimensions n_cat X n_cat X dim_z, where the first index is the observed outcome category and the second index is the true outcome category. The third column contains the sampled value.
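A quick convergence check is to compare per-chain means for one parameter. Large differences between chains suggest that more burn-in or more samples may be needed. This sketch accesses columns by position (chain, parameter, sample) so that it does not depend on column names. chain_means() is hypothetical, and the "beta[1,1]" label in the toy data is an assumed format, not necessarily the one COMBO uses.

```r
# Hypothetical helper (not part of COMBO): posterior mean of one parameter
# within each chain, using column positions 1 (chain), 2 (parameter),
# and 3 (sampled value).
chain_means <- function(sample_df, parameter) {
  rows <- sample_df[[2]] == parameter
  tapply(sample_df[[3]][rows], sample_df[[1]][rows], mean)
}

# Toy stand-in for posterior_sample_df; the label format is assumed.
toy <- data.frame(chain = rep(1:2, each = 3),
                  parameter = "beta[1,1]",
                  sample = c(0.9, 1.1, 1.0, 1.2, 0.8, 1.0))
chain_means(toy, "beta[1,1]")
```

With real output, the call would be chain_means(MCMC_results$posterior_sample_df, label), using a label copied from the second column.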

posterior_means_df

A data frame containing three columns. The first column specifies the parameter associated with a given row. Parameters are indexed as in the posterior_sample_df. The second column provides the posterior mean computed across all chains and all samples. The final column provides the posterior median computed across all chains and all samples.
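The summaries in this data frame pool draws across chains. The sketch below recomputes them from a sample data frame laid out like posterior_sample_df. It can help when a different summary, such as a quantile, is needed. pooled_summary() is hypothetical, and it uses column positions rather than assumed column names.

```r
# Hypothetical helper (not part of COMBO): pooled posterior mean and median
# per parameter, computed across all chains and all samples.
pooled_summary <- function(sample_df) {
  by_param <- split(sample_df[[3]], sample_df[[2]])
  out <- data.frame(parameter = names(by_param),
                    mean = vapply(by_param, mean, numeric(1)),
                    median = vapply(by_param, median, numeric(1)))
  rownames(out) <- NULL
  out
}

# Toy stand-in for posterior_sample_df with two parameters and two chains.
toy <- data.frame(chain = rep(1:2, each = 4),
                  parameter = rep(c("a", "b"), times = 4),
                  sample = c(1, 10, 2, 20, 3, 30, 10, 40))
pooled_summary(toy)
```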

naive_posterior_sample_df

A data frame containing three columns. The first column indicates the chain from which a sample is taken, from 1 to number_MCMC_chains. The second column specifies the parameter associated with a given row. Naive \beta terms have dimensions dim_x X n_cat. The final column provides the MCMC sample.

naive_posterior_means_df

A data frame containing three columns. The first column specifies the naive parameter associated with a given row. Parameters are indexed as in the naive_posterior_sample_df. The second column provides the posterior mean computed across all chains and all samples. The final column provides the posterior median computed across all chains and all samples.

Examples


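library(COMBO)

set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z_shape <- 1

# True parameters. true_beta: intercept and slope of the true outcome
# mechanism. true_gamma: column j holds the intercept and slope of
# P(Y* = 1 | Y = j, z).
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)

# Covariates without an intercept (passed to COMBO_MCMC) and design
# matrices with an intercept (used only to simulate data).
x_matrix <- matrix(rnorm(n, x_mu, x_sigma), ncol = 1)
X <- matrix(c(rep(1, n), x_matrix[, 1]), ncol = 2, byrow = FALSE)
z_matrix <- matrix(rgamma(n, z_shape), ncol = 1)
Z <- matrix(c(rep(1, n), z_matrix[, 1]), ncol = 2, byrow = FALSE)

# True outcome probabilities: column 1 is P(Y = 1 | x), column 2 is P(Y = 2 | x).
exp_xb <- exp(X %*% true_beta)
pi_result <- exp_xb[, 1] / (exp_xb[, 1] + 1)
pi_matrix <- matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE)

# Draw the true outcome (1 or 2) for each subject.
true_Y <- rep(NA, n)
for (i in 1:n) {
  true_Y[i] <- which(stats::rmultinom(1, 1, pi_matrix[i, ]) == 1)
}

# pistar_result[i, j] is P(Y* = 1 | Y = j, z_i).
exp_zg <- exp(Z %*% true_gamma)
pistar_denominator <- matrix(c(1 + exp_zg[, 1], 1 + exp_zg[, 2]),
                             ncol = 2, byrow = FALSE)
pistar_result <- exp_zg / pistar_denominator

# Stack into a (2n) x 2 matrix: for subject i, rows i and n + i hold
# P(Y* = 1 | Y = j) and P(Y* = 2 | Y = j); column j is the true category.
pistar_matrix <- matrix(c(pistar_result[, 1], 1 - pistar_result[, 1],
                          pistar_result[, 2], 1 - pistar_result[, 2]),
                        ncol = 2, byrow = FALSE)

# Draw the observed outcome Y* given each subject's true outcome.
obs_Y <- rep(NA, n)
for (i in 1:n) {
  true_j <- true_Y[i]
  obs_Y[i] <- which(stats::rmultinom(1, 1,
                                     pistar_matrix[c(i, n + i), true_j]) == 1)
}

Ystar <- obs_Y

# Uniform(-5, 5) priors for every parameter. Row 2 (the reference
# category) is NA; dim_x = dim_z = 2 (intercept plus one covariate).
unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE)
unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE)

unif_lower_gamma <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA),
                          dim = c(2, 2, 2))
unif_upper_gamma <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA),
                          dim = c(2, 2, 2))

beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta)
gamma_prior_parameters <- list(lower = unif_lower_gamma, upper = unif_upper_gamma)

# Two short chains keep the example fast.
MCMC_results <- COMBO_MCMC(Ystar, x = x_matrix, z = z_matrix,
                           prior = "uniform",
                           beta_prior_parameters = beta_prior_parameters,
                           gamma_prior_parameters = gamma_prior_parameters,
                           number_MCMC_chains = 2,
                           MCMC_sample = 200, burn_in = 100)
MCMC_results$posterior_means_df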
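The two loops in the example above draw each outcome one at a time. Each outcome has only two categories, so the same data-generating process can also be simulated with vectorized Bernoulli draws, as sketched below. This variant uses a different random stream, so its outcomes will not match the loop-based version for the same seed. It is an illustration only, not how COMBO generates or uses data.

```r
# Self-contained, loop-free variant of the simulation in the example.
set.seed(123)
n <- 1000
true_beta <- c(1, -2)
true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2)  # column j: Y = j

X <- cbind(1, rnorm(n, 0, 1))
Z <- cbind(1, rgamma(n, shape = 1))

# True outcome: P(Y = 1 | X) from the logistic link; category 2 otherwise.
true_Y <- ifelse(runif(n) < plogis(as.vector(X %*% true_beta)), 1, 2)

# pistar_1[i, j] is P(Y* = 1 | Y = j, Z_i); pick column true_Y[i] per row.
eta <- Z %*% true_gamma
pistar_1 <- exp(eta) / (1 + exp(eta))
p_obs_1 <- pistar_1[cbind(seq_len(n), true_Y)]
Ystar <- ifelse(runif(n) < p_obs_1, 1, 2)

# Cross-tabulate true versus observed outcomes to see the misclassification.
table(true = true_Y, observed = Ystar)
```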

[Package COMBO version 1.0.0 Index]