R: MCMC Estimation of the Two-Stage Binary Outcome...

COMBO_MCMC_2stage {COMBO}

R Documentation

MCMC Estimation of the Two-Stage Binary Outcome Misclassification Model

Description

Jointly estimate \beta, \gamma^{(1)}, and \gamma^{(2)} parameters from the true outcome first-stage observation, and second-stage observation mechanisms, respectively, in a two-stage binary outcome misclassification model.

Usage

COMBO_MCMC_2stage(
  Ystar1,
  Ystar2,
  x_matrix,
  z1_matrix,
  z2_matrix,
  prior,
  beta_prior_parameters,
  gamma1_prior_parameters,
  gamma2_prior_parameters,
  naive_gamma2_prior_parameters,
  number_MCMC_chains = 4,
  MCMC_sample = 2000,
  burn_in = 1000,
  display_progress = TRUE
)

Arguments

`Ystar1`	A numeric vector of indicator variables (1, 2) for the observed outcome `Y^{*(1)}`. The reference category is 2.
`Ystar2`	A numeric vector of indicator variables (1, 2) for the second-stage observed outcome `Y^{*(2)}`. There should be no `NA` terms. The reference category is 2.
`x_matrix`	A numeric matrix of covariates in the true outcome mechanism. `x_matrix` should not contain an intercept.
`z1_matrix`	A numeric matrix of covariates in the observation mechanism. `z1_matrix` should not contain an intercept.
`z2_matrix`	A numeric matrix of covariates in the second-stage observation mechanism. `z2_matrix` should not contain an intercept and no values should be `NA`.
`prior`	A character string specifying the prior distribution for the `\beta`, `\gamma^{(1)}`, and `\gamma^{(2)}` parameters. Options are `"t"`, `"uniform"`, `"normal"`, or `"dexp"` (double Exponential, or Weibull).
`beta_prior_parameters`	A numeric list of prior distribution parameters for the `\beta` terms. For prior distributions `"t"`, `"uniform"`, `"normal"`, or `"dexp"`, the first element of the list should contain a matrix of location, lower bound, mean, or shape parameters, respectively, for `\beta` terms. For prior distributions `"t"`, `"uniform"`, `"normal"`, or `"dexp"`, the second element of the list should contain a matrix of shape, upper bound, standard deviation, or scale parameters, respectively, for `\beta` terms. For prior distribution `"t"`, the third element of the list should contain a matrix of the degrees of freedom for `\beta` terms. The third list element should be empty for all other prior distributions. All matrices in the list should have dimensions `n_cat` X `dim_x`, and all elements in the `n_cat` row should be set to `NA`.
`gamma1_prior_parameters`	A numeric list of prior distribution parameters for the `\gamma^{(1)}` terms. For prior distributions `"t"`, `"uniform"`, `"normal"`, or `"dexp"`, the first element of the list should contain an array of location, lower bound, mean, or shape parameters, respectively, for `\gamma^{(1)}` terms. For prior distributions `"t"`, `"uniform"`, `"normal"`, or `"dexp"`, the second element of the list should contain an array of shape, upper bound, standard deviation, or scale parameters, respectively, for `\gamma^{(1)}` terms. For prior distribution `"t"`, the third element of the list should contain an array of the degrees of freedom for `\gamma^{(1)}` terms. The third list element should be empty for all other prior distributions. All arrays in the list should have dimensions `n_cat` X `n_cat` X `dim_z1`, and all elements in the `n_cat` row should be set to `NA`.
`gamma2_prior_parameters`	A numeric list of prior distribution parameters for the `\gamma^{(2)}` terms. For prior distributions `"t"`, `"uniform"`, `"normal"`, or `"dexp"`, the first element of the list should contain an array of location, lower bound, mean, or shape parameters, respectively, for `\gamma^{(2)}` terms. For prior distributions `"t"`, `"uniform"`, `"normal"`, or `"dexp"`, the second element of the list should contain an array of shape, upper bound, standard deviation, or scale parameters, respectively, for `\gamma^{(2)}` terms. For prior distribution `"t"`, the third element of the list should contain an array of the degrees of freedom for `\gamma^{(2)}` terms. The third list element should be empty for all other prior distributions. All arrays in the list should have dimensions `n_cat` X `n_cat` X `n_cat` X `dim_z2`, and all elements in the `n_cat` row should be set to `NA`.
`naive_gamma2_prior_parameters`	A numeric list of prior distribution parameters for the naive model `\gamma^{(2)}` terms. For prior distributions `"t"`, `"uniform"`, `"normal"`, or `"dexp"`, the first element of the list should contain an array of location, lower bound, mean, or shape parameters, respectively, for naive `\gamma^{(2)}` terms. For prior distributions `"t"`, `"uniform"`, `"normal"`, or `"dexp"`, the second element of the list should contain an array of shape, upper bound, standard deviation, or scale parameters, respectively, for naive `\gamma^{(2)}` terms. For prior distribution `"t"`, the third element of the list should contain an array of the degrees of freedom for naive `\gamma^{(2)}` terms. The third list element should be empty for all other prior distributions. All arrays in the list should have dimensions `n_cat` X `n_cat` X `dim_z2`, and all elements in the `n_cat` row should be set to `NA`. Note that prior distributions for the naive `\beta` terms are inherted from the `beta_prior_parameters` argument.
`number_MCMC_chains`	An integer specifying the number of MCMC chains to compute. The default is `4`.
`MCMC_sample`	An integer specifying the number of MCMC samples to draw. The default is `2000`.
`burn_in`	An integer specifying the number of MCMC samples to discard for the burn-in period. The default is `1000`.
`display_progress`	A logical value specifying whether messages should be displayed during model compilation. The default is `TRUE`.

Value

COMBO_MCMC_2stage returns a list of the posterior samples and posterior means for both the binary outcome misclassification model and a naive logistic regression of the observed outcome, Y*, predicted by the matrix x. The list contains the following components:

`posterior_sample_df`	A data frame containing three columns. The first column indicates the chain from which a sample is taken, from 1 to `number_MCMC_chains`. The second column specifies the parameter associated with a given row. `\beta` terms have dimensions `dim_x` X `n_cat`. The `\gamma^{(1)}` terms have dimensions `n_cat` X `n_cat` X `dim_z1`, where the first index specifies the first-stage observed outcome category and the second index specifies the true outcome category. The `\gamma^{(2)}` terms have dimensions `n_cat` X `n_cat` X `n_cat` X `dim_z2`, where the first index specifies the second-stage observed outcome category, the second index specifies the first-stage observed outcome category, and the third index specifies the true outcome category. The final column provides the MCMC sample.
`posterior_means_df`	A data frame containing three columns. The first column specifies the parameter associated with a given row. Parameters are indexed as in the `posterior_sample_df`. The second column provides the posterior mean computed across all chains and all samples. The final column provides the posterior median computed across all chains and all samples.
`naive_posterior_sample_df`	A data frame containing three columns. The first column indicates the chain from which a sample is taken, from 1 to `number_MCMC_chains`. The second column specifies the parameter associated with a given row. Naive `\beta` terms have dimensions `dim_x` X `n_cat`. The final column provides the MCMC sample.
`naive_posterior_means_df`	A data frame containing three columns. The first column specifies the naive parameter associated with a given row. Parameters are indexed as in the `naive_posterior_sample_df`. The second column provides the posterior mean computed across all chains and all samples. The final column provides the posterior median computed across all chains and all samples.

Examples



# Helper functions
sum_every_n <- function(x, n){
vector_groups = split(x,
                      ceiling(seq_along(x) / n))
sum_x = Reduce(`+`, vector_groups)

return(sum_x)
}

sum_every_n1 <- function(x, n){
vector_groups = split(x,
                      ceiling(seq_along(x) / n))
sum_x = Reduce(`+`, vector_groups) + 1

return(sum_x)
}

# Example

set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z1_shape <- 1
z2_shape <- 1

true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2))

x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1)
X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE)
z1_matrix = matrix(rgamma(n, z1_shape), ncol = 1)
Z1 = matrix(c(rep(1, n), z1_matrix[,1]), ncol = 2, byrow = FALSE)
z2_matrix = matrix(rgamma(n, z2_shape), ncol = 1)
Z2 = matrix(c(rep(1, n), z2_matrix[,1]), ncol = 2, byrow = FALSE)

exp_xb = exp(X %*% true_beta)
pi_result = exp_xb[,1] / (exp_xb[,1] + 1)
pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE)

true_Y <- rep(NA, n)
for(i in 1:n){
    true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1)
}

exp_z1g1 = exp(Z1 %*% true_gamma1)
pistar1_denominator = matrix(c(1 + exp_z1g1[,1], 1 + exp_z1g1[,2]),
                             ncol = 2, byrow = FALSE)
pistar1_result = exp_z1g1 / pistar1_denominator

pistar1_matrix = matrix(c(pistar1_result[,1], 1 - pistar1_result[,1],
                          pistar1_result[,2], 1 - pistar1_result[,2]),
                        ncol = 2, byrow = FALSE)


obs_Y1 <- rep(NA, n)
for(i in 1:n){
    true_j = true_Y[i]
    obs_Y1[i] = which(rmultinom(1, 1,
                      pistar1_matrix[c(i, n + i),
                                     true_j]) == 1)
 }

Ystar1 <- obs_Y1

exp_z2g2_1 = exp(Z2 %*% true_gamma2[,,1])
exp_z2g2_2 = exp(Z2 %*% true_gamma2[,,2])

pi_denominator1 = apply(exp_z2g2_1, FUN = sum_every_n1, n, MARGIN = 2)
pi_result1 = exp_z2g2_1 / rbind(pi_denominator1)

pi_denominator2 = apply(exp_z2g2_2, FUN = sum_every_n1, n, MARGIN = 2)
pi_result2 = exp_z2g2_2 / rbind(pi_denominator2)

pistar2_matrix1 = rbind(pi_result1,
                        1 - apply(pi_result1,
                                  FUN = sum_every_n, n = n,
                                  MARGIN = 2))

pistar2_matrix2 = rbind(pi_result2,
                        1 - apply(pi_result2,
                                  FUN = sum_every_n, n = n,
                                  MARGIN = 2))

pistar2_array = array(c(pistar2_matrix1, pistar2_matrix2),
                   dim = c(dim(pistar2_matrix1), 2))

obs_Y2 <- rep(NA, n)
for(i in 1:n){
    true_j = true_Y[i]
    obs_k = Ystar1[i]
    obs_Y2[i] = which(rmultinom(1, 1,
                                  pistar2_array[c(i,n+ i),
                                                obs_k, true_j]) == 1)
}

Ystar2 <- obs_Y2

unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE)
unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE)

unif_lower_gamma1 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA),
                          dim = c(2,2,2))
unif_upper_gamma1 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA),
                          dim = c(2,2,2))

unif_upper_gamma2 <- array(rep(c(5, NA), 8), dim = c(2,2,2,2))
unif_lower_gamma2 <- array(rep(c(-5, NA), 8), dim = c(2,2,2,2))

unif_lower_naive_gamma2 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA),
                                dim = c(2,2,2))
unif_upper_naive_gamma2 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA),
                                dim = c(2,2,2))

beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta)
gamma1_prior_parameters <- list(lower = unif_lower_gamma1, upper = unif_upper_gamma1)
gamma2_prior_parameters <- list(lower = unif_lower_gamma2, upper = unif_upper_gamma2)
naive_gamma2_prior_parameters <- list(lower = unif_lower_naive_gamma2,
                                      upper = unif_upper_naive_gamma2)

MCMC_results <- COMBO_MCMC_2stage(Ystar1, Ystar2,
                                  x_matrix = x_matrix, z1_matrix = z1_matrix,
                                  z2_matrix = z2_matrix,
                                  prior = "uniform",
                                  beta_prior_parameters = beta_prior_parameters,
                                  gamma1_prior_parameters = gamma1_prior_parameters,
                                  gamma2_prior_parameters = gamma2_prior_parameters,
                                  naive_gamma2_prior_parameters = naive_gamma2_prior_parameters,
                                  number_MCMC_chains = 2,
                                  MCMC_sample = 200, burn_in = 100)
MCMC_results$posterior_means_df

[Package COMBO version 1.1.0 Index]