R: Estimate maximum likelihood accuracy statistics by...

estimate_ML {emery}

R Documentation

Estimate maximum likelihood accuracy statistics by expectation maximization

Description

estimate_ML() is a general function for estimating maximum likelihood accuracy statistics for a set of methods when no known reference value (i.e., a "truth" or "gold standard") is available.

Usage

estimate_ML(
  type = c("binary", "ordinal", "continuous"),
  data,
  init = list(NULL),
  max_iter = 1000,
  tol = 1e-07,
  save_progress = TRUE,
  ...
)

estimate_ML_binary(
  data,
  init = list(prev_1 = NULL, se_1 = NULL, sp_1 = NULL),
  max_iter = 100,
  tol = 1e-07,
  save_progress = TRUE
)

estimate_ML_continuous(
  data,
  init = list(prev_1 = NULL, mu_i1_1 = NULL, sigma_i1_1 = NULL, mu_i0_1 = NULL,
    sigma_i0_1 = NULL),
  max_iter = 100,
  tol = 1e-07,
  save_progress = TRUE
)

estimate_ML_ordinal(
  data,
  init = list(pi_1_1 = NULL, phi_1ij_1 = NULL, phi_0ij_1 = NULL, n_level = NULL),
  level_names = NULL,
  max_iter = 1000,
  tol = 1e-07,
  save_progress = TRUE
)

Arguments

`type`	A string specifying the data type of the methods under evaluation: one of `"binary"`, `"ordinal"`, or `"continuous"`.
`data`	An `n_obs` by `n_method` `matrix` containing the observed values for each method. If the dimensions are named, row names will be used to name each observation (`obs_names`) and column names will be used to name each measurement method (`method_names`).
`init`	An optional list of initial values used to seed the EM algorithm. If initial values are not provided, the `pollinate_ML()` function is called on the data to estimate starting values. Trying several sets of starting parameters is recommended, to verify that the algorithm converges to the same result and that the result is not merely a local extremum.
`max_iter`	The maximum number of EM algorithm iterations to compute before reporting a result.
`tol`	The convergence tolerance: the EM algorithm keeps iterating only while the change in the statistic estimates is at least this value.
`save_progress`	A logical indicating whether to save the interim calculations used in the EM algorithm.
`...`	Additional arguments
`level_names`	An optional, ordered, character vector of unique names corresponding to the levels of the methods.
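
The advice under `init` about trying several starting values can be sketched roughly as follows. This is an illustration, not a documented workflow. It assumes the emery package is installed and that `se_1` and `sp_1` each take one value per method; that shape is inferred from the Usage section and is not confirmed here. The specific starting values are arbitrary.

```r
# Three arbitrary starting points for a 4-method binary problem.
# Assumption: se_1 and sp_1 hold one value per method.
starts <- list(
  list(prev_1 = 0.3, se_1 = rep(0.7, 4), sp_1 = rep(0.7, 4)),
  list(prev_1 = 0.5, se_1 = rep(0.8, 4), sp_1 = rep(0.8, 4)),
  list(prev_1 = 0.7, se_1 = rep(0.9, 4), sp_1 = rep(0.9, 4))
)

# Only run the fits when emery is available
if (requireNamespace("emery", quietly = TRUE)) {
  set.seed(11001101)
  sim <- emery::generate_multimethod_data(
    "binary",
    n_obs = 75,
    n_method = 4,
    se = c(0.87, 0.92, 0.79, 0.95),
    sp = c(0.85, 0.93, 0.94, 0.80)
  )

  # Fit once per starting point
  fits <- lapply(starts, function(s) {
    emery::estimate_ML(
      "binary",
      data = sim$generated_data,
      init = s,
      save_progress = FALSE
    )
  })

  # Inspect each result; the estimates should agree closely across starts
  for (f in fits) print(f@results)
}
```

If the fits disagree noticeably, at least one run has probably stopped at a local extremum, and further starting points are worth trying.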

Details

The lack of an infallible reference method is referred to as an imperfect gold standard (GS). Accuracy statistics that rely on a GS method, such as sensitivity, specificity, and AUC, can still be estimated with an imperfect gold standard by iteratively estimating their maximum likelihood values, provided the conditional independence assumption holds (that is, the methods' errors are independent of one another given the true state). estimate_ML() relies on a collection of expectation maximization (EM) algorithms to achieve this. These algorithms are based on those presented in Statistical Methods in Diagnostic Medicine, Second Edition (Zhou et al. 2011) and have been validated on several examples therein. Additional details can be found for binary (Walter and Irwig 1988), ordinal (Zhou et al. 2005), and continuous (Hsieh et al. 2009) methods. Minor changes to the literal calculations have been made for efficiency and code readability, but the underlying steps remain functionally unchanged.
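
To make the idea concrete, here is a minimal base R sketch of an EM iteration for binary methods under conditional independence. It is not emery's implementation: `em_binary_sketch()` and all variable names are hypothetical, the update formulas are the standard two-class latent class ones, and the stopping rule is just one plausible reading of `tol`.

```r
# Hypothetical helper, for illustration only (not part of emery).
# y: n_obs x n_method matrix of 0/1 results; prev, se, sp: starting values.
em_binary_sketch <- function(y, prev, se, sp, max_iter = 1000, tol = 1e-7) {
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each observation is truly positive,
    # multiplying per-method probabilities (conditional independence)
    lik_pos <- prev * apply(y, 1, function(r) prod(se^r * (1 - se)^(1 - r)))
    lik_neg <- (1 - prev) * apply(y, 1, function(r) prod((1 - sp)^r * sp^(1 - r)))
    q <- lik_pos / (lik_pos + lik_neg)

    # M-step: weighted proportions using the posterior weights
    new_prev <- mean(q)
    new_se <- colSums(q * y) / sum(q)
    new_sp <- colSums((1 - q) * (1 - y)) / sum(1 - q)

    # Largest change in any estimate during this iteration
    delta <- max(abs(c(new_prev - prev, new_se - se, new_sp - sp)))
    prev <- new_prev
    se <- new_se
    sp <- new_sp
    if (delta < tol) break
  }
  list(prev = prev, se = se, sp = sp, iter = iter, q = q)
}

# Simulate 4 conditionally independent binary methods (arbitrary parameters)
set.seed(1)
n_obs <- 500
true_se <- c(0.90, 0.85, 0.80, 0.95)
true_sp <- c(0.90, 0.95, 0.85, 0.80)
d <- rbinom(n_obs, 1, 0.4)  # latent true state, never shown to the EM
p_pos <- outer(d, true_se) + outer(1 - d, 1 - true_sp)
y <- matrix(rbinom(length(p_pos), 1, p_pos), nrow = n_obs)

fit <- em_binary_sketch(y, prev = 0.5, se = rep(0.8, 4), sp = rep(0.8, 4))
fit[c("prev", "se", "sp", "iter")]
```

The sketch starts with sensitivities and specificities above 0.5 so that the "positive" class keeps its intended meaning. Starting values below 0.5 can lead the algorithm to the mirror-image solution, with the two classes swapped.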

Value

estimate_ML() returns an S4 object of class "MultiMethodMLEstimate" containing the maximum likelihood accuracy statistics calculated by EM.
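
Because the return value is an S4 object, its contents are read with `@` (or `slot()`), not `$`. The sketch below shows this with a hypothetical stand-in class, `ToyEstimate`. Its slots are invented for illustration and do not reproduce the actual definition of "MultiMethodMLEstimate". Calling `slotNames()` on a real result shows the slots that are actually available.

```r
library(methods)

# Hypothetical stand-in class; emery's real class has its own slot definitions
setClass("ToyEstimate", representation(results = "list", iter = "numeric"))
toy <- new("ToyEstimate", results = list(prev = 0.4), iter = 12)

slotNames(toy)      # names of the available slots
toy@results         # slot access with @
slot(toy, "iter")   # equivalent programmatic access
is(toy, "ToyEstimate")
```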

References

Zhou X, Obuchowski NA, McClish DK (2011). Statistical Methods in Diagnostic Medicine. Wiley. doi:10.1002/9780470906514.

Walter SD, Irwig LM (1988). “Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review.” J. Clin. Epidemiol., 41(9), 923–937. doi:10.1016/0895-4356(88)90110-2.

Zhou X, Castelluccio P, Zhou C (2005). “Nonparametric estimation of ROC curves in the absence of a gold standard.” Biometrics, 61(2), 600–609. doi:10.1111/j.1541-0420.2005.00324.x.

Hsieh H, Su H, Zhou X (2009). “Interval estimation for the difference in paired areas under the ROC curves in the absence of a gold standard test.” Statistics in Medicine. doi:10.1002/sim.3661.

Examples

# Set seed for this example
set.seed(11001101)

# Generate data for 4 binary methods
my_sim <- generate_multimethod_data(
  "binary",
  n_obs = 75,
  n_method = 4,
  se = c(0.87, 0.92, 0.79, 0.95),
  sp = c(0.85, 0.93, 0.94, 0.80),
  method_names = c("alpha", "beta", "gamma", "delta"))

# View the data
my_sim$generated_data

# View the parameters used to generate the data
my_sim$params

# Estimate ML accuracy values by EM algorithm
my_result <- estimate_ML(
  "binary",
  data = my_sim$generated_data,
  save_progress = FALSE # this reduces the data stored in the resulting object
)

# View results of ML estimate
my_result@results

[Package emery version 0.5.1 Index]