estimate_ML {emery} | R Documentation |
Estimate maximum likelihood accuracy statistics by expectation maximization
Description
estimate_ML()
is a general function for estimating maximum likelihood accuracy
statistics for a set of methods when no reference value (i.e. no "truth" or
"gold standard") is known.
Usage
estimate_ML(
  type = c("binary", "ordinal", "continuous"),
  data,
  init = list(NULL),
  max_iter = 1000,
  tol = 1e-07,
  save_progress = TRUE,
  ...
)
estimate_ML_binary(
  data,
  init = list(prev_1 = NULL, se_1 = NULL, sp_1 = NULL),
  max_iter = 100,
  tol = 1e-07,
  save_progress = TRUE
)
estimate_ML_continuous(
  data,
  init = list(prev_1 = NULL, mu_i1_1 = NULL, sigma_i1_1 = NULL, mu_i0_1 = NULL,
    sigma_i0_1 = NULL),
  max_iter = 100,
  tol = 1e-07,
  save_progress = TRUE
)
estimate_ML_ordinal(
  data,
  init = list(pi_1_1 = NULL, phi_1ij_1 = NULL, phi_0ij_1 = NULL, n_level = NULL),
  level_names = NULL,
  max_iter = 1000,
  tol = 1e-07,
  save_progress = TRUE
)
Arguments
type |
A string specifying the data type of the methods under evaluation. |
data |
An |
init |
An optional list of initial values used to seed the EM algorithm.
If initial values are not provided, the |
max_iter |
The maximum number of EM algorithm iterations to compute before reporting a result. |
tol |
The minimum change in statistic estimates needed to continue iterating the EM algorithm. |
save_progress |
A logical indication of whether to save interim calculations used in the EM algorithm. |
... |
Additional arguments |
level_names |
An optional, ordered, character vector of unique names corresponding to the levels of the methods. |
Details
A setting that lacks an infallible reference method is said to have
an imperfect gold standard (GS). Accuracy statistics that normally rely on a GS
method, such as sensitivity, specificity, and AUC,
can still be estimated with an imperfect gold standard by iteratively estimating the
maximum likelihood values of these statistics, provided that the conditional independence
assumption holds (the methods' errors are independent given the true state). estimate_ML()
relies on a collection of expectation maximization (EM) algorithms
to achieve this. The EM algorithms used in this function are based on those presented in
Statistical Methods in Diagnostic Medicine, Second Edition
(Zhou et al. 2011) and have been validated on
several examples therein. Additional details about these algorithms can be found
for binary (Walter and Irwig 1988), ordinal (Zhou et al. 2005),
and continuous (Hsieh et al. 2009) methods.
Minor changes to the literal calculations have been
made for efficiency, code readability, and the like, but the underlying steps
remain functionally unchanged.
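To make the idea concrete, the following is a minimal base R sketch of an EM iteration for binary methods under conditional independence. It is an illustration only and is not the code used by estimate_ML(): the function name em_binary_sketch(), its arguments, the starting values of 0.8, the simulated prevalence of 0.4, and the stopping rule (largest absolute change in any estimate falling below tol) are all assumptions made for this example.

```r
# Illustrative EM for binary methods under conditional independence.
# NOT the emery implementation; every name here is hypothetical.
em_binary_sketch <- function(data, prev = 0.5, se = NULL, sp = NULL,
                             max_iter = 1000, tol = 1e-7) {
  x <- as.matrix(data)                # n_obs x n_method matrix of 0/1 results
  k <- ncol(x)
  if (is.null(se)) se <- rep(0.8, k)  # assumed starting sensitivities
  if (is.null(sp)) sp <- rep(0.8, k)  # assumed starting specificities

  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each observation is truly positive,
    # treating the methods as independent given the true state
    lik_pos <- prev * apply(x, 1, function(r) prod(se^r * (1 - se)^(1 - r)))
    lik_neg <- (1 - prev) * apply(x, 1, function(r) prod((1 - sp)^r * sp^(1 - r)))
    q <- lik_pos / (lik_pos + lik_neg)

    # M-step: update estimates as posterior-weighted proportions
    new_prev <- mean(q)
    new_se <- colSums(q * x) / sum(q)
    new_sp <- colSums((1 - q) * (1 - x)) / sum(1 - q)

    # Stop once no estimate moves by more than tol
    delta <- max(abs(c(new_prev - prev, new_se - se, new_sp - sp)))
    prev <- new_prev
    se <- new_se
    sp <- new_sp
    if (delta < tol) break
  }
  list(prev = prev, se = se, sp = sp, iter = iter)
}

# Simulate 4 conditionally independent binary methods (assumed prevalence 0.4)
set.seed(1)
n <- 2000
truth <- rbinom(n, 1, 0.4)
true_se <- c(0.87, 0.92, 0.79, 0.95)
true_sp <- c(0.85, 0.93, 0.94, 0.80)
sim <- sapply(seq_along(true_se), function(j) {
  rbinom(n, 1, ifelse(truth == 1, true_se[j], 1 - true_sp[j]))
})

fit <- em_binary_sketch(sim)
fit$prev  # estimated prevalence
fit$se    # estimated sensitivities, one per method
fit$sp    # estimated specificities, one per method
```

The true state never enters the fit; only the observed method results do. In this sketch, max_iter caps the number of iterations and tol sets how small the largest update must be before the loop stops, which is one plausible reading of the arguments described above.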
Value
estimate_ML()
returns an S4 object of class "MultiMethodMLEstimate"
containing the maximum likelihood accuracy statistics calculated by EM.
References
Zhou X, Obuchowski NA, McClish DK (2011). Statistical Methods in Diagnostic Medicine. Wiley. doi:10.1002/9780470906514.
Walter SD, Irwig LM (1988). “Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review.” J. Clin. Epidemiol., 41(9), 923–937. doi:10.1016/0895-4356(88)90110-2.
Zhou X, Castelluccio P, Zhou C (2005). “Nonparametric estimation of ROC curves in the absence of a gold standard.” Biometrics, 61(2), 600–609. doi:10.1111/j.1541-0420.2005.00324.x.
Hsieh H, Su H, Zhou X (2009). “Interval Estimation for the Difference in Paired Areas under the ROC Curves in the Absence of a Gold Standard Test.” Statistics in Medicine. doi:10.1002/sim.3661.
Examples
# Set seed for this example
set.seed(11001101)
# Generate data for 4 binary methods
my_sim <- generate_multimethod_data(
  "binary",
  n_obs = 75,
  n_method = 4,
  se = c(0.87, 0.92, 0.79, 0.95),
  sp = c(0.85, 0.93, 0.94, 0.80),
  method_names = c("alpha", "beta", "gamma", "delta")
)
# View the data
my_sim$generated_data
# View the parameters used to generate the data
my_sim$params
# Estimate ML accuracy values by EM algorithm
my_result <- estimate_ML(
  "binary",
  data = my_sim$generated_data,
  save_progress = FALSE # this reduces the data stored in the resulting object
)
# View results of ML estimate
my_result@results