binGroup2 {binGroup2} | R Documentation
binGroup2: Identification and Estimation using Group Testing
Description
Methods for the group testing identification and estimation problems.
Details
Methods for identification of positive items in group testing designs: Operating characteristics (e.g., expected number of tests) are calculated for commonly used hierarchical and array-based algorithms. Optimal testing configurations for an algorithm can be found as well. Please see Hitt et al. (2019) for specific details.
Methods for estimation and inference for proportions in group testing designs: For estimating one proportion or the difference of proportions, confidence interval methods are included that account for different pool sizes. Functions for hypothesis testing of proportions, calculation of power, and calculation of the expected width of confidence intervals are also included. Furthermore, regression methods and simulation of group testing data are implemented for simple pooling (Dorfman testing with or without retests), halving, and array testing designs.
The binGroup2 package is based upon the binGroup package that was originally designed for the group testing estimation problem. Over time, additional functions for estimation and for the group testing identification problem were included. Due to the diverse styles resulting from these additions, we have created binGroup2 as a way to unify functions in a coherent structure and incorporate additional functions for identification. The binGroup2 package provides all the main functionality from the binGroup package, and can be used in place of the binGroup package. The name “binGroup” originates from the assumption in basic estimation for group testing that the number of positive groups has a binomial distribution. While more advanced estimation methods no longer make this assumption, we continue with the binGroup name for consistency.
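The binomial assumption mentioned above can be illustrated with a small base R sketch. It assumes a perfect assay and equally sized groups, and the helper names (mle_prevalence, group_positive_prob) are hypothetical illustrations rather than binGroup2 functions. If each individual is positive with probability p, a group of size m is positive with probability 1 - (1 - p)^m, and inverting that relationship gives the maximum likelihood estimate of p.

```r
# Hypothetical helpers (not part of binGroup2), assuming a perfect assay,
# n groups of equal size m, and a binomial number of positive groups x.

# Probability that a group of size m is positive when prevalence is p
group_positive_prob <- function(p, m) 1 - (1 - p)^m

# MLE of the individual-level prevalence: invert theta = 1 - (1 - p)^m
mle_prevalence <- function(x, n, m) {
  stopifnot(x >= 0, x <= n, m >= 1)
  theta_hat <- x / n            # estimated probability a group is positive
  1 - (1 - theta_hat)^(1 / m)
}

# 3 positive groups out of 24, each of size 7
p_hat <- mle_prevalence(x = 3, n = 24, m = 7)
p_hat
```

Plugging p_hat back into group_positive_prob() recovers the observed proportion of positive groups, 3/24, which is a quick check of the inversion.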
Bilder (2019a,b) provide introductions to group testing. These papers and additional details about group testing are available at http://chrisbilder.com/grouptesting/.
This research was supported by the National Institutes of Health under grant R01 AI121351.
Identification
The binGroup2 package focuses on the group testing identification problem using hierarchical and array-based group testing algorithms.
The OTC1 function implements a number of group testing algorithms, described in Hitt et al. (2019), which calculate the operating characteristics and find the optimal testing configuration over a range of possible initial group sizes and/or testing configurations (sets of subsequent group sizes). The OTC2 function does the same with a multiplex assay that tests for two diseases.
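To make the idea of searching over initial group sizes concrete, the following base R sketch computes the expected number of tests per individual for non-informative two-stage (Dorfman) testing and picks the group size that minimizes it. This is only an illustration of the general approach, not the OTC1 implementation: the function name dorfman_et_per_individual is hypothetical, and the same sensitivity and specificity are assumed at both stages.

```r
# Hypothetical helper (not part of binGroup2).
# Expected number of tests per individual for two-stage (Dorfman) testing
# of a group of size I, with prevalence p, sensitivity Se, specificity Sp.
dorfman_et_per_individual <- function(I, p, Se, Sp) {
  # The group tests positive if it contains a positive and is detected,
  # or if it is truly negative and the test gives a false positive.
  prob_group_tests_positive <- Se * (1 - (1 - p)^I) + (1 - Sp) * (1 - p)^I
  # One group test, plus I individual retests when the group tests positive
  (1 + I * prob_group_tests_positive) / I
}

group_sizes <- 3:20
et <- dorfman_et_per_individual(group_sizes, p = 0.025, Se = 0.95, Sp = 0.95)

# Group size with the smallest expected number of tests per individual
best <- group_sizes[which.min(et)]
data.frame(group.sz = group_sizes, ET.per.individual = round(et, 4))
best
```

A value below 1 for the expected number of tests per individual means that group testing is expected to use fewer tests than individual testing under these assumptions.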
The operatingCharacteristics1 (opChar1) and operatingCharacteristics2 (opChar2) functions calculate operating characteristics for a specified testing configuration with assays that test for one and two diseases, respectively.
These functions allow the sensitivity and specificity to differ across stages of testing. This means that the accuracy of the diagnostic test can differ for stages in a hierarchical testing algorithm or between row/column testing and individual testing in an array testing algorithm.
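As a rough illustration of how stage-specific accuracy feeds into overall accuracy, the base R sketch below computes the pooling sensitivity and pooling specificity for non-informative two-stage (Dorfman) testing. It assumes that test outcomes are conditionally independent given true disease status; the function dorfman_accuracy is a hypothetical helper, not a binGroup2 function, and it is not meant to reproduce opChar1 output.

```r
# Hypothetical helper (not part of binGroup2).
# Se and Sp are length-2 vectors: element 1 applies to the group test,
# element 2 applies to the individual retests.
dorfman_accuracy <- function(I, p, Se, Sp) {
  stopifnot(length(Se) == 2, length(Sp) == 2)
  # A truly positive individual is classified positive only if both the
  # group test and the individual retest are positive.
  pooling_se <- Se[1] * Se[2]
  # For a truly negative individual, the group tests positive either
  # because another member is positive and detected, or because the
  # group test gives a false positive.
  others_all_negative <- (1 - p)^(I - 1)
  prob_group_pos <- Se[1] * (1 - others_all_negative) +
    (1 - Sp[1]) * others_all_negative
  # The individual is misclassified only if the retest is also a
  # false positive.
  pooling_sp <- 1 - prob_group_pos * (1 - Sp[2])
  c(PSe = pooling_se, PSp = pooling_sp)
}

acc <- dorfman_accuracy(I = 10, p = 0.02,
                        Se = c(0.95, 0.99), Sp = c(0.95, 0.98))
acc
```

Because a negative individual must yield a false positive at the individual stage to be misclassified, the pooling specificity in this sketch is never lower than the individual-stage specificity.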
Estimation
The binGroup2 package also provides functions for estimation and inference for proportions in group testing designs.
The propCI function calculates the point estimate and confidence intervals for a single proportion from group testing data. The propDiffCI function does the same for the difference of proportions. A number of confidence interval methods are available for groups of equal or different sizes.
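For groups of equal size and a perfect assay, one simple way to obtain an interval is to compute an exact (Clopper-Pearson) interval for the probability that a group is positive and then transform its endpoints to the individual scale. The base R sketch below follows that idea. The function cp_interval_equal_groups is a hypothetical helper, and its results are not guaranteed to match propCI output.

```r
# Hypothetical helper (not part of binGroup2), assuming a perfect assay
# and n groups of equal size m, of which x test positive.
cp_interval_equal_groups <- function(x, n, m, conf.level = 0.95) {
  alpha <- 1 - conf.level
  # Clopper-Pearson limits for the group-level probability theta
  lower_theta <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper_theta <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  # Map from the group scale to the individual scale:
  # theta = 1 - (1 - p)^m  implies  p = 1 - (1 - theta)^(1 / m)
  to_p <- function(theta) 1 - (1 - theta)^(1 / m)
  c(estimate = to_p(x / n),
    lower = to_p(lower_theta),
    upper = to_p(upper_theta))
}

ci <- cp_interval_equal_groups(x = 3, n = 24, m = 7)
ci
```

The transformation is monotone increasing in theta, so the coverage of the group-level interval carries over to the individual-level interval.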
The gtWidth function calculates the expected width of confidence intervals in group testing. The gtTest function calculates p-values for hypothesis tests of single proportions. The gtPower function calculates power to reject a hypothesis.
The designPower function iterates either the number of groups or group size in a one-parameter group testing design until a pre-specified power level is achieved. The designEst function finds the optimal group size corresponding to the minimal mean-squared error of the point estimator.
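The idea of choosing a group size by minimizing the mean-squared error can be sketched in base R by summing over the binomial distribution of the number of positive groups. The sketch assumes a perfect assay, a fixed number of groups, and a guessed true prevalence. The function mse_of_mle is a hypothetical helper, not the designEst implementation.

```r
# Hypothetical helper (not part of binGroup2).
# Exact MSE of the MLE of p for n groups of size s and true prevalence p,
# assuming a perfect assay.
mse_of_mle <- function(s, n, p) {
  theta <- 1 - (1 - p)^s             # probability a group is positive
  x <- 0:n                           # possible numbers of positive groups
  p_hat <- 1 - (1 - x / n)^(1 / s)   # MLE for each possible outcome
  sum(dbinom(x, size = n, prob = theta) * (p_hat - p)^2)
}

sizes <- 1:100
mse <- vapply(sizes, mse_of_mle, numeric(1), n = 20, p = 0.01)

# Group size with the smallest MSE among those considered
best_size <- sizes[which.min(mse)]
best_size
```

Very large groups are penalized here because all groups may test positive, in which case the estimate equals 1 and the squared error is large.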
The gtReg function implements regression methods and the gtSim function simulates group testing data for simple pooling, halving, and array testing designs.
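A minimal base R sketch of what simulating simple pooling data could look like is given below. It is not the gtSim implementation: the logistic model coefficients, the sample sizes, and the data frame layout (columns x, groupn, and gres) are assumptions chosen for illustration only.

```r
# Illustrative simulation of simple pooling data (not gtSim itself).
set.seed(1)
n_ind <- 200        # number of individuals (assumed)
group_size <- 5     # individuals per group (assumed)
sens <- 0.95        # sensitivity of the group test (assumed)
spec <- 0.95        # specificity of the group test (assumed)

# Individual covariate and true statuses from a logistic model
x <- rnorm(n_ind)
p_ind <- plogis(-3 + 0.5 * x)
y_true <- rbinom(n_ind, size = 1, prob = p_ind)

# Assign consecutive individuals to groups
groupn <- rep(seq_len(n_ind / group_size), each = group_size)

# A group is truly positive if any of its members is positive
true_by_group <- tapply(y_true, groupn, max)

# One imperfect test per group: positive with probability sens if the
# group is truly positive, and with probability 1 - spec otherwise
test_by_group <- rbinom(length(true_by_group), size = 1,
                        prob = ifelse(true_by_group == 1, sens, 1 - spec))

# One row per individual, with the group result repeated within a group
sim <- data.frame(x = x, groupn = groupn, gres = test_by_group[groupn])
head(sim)
```

Only the group-level response is kept in the final data frame, since in simple pooling without retests the individual statuses are not observed.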
Author(s)
Maintainer: Brianna Hitt <brianna.hitt@afacademy.af.edu>
Authors:
Christopher Bilder
Frank Schaarschmidt
Brad Biggerstaff
Christopher McMahan
Joshua Tebbs
Other contributors:
Boan Zhang [contributor]
Michael Black [contributor]
Peijie Hou [contributor]
Peng Chen [contributor]
Minh Nguyen [contributor]
References
Altman, D., Bland, J. (1994). “Diagnostic tests 1: Sensitivity and specificity.” BMJ, 308, 1552.
Altman, D., Bland, J. (1994). “Diagnostic tests 2: Predictive values.” BMJ, 309, 102.
Biggerstaff, B. (2008). “Confidence intervals for the difference of proportions estimated from pooled samples.” Journal of Agricultural, Biological, and Environmental Statistics, 13, 478–496.
Bilder, C., Tebbs, J., Chen, P. (2010). “Informative retesting.” Journal of the American Statistical Association, 105, 942–955.
Bilder, C., Tebbs, J., McMahan, C. (2019). “Informative group testing for multiplex assays.” Biometrics, 75, 278–288.
Bilder, C. (2019a). “Group Testing for Estimation.” Wiley StatsRef: Statistics Reference Online.
Bilder, C. (2019b). “Group Testing for Identification.” Wiley StatsRef: Statistics Reference Online.
Bilder, C., Iwen, P., Abdalhamid, B., Tebbs, J., McMahan, C. (2020). “Tests in short supply? Try group testing.” Significance, 17, 15.
Black, M., Bilder, C., Tebbs, J. (2012). “Group testing in heterogeneous populations by using halving algorithms.” Journal of the Royal Statistical Society. Series C: Applied Statistics, 61, 277–290.
Black, M., Bilder, C., Tebbs, J. (2015). “Optimal retesting configurations for hierarchical group testing.” Journal of the Royal Statistical Society. Series C: Applied Statistics, 64, 693–710.
Graff, L., Roeloffs, R. (1972). “Group testing in the presence of test error; an extension of the Dorfman procedure.” Technometrics, 14, 113–122.
Hepworth, G. (1996). “Exact confidence intervals for proportions estimated by group testing.” Biometrics, 52, 1134–1146.
Hepworth, G., Biggerstaff, B. (2017). “Bias correction in estimating proportions by pooled testing.” Journal of Agricultural, Biological, and Environmental Statistics, 22, 602–614.
Hitt, B., Bilder, C., Tebbs, J., McMahan, C. (2019). “The objective function controversy for group testing: Much ado about nothing?” Statistics in Medicine, 38, 4912–4923.
Hou, P., Tebbs, J., Wang, D., McMahan, C., Bilder, C. (2021). “Array testing with multiplex assays.” Biostatistics, 21, 417–431.
Malinovsky, Y., Albert, P., Roy, A. (2016). “Reader reaction: A note on the evaluation of group testing algorithms in the presence of misclassification.” Biometrics, 72, 299–302.
McMahan, C., Tebbs, J., Bilder, C. (2012a). “Informative Dorfman Screening.” Biometrics, 68, 287–296.
McMahan, C., Tebbs, J., Bilder, C. (2012b). “Two-Dimensional Informative Array Testing.” Biometrics, 68, 793–804.
Schaarschmidt, F. (2007). “Experimental design for one-sided confidence intervals or hypothesis tests in binomial group testing.” Communications in Biometry and Crop Science, 2, 32–40. ISSN 1896-0782.
Swallow, W. (1985). “Group testing for estimating infection rates and probabilities of disease transmission.” Phytopathology, 75, 882–889.
Tebbs, J., Bilder, C. (2004). “Confidence interval procedures for the probability of disease transmission in multiple-vector-transfer designs.” Journal of Agricultural, Biological, and Environmental Statistics, 9, 75–90.
Vansteelandt, S., Goetghebeur, E., Verstraeten, T. (2000). “Regression models for disease prevalence with diagnostic tests on pools of serum samples.” Biometrics, 56, 1126–1133.
Verstraeten, T., Farah, B., Duchateau, L., Matu, R. (1998). “Pooling sera to reduce the cost of HIV surveillance: a feasibility study in a rural Kenyan district.” Tropical Medicine & International Health, 3, 747–750.
Xie, M. (2001). “Regression analysis of group testing samples.” Statistics in Medicine, 20, 1957–1969.
Examples
# 1) Identification using hierarchical and array-based
# group testing algorithms with an assay that tests
# for one disease.
# 1.1) Find the optimal testing configuration over a
# range of initial group sizes, using informative
# three-stage hierarchical testing, where
# p denotes the overall prevalence of disease (mean
# parameter of a beta distribution);
# Se denotes the sensitivity of the diagnostic test;
# Sp denotes the specificity of the diagnostic test;
# group.sz denotes the range of initial pool sizes
# for consideration;
# obj.fn specifies the objective functions for which
# to find results; and
# alpha is the heterogeneity level.
set.seed(1002)
results1 <- OTC1(algorithm = "ID3", p = 0.025, Se = 0.95,
                 Sp = 0.95, group.sz = 3:20,
                 obj.fn = "ET", alpha = 2)
summary(results1)
# 1.2) Find the optimal testing configuration using
# non-informative array testing without master pooling.
# The sensitivity and specificity differ for row/column
# testing and individual testing.
results2 <- OTC1(algorithm = "A2", p = 0.05,
                 Se = c(0.95, 0.99), Sp = c(0.95, 0.98),
                 group.sz = 3:15, obj.fn = "ET")
summary(results2)
# 1.3) Calculate the operating characteristics using
# informative two-stage hierarchical (Dorfman) testing,
# implemented via the pool-specific optimal Dorfman
# (PSOD) method described in McMahan et al. (2012a).
# Hierarchical testing configurations are specified by
# a matrix in the hier.config argument. The rows of
# the matrix correspond to the stages of the
# hierarchical testing algorithm, the columns
# correspond to the individuals to be tested, and the
# cell values correspond to the group number of each
# individual at each stage.
config.mat <- matrix(data = c(rep(1, 5), rep(2, 4), 3, 1:10),
                     nrow = 2, ncol = 10, byrow = TRUE)
set.seed(8791)
results3 <- opChar1(algorithm = "ID2", p = 0.02,
                    Se = 0.95, Sp = 0.99,
                    hier.config = config.mat, alpha = 0.5)
summary(results3)
# 1.4) Calculate the operating characteristics using
# non-informative four-stage hierarchical testing.
config.mat <- matrix(data = c(rep(1, 15), rep(c(1, 2, 3), each = 5),
                              rep(1, 3), rep(2, 2),
                              rep(3, 3), rep(4, 2),
                              rep(5, 4), 6, 1:15),
                     nrow = 4, ncol = 15, byrow = TRUE)
results4 <- opChar1(algorithm = "D4", p = 0.008,
                    Se = 0.96, Sp = 0.98,
                    hier.config = config.mat,
                    a = c(1, 4, 6, 9, 11, 15))
summary(results4)
# 2) Identification using hierarchical and array-based
# group testing algorithms with a multiplex assay that
# tests for two diseases.
# 2.1) Find the optimal testing configuration using
# non-informative two-stage hierarchical testing, given
# p.vec, a vector of overall joint probabilities of disease;
# Se, a vector of sensitivity values for each disease; and
# Sp, a vector of specificity values for each disease.
# Se and Sp can also be specified as a matrix, where one
# value is specified for each disease at each stage of
# testing.
results5 <- OTC2(algorithm = "D2",
                 p.vec = c(0.90, 0.04, 0.04, 0.02),
                 Se = c(0.99, 0.99), Sp = c(0.99, 0.99),
                 group.sz = 3:20)
summary(results5)
# 2.2) Calculate the operating characteristics for
# informative five-stage hierarchical testing, given
# alpha.vec, a vector of shape parameters for the
# Dirichlet distribution;
# Se, a matrix of sensitivity values; and
# Sp, a matrix of specificity values.
Se <- matrix(data = rep(0.95, 10), nrow = 2, ncol = 5, byrow = TRUE)
Sp <- matrix(data = rep(0.99, 10), nrow = 2, ncol = 5, byrow = TRUE)
config.mat <- matrix(data = c(rep(1, 24), rep(1, 18),
                              rep(2, 6), rep(1, 9),
                              rep(2, 9), rep(3, 4), 4, 5,
                              rep(1, 6), rep(2, 3),
                              rep(3, 5), rep(4, 4),
                              rep(5, 3), 6, rep(NA, 2),
                              1:21, rep(NA, 3)),
                     nrow = 5, ncol = 24, byrow = TRUE)
results6 <- opChar2(algorithm = "ID5",
                    alpha = c(18.25, 0.75, 0.75, 0.25),
                    Se = Se, Sp = Sp,
                    hier.config = config.mat)
summary(results6)
# 3) Estimation of the overall disease prevalence and
# calculation of confidence intervals.
# 3.1) Suppose 3 out of 24 groups test positive.
# Each group has a size of 7.
propCI(x = 3, m = 7, n = 24, ci.method = "CP")
propCI(x = 3, m = 7, n = 24, ci.method = "Blaker")
propCI(x = 3, m = 7, n = 24, ci.method = "score")
propCI(x = 3, m = 7, n = 24, ci.method = "soc")
# 3.2) Consider the following situation:
# 0 out of 5 groups test positive with groups
# of size 1 (individual testing),
# 0 out of 5 groups test positive with groups of size 5,
# 1 out of 5 groups tests positive with groups of size 10, and
# 2 out of 5 groups test positive with groups of size 50.
propCI(x = c(0, 0, 1, 2), m = c(1, 5, 10, 50),
       n = c(5, 5, 5, 5), pt.method = "Gart",
       ci.method = "skew-score")
# 4) Estimate a group testing regression model.
# 4.1) Fit a group testing regression model with
# simple pooling using the "hivsurv" dataset.
data(hivsurv)
fit1 <- gtReg(type = "sp",
              formula = groupres ~ AGE + EDUC.,
              data = hivsurv, groupn = gnum,
              sens = 0.9, spec = 0.9, method = "Xie")
summary(fit1)
# 4.2) Simulate data for the halving protocol, and
# fit a group testing regression model.
set.seed(46)
gt.data <- gtSim(type = "halving", par = c(-6, 0.1),
                 gshape = 17, gscale = 1.4,
                 size1 = 1000, size2 = 5,
                 sens = 0.95, spec = 0.95)
fit2 <- gtReg(type = "halving", formula = gres ~ x,
              data = gt.data, groupn = groupn,
              subg = subgroup, retest = retest,
              sens = 0.95, spec = 0.95,
              start = c(-6, 0.1), trace = TRUE)
summary(fit2)