R: Simulating Array-Based Group Testing Data

array.gt.simulation {groupTesting}

R Documentation

Simulating Array-Based Group Testing Data

Description

This function simulates two-dimensional array-based group testing data.

Usage

array.gt.simulation(
  N,
  p = 0.1,
  protocol = c("A2", "A2M"),
  n,
  Se,
  Sp,
  assayID,
  Yt = NULL
)

Arguments

`N`	The number of individuals to be tested.
`p`	A vector of length N consisting of individual disease probabilities, or a single value when all individuals share the same probability. Ignored when Yt is supplied.
`protocol`	Either "A2" or "A2M", where "A2" ("A2M") refers to the two-dimensional array without (with) testing the members of an array as a single pooled sample.
`n`	The row (or column) size of the arrays.
`Se`	A vector of assay sensitivities, one per stage: length 2 for "A2" and length 3 for "A2M".
`Sp`	A vector of assay specificities, specified in the same manner as Se.
`assayID`	A vector of assay identification numbers.
`Yt`	An optional vector of individual true disease statuses (1 for positive and 0 for negative). Defaults to NULL.

Details

We consider the array testing protocol outlined in Kim et al. (2007). Under this protocol, N individuals are assigned to m non-overlapping n-by-n matrices such that N = m*n^2. From each matrix, n pools are formed from the row specimens and another n pools are formed from the column specimens. In stage 1, the 2n pools are tested. In stage 2, individual testing is used for case identification according to the strategy described in Kim et al. (2007). This 2-stage protocol is called Square Array without Master Pool Testing and is denoted by A2(n:1) in Kim et al. (2007).

Kim et al. (2007) also present a 3-stage variant that first tests the n^2 array members together as a single pooled unit before implementing the 2-stage array. If this initial pooled test is negative, the procedure stops (i.e., the 2-stage array is not needed). If the pooled test is positive, the 2-stage protocol is applied as before. This 3-stage approach is called Square Array with Master Pool Testing and is denoted by A2(n^2:n:1). See Kim et al. (2007) for more details.
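The first stage of the array protocol can be illustrated with a minimal base R sketch for a single n-by-n array. This is not the package's internal code: the helper test_pool and the values of Se, Sp, and the prevalence are assumptions chosen for illustration, and the stage 2 decoding strategy of Kim et al. (2007) is deliberately left out.

```r
set.seed(1)
n  <- 4       # row/column size of the array
Se <- 0.95    # assumed sensitivity of the row/column pool assay
Sp <- 0.98    # assumed specificity of the row/column pool assay

# True statuses of the n^2 array members, arranged as an n-by-n matrix
Y <- matrix(rbinom(n^2, size = 1, prob = 0.10), nrow = n, ncol = n)

# A pool is truly positive when at least one of its members is positive
row_true <- as.integer(rowSums(Y) > 0)
col_true <- as.integer(colSums(Y) > 0)

# Hypothetical helper: observed outcome of an imperfect assay.
# A truly positive pool tests positive with probability Se;
# a truly negative pool tests positive with probability 1 - Sp.
test_pool <- function(truth, Se, Sp) {
  rbinom(length(truth), size = 1, prob = ifelse(truth == 1, Se, 1 - Sp))
}

row_obs <- test_pool(row_true, Se, Sp)
col_obs <- test_pool(col_true, Se, Sp)

# Stage 1 always spends one test per row pool and one per column pool
stage1_tests <- length(row_obs) + length(col_obs)
```

Whatever the true statuses turn out to be, stage 1 uses 2n pooled tests per array; only the number of stage 2 individual tests varies from one simulated array to the next.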

N should be divisible by the array size n^2. When it is not, the remaining individuals are tested one by one (i.e., individual testing).
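The split between full arrays and leftover individuals is plain integer arithmetic. The sketch below uses hypothetical values N = 50 and n = 4 and only illustrates the counting, not how the function assigns individuals to arrays.

```r
N <- 50               # hypothetical sample size, not a multiple of n^2
n <- 4                # row/column size
array_size <- n^2     # 16 individuals per array

m        <- N %/% array_size   # number of complete arrays: 3
leftover <- N %%  array_size   # individuals tested one by one: 2
```

With N = 48, as in the Examples section, the same arithmetic gives 3 complete arrays and no leftover individuals.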

p is a vector of individual disease probabilities. When all individuals have the same probability of disease, say, 0.10, p can be specified either as rep(0.10, N) or as the single value 0.10.

For "A2" and "A2M", the pool sizes used are c(n, 1) and c(n^2, n, 1), respectively.

For "A2", Se is c(Se1, Se2), where Se1 is the sensitivity of the assay used for both row and column pools, and Se2 is the sensitivity of the assay used for individual testing. For "A2M", Se is c(Se1, Se2, Se3), where Se1 is for the initial array pool, Se2 is for the row and column pools, and Se3 is for individual testing. Sp is specified in the same manner.

For "A2", assayID is c(1, 1) when the same assay is used for row/column pool testing as well as for individual testing, and assayID is c(1, 2) when assay 1 is used for row/column pool testing and assay 2 is used for individual testing. For "A2M", assayID is specified in the same manner with three entries, for example c(1, 1, 1) when one assay is used in all three stages or c(1, 2, 3) when each stage uses a different assay.
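To see how the stage-wise arguments line up, the following base R sketch tabulates one possible "A2M" configuration. The data frame a2m_layout and its column names are hypothetical and exist only for this illustration; the Se, Sp, and assayID values are taken from the Examples section.

```r
n <- 4   # row/column size

# One row per stage; each column mirrors the order expected in Se, Sp, assayID
a2m_layout <- data.frame(
  stage     = 1:3,
  tested    = c("initial array pool", "row/column pools", "individuals"),
  pool_size = c(n^2, n, 1),
  Se        = c(0.95, 0.98, 0.98),
  Sp        = c(0.97, 0.98, 0.92),
  assayID   = 1:3
)

a2m_layout
```

For "A2", the same table would simply drop the first row, leaving pool sizes c(n, 1) and two entries each for Se, Sp, and assayID.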

When available, the individual true disease statuses (1 for positive and 0 for negative) can be used in simulating the group testing data through argument Yt. When an input is entered for Yt, argument p will be ignored.
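The relationship between p and Yt can be sketched in base R as follows. The helper draw_statuses is hypothetical and reflects one plausible reading of the description above (statuses are drawn as Bernoulli outcomes from p unless Yt is given); the package's internal implementation may differ.

```r
# Hypothetical helper: return true statuses, preferring a supplied Yt over p
draw_statuses <- function(N, p = 0.1, Yt = NULL) {
  if (!is.null(Yt)) {
    # Supplied statuses are used as-is; p plays no role in this branch
    stopifnot(length(Yt) == N, all(Yt %in% c(0, 1)))
    return(as.integer(Yt))
  }
  # A single probability is recycled to length N, matching rep(p, N)
  rbinom(N, size = 1, prob = rep_len(p, N))
}

known  <- draw_statuses(4, p = 0.9, Yt = c(1, 0, 0, 1))  # p is ignored here
random <- draw_statuses(48, p = 0.10)                    # drawn from p
```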

Value

A list with components:

`gtData`	The simulated group testing data.
`testsExp`	The number of tests expended in the simulation.

References

Kim HY, Hudgens M, Dreyfuss J, Westreich D, and Pilcher C (2007). Comparison of Group Testing Algorithms for Case Identification in the Presence of Testing Error. Biometrics, 63(4), 1152–1163.

Examples


library(groupTesting)

## Example 1: Square Array without Master Pool Testing (i.e., 2-Stage Array)
N <- 48              # Sample size
protocol <- "A2"     # 2-stage array
n <- 4               # Row/column size
Se <- c(0.95, 0.95)  # Sensitivities in stages 1-2
Sp <- c(0.98, 0.98)  # Specificities in stages 1-2
assayID <- c(1, 1)   # The same assay in both stages

# (a) Homogeneous population
pHom <- 0.10         # Overall prevalence
array.gt.simulation(N=N,p=pHom,protocol=protocol,n=n,Se=Se,Sp=Sp,assayID=assayID)

# Alternatively, the individual true statuses can be supplied through Yt:
yt <- rbinom(N, size=1, prob=0.1)
array.gt.simulation(N=N,protocol=protocol,n=n,Se=Se,Sp=Sp,assayID=assayID,Yt=yt)

# (b) Heterogeneous population (regression)
param <- c(-3,2,1)
x1 <- rnorm(N, mean=0, sd=.75)
x2 <- rbinom(N, size=1, prob=0.5)
X <- cbind(1, x1, x2)
pReg <- exp(X%*%param)/(1+exp(X%*%param)) # Logit
array.gt.simulation(N=N,p=pReg,protocol=protocol,n=n,Se=Se,Sp=Sp,assayID=assayID)

# The above examples with different assays
Se <- c(0.95, 0.98)
Sp <- c(0.97, 0.99)
assayID <- c(1, 2)
array.gt.simulation(N,pHom,protocol,n,Se,Sp,assayID)
array.gt.simulation(N,pReg,protocol,n,Se,Sp,assayID)

## Example 2: Square Array with Master Pool Testing (i.e., 3-Stage Array)
N <- 48
protocol <- "A2M"
n <- 4
Se <- c(0.95, 0.95, 0.95)
Sp <- c(0.98, 0.98, 0.98)
assayID <- c(1, 1, 1)    # The same assay in 3 stages

# (a) Homogeneous population
pHom <- 0.10
array.gt.simulation(N,pHom,protocol,n,Se,Sp,assayID)

# (b) Heterogeneous population (regression)
param <- c(-3,2,1)
x1 <- rnorm(N, mean=0, sd=.75)
x2 <- rbinom(N, size=1, prob=0.5)
X <- cbind(1, x1, x2)
pReg <- exp(X%*%param)/(1+exp(X%*%param)) # Logit
array.gt.simulation(N,pReg,protocol,n,Se,Sp,assayID)

# The above examples with different assays:
Se <- c(0.95, 0.98, 0.98)
Sp <- c(0.97, 0.98, 0.92)
assayID <- 1:3
array.gt.simulation(N,pHom,protocol,n,Se,Sp,assayID)
array.gt.simulation(N,pReg,protocol,n,Se,Sp,assayID)

Package groupTesting version 1.1.0