hier.gt.simulation {groupTesting}R Documentation

Simulating Hierarchical Group Testing Data

Description

This function simulates hierarchical group testing data with any number of hierarchical stages.

Usage

hier.gt.simulation(N, p = 0.1, S, psz, Se, Sp, assayID, Yt = NULL)

Arguments

N

The number of individuals to be tested.

p

A vector of length N consisting of individual disease probabilities.

S

The number of stages used in testing, where S >= 1.

psz

A vector of pool sizes in stages 1-S.

Se

A vector of assay sensitivities in stages 1-S.

Sp

A vector of assay specificities in stages 1-S.

assayID

A vector of the identification numbers of the assays used in stages 1-S.

Yt

A vector of individual true disease statuses.

Details

We consider the S-stage hierarchical testing protocol outlined in Kim et al. (2007). Under this protocol, N individual specimens are first assigned to m non-overlapping pools, where each initial pool size is c; i.e., N=mc. The initial pools are tested in stage 1. If a pooled test is negative, all members in the pool are diagnosed as negative. However, if a pooled test is positive, the pool members are split into non-overlapping subpools to be tested in the next stage. This procedure is continued. Note that individual testing is used in the final stage, S, for case identification.

S is a positive integer, S >= 1. When S=1, only the non-overlapping initial pools are tested in stage 1.

If N is not divisible by the initial pool size c, we implement the following policy to test the remainder individuals: (1) when S=1, simply test the remainder pool once as a pooled sample; (2) when S>1, test the remainder pool based on 2-stage hierarchical testing.

p is a vector of individual disease probabilities. When all individuals have the same probability of disease, say, 0.10, p can be specified as p=rep(0.10, N) or p=0.10.

psz is a vector of length S, where the first element is the stage-1 pool size, the second element is the stage-2 pool size, and so on. Pool size at any stage must be divisible by the pool size used at the next stage. For example, psz can be specified as c(12,3,1) but not as c(12,5,1).

When psz is a vector of length 1, test responses are simulated only from the initial pools.

Se is a vector of length S, where the first element is the sensitivity of the assay used in stage 1, the second element is sensitivity of the assay in stage 2, and so on.

Sp is a vector of length S, where the first element is the specificity of the assay used in stage 1, the second element is specificity of the assay in stage 2, and so on.

assayID is a vector of length S, where the first element is the ID of the assay in stage 1, the second element is the ID of the assay in stage 2, and so on.

When available, the individual true disease statuses (1 for positive and 0 for negative) can be used in simulating the group testing data through argument Yt. When an input is entered for Yt, argument p will be ignored.

Value

A list with components:

gtData

The simulated group testing data.

testsExp

The number of tests expended.

References

Kim HY, Hudgens M, Dreyfuss J, Westreich D, and Pilcher C (2007). Comparison of Group Testing Algorithms for Case Identification in the Presence of Testing Error. Biometrics, 63(4), 1152–1163.

See Also

array.gt.simulation for simulation of the array-based group testing data.

Examples


library(groupTesting)

## Example 1: Two-stage hierarchical (Dorfman) testing
N <- 50              # Sample size
psz <- c(5, 1)       # Pool sizes used in stages 1 and 2
S <- 2               # The number of stages
Se <- c(0.95, 0.95)  # Sensitivities in stages 1-2
Sp <- c(0.98, 0.98)  # Specificities in stages 1-2
assayID <- c(1, 1)   # The same assay in both stages

# (a) Homogeneous population
pHom <- 0.10         # Overall prevalence
hier.gt.simulation(N=N,p=pHom,S=S,psz=psz,Se=Se,Sp=Sp,assayID=assayID)

# Alternatively, the individual true statuses can be used as: 
yt <- rbinom( N, size=1, prob=0.1 )
hier.gt.simulation(N=N,S=S,psz=psz,Se=Se,Sp=Sp,assayID=assayID,Yt=yt)

# (b) Heterogeneous population (regression)
param <- c(-3,2,1)
x1 <- rnorm(N, mean=0, sd=.75)
x2 <- rbinom(N, size=1, prob=0.5)
X <- cbind(1, x1, x2)
pReg <- exp(X%*%param)/(1+exp(X%*%param)) # Logit
hier.gt.simulation(N=N,p=pReg,S=S,psz=psz,Se=Se,Sp=Sp,assayID=assayID)

## Example 2: Initial (1-stage) pooled testing data
N <- 50
S <- 1
Se <- 0.95
Sp <- 0.98
assayID <- 1

# (a) Homogeneous population 
pHom <- 0.10   # Overall prevalence

# a(i) Pooled testing
psz <- 5       # pool size    
hier.gt.simulation(N,pHom,S,psz,Se,Sp,assayID)

# a(ii) Inidividual testing
psz <- 1       # pool size    
hier.gt.simulation(N,pHom,S,psz,Se,Sp,assayID)

# (b) Heterogeneous population (regression)
param <- c(-3,2,1)
x1 <- rnorm(N, mean=0, sd=.75)
x2 <- rbinom(N, size=1, prob=0.5)
X <- cbind(1, x1, x2)
pReg <- exp(X%*%param)/(1+exp(X%*%param))  # Logit

# b(i) Pooled testing
psz <- 5
hier.gt.simulation(N,pReg,S,psz,Se,Sp,assayID)

# b(ii) Individual testing
psz <- 1
hier.gt.simulation(N,pReg,S,psz,Se,Sp,assayID)

## Example 3: Data with other configurations
N <- 48
p <- 0.10
Se <- c(.90, .95, .92, .90, .99)
Sp <- c(.96, .96, .90, .92, .95)
Assay <- 1:5

# Initial pooled testing, using the first element of Se, Sp & Assay
pszH1 <- 4
hier.gt.simulation(N=N,p=p,S=1,psz=pszH1,Se=Se,Sp=Sp,assayID=Assay)

pszH2 <- c(4,1)       # Two-stage, using first 2 elements of Se, Sp & Assay
hier.gt.simulation(N=N,p=p,S=2,psz=pszH2,Se=Se,Sp=Sp,assayID=Assay)

pszH4 <- c(16,8,2,1)  # Four-stage, using first 4 elements of Se, Sp & Assay
hier.gt.simulation(N=N,p=p,S=4,psz=pszH4,Se=Se,Sp=Sp,assayID=Assay)

pszH3 <- c(12,2,1)    # Three-stage, using first 3 elements of Se, Sp & Assay
Assay3 <- c(2,1,3)    # Array ID numbers do not need to be in order
hier.gt.simulation(N=N,p=p,S=3,psz=pszH3,Se=Se,Sp=Sp,assayID=Assay3)

# Works with a remainder pool of 2 individuals
N <- 50
psz <- c(12,2,1)
hier.gt.simulation(N=N,p=p,S=3,psz=psz,Se=Se,Sp=Sp,assayID=Assay)


[Package groupTesting version 1.1.0 Index]